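Check the default encoding with sys.getdefaultencoding() (this session is Python 2, where the default is ASCII):
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
Play around with multibyte characters: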
>>> msg = '今天天氣真好12345'
>>> msg
'\xe4\xbb\x8a\xe5\xa4\xa9\xe5\xa4\xa9\xe6\xb0\xa3\xe7\x9c\x9f\xe5\xa5\xbd12345'
>>> msgu = u'今天天氣真好12345'
>>> msgu
u'\u4eca\u5929\u5929\u6c23\u771f\u597d12345'
>>> print msg, msgu
今天天氣真好12345 今天天氣真好12345
Check their types:
>>> print type(msg), type(msgu)
<type 'str'> <type 'unicode'>
msg is a byte string (str) and msgu is a unicode string, so their lengths differ:
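>>> print len(msg), len(msgu)
23 11
msg holds the UTF-8 encoded bytes of the text. Each of the six CJK characters takes 3 bytes, so len(msg) is 6 × 3 + 5 = 23, while msgu counts 11 characters. To verify that msg is UTF-8, decode it and compare the result with msgu. They are identical: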
>>> msg.decode('utf-8')
u'\u4eca\u5929\u5929\u6c23\u771f\u597d12345'
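For comparison, here is a minimal sketch of the same experiment in Python 3. It is not part of the original session, which is Python 2. It assumes Python 3 semantics:

- Python 2's str corresponds to bytes, and unicode corresponds to str.
- sys.getdefaultencoding() returns 'utf-8'.

The helper utf8_byte_counts is a hypothetical name introduced only for this illustration.

```python
import sys

def utf8_byte_counts(text):
    """Hypothetical helper: return (character, UTF-8 byte length) pairs."""
    return [(ch, len(ch.encode('utf-8'))) for ch in text]

# In Python 3, str is Unicode text (like Python 2's unicode).
msgu = '今天天氣真好12345'
# Encoding explicitly yields bytes (like Python 2's str).
msg = msgu.encode('utf-8')

print(sys.getdefaultencoding())   # utf-8
print(type(msg), type(msgu))      # <class 'bytes'> <class 'str'>
print(len(msg), len(msgu))        # 23 11

# Decoding the UTF-8 bytes gives back the original text.
print(msg.decode('utf-8') == msgu)   # True

# Per-character byte counts: 3 for each CJK character, 1 for each digit.
print(utf8_byte_counts(msgu))
```

Calling .encode('utf-8') and .decode('utf-8') explicitly makes the conversion independent of the terminal's encoding. In the Python 2 session above, the terminal's encoding is most likely what determined the bytes stored in msg.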