Why is a string in Unicode form not equal to its Unicode code point value?

Time:12-25

We can get a string's Unicode code point value like this:

u'你'.encode('unicode-escape')
b'\\u4f60'

So why is the string not equal to its Unicode code point value when I write it out as an escape?

u'你'  ==  u'\x4f\x60'
False
u'你'  ==  u'\\u4f60'
False

CodePudding user response:

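It is equal, but the strings you are comparing against are not the right ones. The first one, u'\x4f\x60', is two separate characters: \x takes exactly two hex digits, so it is U+004F followed by U+0060. The second one has the backslash escaped, meaning that it is the literal six characters \u4f60. With a single \u escape the comparison holds: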

u'你' == u"\u4f60"
True
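
To make the difference between the three literals concrete, here is a minimal sketch (assuming Python 3; the variable names are only for illustration) that inspects each one:

```python
# Python 3 assumed; variable names are illustrative, not from the question.
ch = '你'
two_chars = '\x4f\x60'   # \x takes exactly two hex digits: U+004F, then U+0060
escaped = '\\u4f60'      # escaped backslash: six literal characters
code_point = '\u4f60'    # one character, U+4F60

print(len(two_chars), list(two_chars))  # 2 ['O', '`']
print(len(escaped))                     # 6
print(hex(ord(ch)))                     # 0x4f60
print(ch == code_point)                 # True
```

Only the last literal is a single character whose code point is 0x4f60, which is why it is the only one that compares equal.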

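The encoded byte string is displayed with two backslashes because it contains a literal backslash followed by u4f60 (six ASCII bytes), and the repr escapes that backslash. That makes it not equal to the original string, even if you turn it back into a string, unless you decode it with unicode-escape as well.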
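
A short round-trip sketch, again assuming Python 3 (the name `encoded` is just for illustration), shows which decoding gets the original character back:

```python
# Python 3 assumed; `encoded` is an illustrative name.
encoded = '你'.encode('unicode-escape')

print(encoded)                                   # b'\\u4f60'
print(len(encoded))                              # 6
print(encoded.decode('ascii') == '你')            # False
print(encoded.decode('unicode-escape') == '你')   # True
```

Decoding as ASCII just gives back the six-character text \u4f60, while decoding with unicode-escape interprets the escape and returns the single character.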

Side note: in Python 3 every str literal is already Unicode, so the u prefix is optional.
