I am new to programming and a bit confused.
I expected both print calls to output the same graphical Unicode exclamation mark symbol.
My experiment:
number = 10071
byteStr = number.to_bytes(4, byteorder='big')
hexStr = hex(number)
uniChar = byteStr.decode('utf-32be')
uniStr = '\\u' + hexStr[2:6]
print(f'{number} - {hexStr[2:6]} - {byteStr} - {uniChar}')
print(f'{uniStr}') # Not working
print(f'\u2757') # Working
Output:
10071 - 2757 - b"\x00\x00'W" - ❗
\u2757
❗
What is the difference between the last two lines? Please help me understand it!
My environment is JupyterHub with Python 3.9.
CodePudding user response:
An escape code is evaluated by the Python parser when it constructs a string literal. For example, the literal strings '马' and '\u9a6c' are evaluated by the parser as the same length-1 string.
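You can (and did) build a string containing the 6 characters of such an escape sequence by using an escape code for the backslash (\\), which prevents the parser from evaluating those 6 characters as an escape code. The concatenation happens at run time, after parsing, so the result is never re-evaluated, which is why it prints as the 6-character \u2757.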
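The difference is easy to check by comparing string lengths. This is a minimal sketch; the variable names are only illustrative:

```python
# '\u2757' is an escape sequence: the parser turns it into one character.
parsed = '\u2757'
# '\\u2757' escapes the backslash, so all six characters stay literal.
literal = '\\u2757'
# Text assembled at run time is never passed through the parser's escape handling.
built = '\\u' + hex(10071)[2:6]

print(len(parsed), len(literal), len(built))  # 1 6 6
print(built == literal)                       # True
```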
If you build a byte string with those 6 characters, you can decode it with .decode('unicode_escape') to get the character:
>>> b'\\u2757'.decode('unicode_escape')
'❗'
But it is easier to use the chr() function on the number itself:
>>> chr(0x2757)
'❗'
>>> chr(10071)
'❗'
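Applied to the number from the question, the two approaches could look like the sketch below. The names direct, escape_text and decoded are made up for illustration, and the 04x padding assumes a code point that fits in four hex digits, since \u takes exactly four (larger code points would need the eight-digit \U form, whereas chr() handles them directly):

```python
number = 10071

# Direct route: code point -> character.
direct = chr(number)

# Indirect route: build the escape text, then decode it.
# 04x pads the hex value to the four digits that \u expects.
escape_text = f'\\u{number:04x}'
decoded = escape_text.encode('ascii').decode('unicode_escape')

print(direct, decoded, direct == decoded)  # ❗ ❗ True
```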