translating Unicode characters from input

Time:10-16

I have a string with Unicode escape sequences that I need to decode. When I hardcode the string in Python, it works. However, if I read it with input(), the escapes are not translated. For example,

input_0 = input() #f\u00eate
print(input_0) # prints f\u00eate
word = "f\u00eate"
print(word) # prints fête

How could I turn the Unicode parts of the string from the input into regular characters? I have tried using str(word) too.

CodePudding user response:

input() returns the characters exactly as they were typed. Escape sequences are only interpreted in string literals in source code, so in the input the backslash is a literal character: \u00ea is 6 separate characters, not one.
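A small sketch of that difference. The raw string literal here is only a stand-in for what input() would return if the user typed f\u00eate; the variable names are illustrative.

```python
# Stand-in for the value returned by input(): the backslash is a real character.
typed = r"f\u00eate"
# Ordinary literal: Python turns \u00ea into the single character ê when parsing.
literal = "f\u00eate"

print(len(typed))        # 9
print(len(literal))      # 4
print(typed == literal)  # False
```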

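You can encode it with "raw-unicode-escape", which produces bytes and leaves the backslashes untouched, and then decode those bytes with "unicode-escape", which interprets the escape sequences: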

input_0 = input()  # f\u00eate
print(input_0.encode("raw-unicode-escape").decode("unicode-escape"))

Explanation for these two encodings: https://docs.python.org/3/library/codecs.html#text-encodings
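The same round trip can be wrapped in a small helper. This is a minimal sketch; decode_escapes is a hypothetical name, not a standard-library function. Note that "unicode-escape" interprets every Python escape it recognises, so a typed \n also becomes a real newline, which may or may not be what you want for user input.

```python
def decode_escapes(text: str) -> str:
    """Interpret backslash escapes (such as \\u00ea) contained in text."""
    # raw-unicode-escape keeps existing backslashes as they are and writes
    # characters outside Latin-1 as \uXXXX, so nothing is lost in the bytes.
    raw = text.encode("raw-unicode-escape")
    # unicode-escape then reads those bytes the way a string literal is read.
    return raw.decode("unicode-escape")


print(decode_escapes(r"f\u00eate"))  # fête
print(decode_escapes("fête"))        # fête (already-decoded text is unchanged)
```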
