convert a string to its codepoint in python

Time:10-09

There are characters like '‌' that are not visible, so I can't copy and paste them. I want to convert any character to its code point, like '\u200D'.

Another example: 'abc' => '\u0061\u0062\u0063'

CodePudding user response:

Allow me to rephrase your question. The title, "convert a string to its codepoint in python", clearly did not get through to everyone, mostly, I think, because it is hard to imagine what you want it for.

What you want is a string containing a representation of Unicode escapes.

You can do that this way:

print(''.join("\\u{:04x}".format(ord(c)) for c in 'abc'))
\u0061\u0062\u0063

If you display that value as a string literal (for example with repr() or at the interactive prompt), you will see doubled backslashes, because backslashes have to be escaped in a Python string literal. So it will look like this:

'\\u0061\\u0062\\u0063'

The reason is that unescaped backslashes in a string literal start escape sequences, so Python replaces each \uXXXX with the character it names. If you write:

a = "\u0061\u0062\u0063"

when you display a at the prompt you will get:

>>> a
'abc'
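
To make the difference concrete, here is a minimal sketch comparing the two strings. The variable names escaped and literal are only illustrative. The escaped string really contains backslashes and hex digits, while the literal contains just the three letters.

```python
# Build the escaped form: each character becomes backslash, 'u', 4 hex digits.
escaped = ''.join('\\u{:04x}'.format(ord(c)) for c in 'abc')

# A literal with unescaped backslashes is interpreted by Python as 'abc'.
literal = "\u0061\u0062\u0063"

print(escaped)        # shows single backslashes
print(repr(escaped))  # shows doubled backslashes, as at the prompt
print(len(escaped), len(literal))  # 18 characters versus 3
```

The doubled backslashes are only how repr() displays the string. The string itself holds one backslash per escape.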

CodePudding user response:

'\u0061\u0062\u0063'.encode('utf-8') will encode the text to UTF-8 bytes, which gives b'abc'.

Edit:

Since Python automatically interprets the \u escapes in a string literal, you can't see the escaped form directly, but you can write a function that generates it.

def get_string_unicode(string_to_convert):
    res = ''

    for letter in string_to_convert:
        # Append backslash-u plus the code point as 4 zero-padded hex digits.
        res += '\\u' + hex(ord(letter))[2:].zfill(4)

    return res

Result:

>>> get_string_unicode('abc') 
'\\u0061\\u0062\\u0063'
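
One caveat with the four-digit form: characters above U+FFFF (many emoji, for example) need the eight-digit \U escape to be valid Python escapes. Below is a hedged sketch of a variant that handles both cases. The name to_unicode_escapes is hypothetical, and the round trip relies on Python's built-in unicode_escape codec.

```python
def to_unicode_escapes(text):
    """Return text with every character written as a \\uXXXX or \\UXXXXXXXX escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            # Code points up to U+FFFF fit in the 4-digit form.
            parts.append('\\u{:04x}'.format(cp))
        else:
            # Larger code points need the 8-digit \U form.
            parts.append('\\U{:08x}'.format(cp))
    return ''.join(parts)


def from_unicode_escapes(escaped):
    """Reverse the conversion using the built-in unicode_escape codec."""
    return escaped.encode('ascii').decode('unicode_escape')


print(to_unicode_escapes('abc'))
print(to_unicode_escapes('\u200d'))
print(from_unicode_escapes(to_unicode_escapes('abc')))
```

Decoding the result back is a quick way to check that an invisible character such as a zero-width joiner survived the conversion.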