Using f-strings with unicode escapes

Time:09-30

I've got strings that look something like this: a = "testing test<U 00FA>ing <U 00F3>"

The format will not always be like that, but those bracketed Unicode code points will be scattered throughout the strings. I want to turn them into the actual Unicode characters they represent. I tried this function:

def replace_unicode(s):
    uni = re.findall(r'<U\ \w\w\w\w>', s)

    for a in uni:
        s = s.replace(a, f'\u{a[3:7]}')
    return s

This successfully finds all of the <U > unicode strings, but it won't let me put them together to create a unicode escape in this manner.

  File "D:/Programming/tests/test.py", line 8
    s = s.replace(a, f'\u{a[3:7]}')
                     ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

How can I create a Unicode escape character using an f-string, or via some other method, from the hex digits I'm extracting from the strings?

CodePudding user response:

chepner's answer is good, but you don't actually need an f-string. int(a[3:7], base=16) works perfectly fine.
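
For context on the error: Python resolves \u escapes when it parses the string literal, before any {...} field of the f-string is evaluated, so in f'\u{a[3:7]}' the \u is not followed by the four hex digits it requires. A minimal sketch of the runtime alternative on a single match (the variable names token, hex_digits and char are made up for illustration):

```python
token = "<U 00FA>"        # one match, shaped like those in the question
hex_digits = token[3:7]   # slice out the four hex digits: "00FA"

# int(..., base=16) parses the hex digits; chr() maps the code point to a character.
char = chr(int(hex_digits, base=16))
print(char)
```

No "0x" prefix is needed here, since base=16 already tells int how to read the digits.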

Also, it would make a lot more sense to use re.sub() instead of re.findall() then str.replace(). I would also restrict the regex down to just hex digits and group them.

import re

def replace_unicode(s):
    pattern = re.compile(r'<U\ ([0-9A-F]{4})>')
    return pattern.sub(lambda match: chr(int(match.group(1), base=16)), s)

a = "testing test<U 00FA>ing <U 00F3>"
print(replace_unicode(a))  # -> testing testúing ó
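
Since the question mentions that the format varies, here is a hedged variant of the same idea that also accepts lowercase markers and digits. It assumes such input can occur, which the question does not confirm; the name replace_unicode_any_case is hypothetical.

```python
import re

# Assumption: markers may appear as <u 00fa> as well as <U 00FA>.
# re.IGNORECASE makes both the literal "U" and the [0-9A-F] class case-insensitive.
PATTERN = re.compile(r'<U ([0-9A-F]{4})>', re.IGNORECASE)

def replace_unicode_any_case(s):
    # Each match is replaced by the character whose code point is the captured hex value.
    return PATTERN.sub(lambda m: chr(int(m.group(1), 16)), s)

print(replace_unicode_any_case("caf<U 00e9> <u 00FA> <U 00ZZ>"))
```

Markers that do not contain exactly four hex digits, such as <U 00ZZ>, are left untouched rather than raising an error.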

CodePudding user response:

You can use an f-string to create an appropriate argument to int, whose result the chr function can use to produce the desired character.

for a in uni:
    s = s.replace(a, chr(int(f'0x{a[3:7]}', base=16)))
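
The loop above relies on uni and s from the question. A self-contained sketch of how it might fit together, using a regex equivalent to the question's (the function name replace_unicode_loop is made up here):

```python
import re

def replace_unicode_loop(s):
    # Equivalent to the question's pattern: "<U ", four word characters, ">".
    uni = re.findall(r'<U \w{4}>', s)
    for a in uni:
        # a[3:7] is the four-character code; int accepts the "0x" prefix when base=16.
        s = s.replace(a, chr(int(f'0x{a[3:7]}', base=16)))
    return s

print(replace_unicode_loop("testing test<U 00FA>ing <U 00F3>"))
```

Because \w also matches non-hex characters, a marker such as <U ZZZZ> would make int raise ValueError in this version.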