Using \u in python f-string

Time:12-09

Is there a way to include a \u in an f-string, postponing the evaluation of the escape sequence until after the formatting?

A practical example. Let's say I have (python3)

i="0222"
print(f'\u{i}')

This is invalid and raises

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

Is there a way to defer the evaluation of the escape sequence \u after the replacement of {i} in the string?

CodePudding user response:

Is there a way to include a \u in an f-string, postponing the evaluation of the escape sequence until after the formatting?

No. Escape sequences are processed when the string literal is parsed.

Just use the chr builtin, it takes a codepoint (as an integer) and returns the corresponding string.
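Since the question stores the code point as a hex string rather than an integer, here is a minimal sketch of the chr route for that case. The names code_point and char are only illustrative.

```python
# The question keeps the code point as a hex string, so parse it first.
i = "0222"
code_point = int(i, 16)  # base 16 turns "0222" into the integer 0x0222

# chr maps the integer code point to the one-character string.
char = chr(code_point)
print(f"{char}")
```

If i is already an integer, as in i = 0x0222, the int() call is not needed.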

If for some fool reason you really absolutely want to use f-strings, then you need to create the escapes themselves in the string object:

>>> i = 0x0222
>>> s = f'\\u{i:04x}'

Then apply the "unicode_escape" codec, which decodes the escapes at runtime:

>>> import codecs
>>> codecs.decode(s, encoding='unicode_escape')
'Ȣ'

CodePudding user response:

The easier thing to do is this:

i=0x0222
print(f'{chr(i)}')


Replacing the data part of a "\u" prefix won't work in an f-string because the \uXXXX pattern is parsed as a single character: Python will try to read the { as one of the "X" digits and raise a SyntaxError because it is not a hexadecimal digit.
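To see that the failure happens when the source is parsed, before any formatting runs, you can hand both spellings to compile(). This is only a sketch: the compiles helper is a hypothetical name, and raw strings are used so that the backslashes reach compile() untouched.

```python
# Raw strings keep the backslashes exactly as they would appear in a source file.
bad_source = r"print(f'\u{i}')"    # \u followed by "{" instead of hex digits
good_source = r"print(f'\\u{i}')"  # escaped backslash, then a plain "u"

def compiles(source):
    """Return True if the source text parses, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles(bad_source))
print(compiles(good_source))
```

Nothing is executed here, and the variable i is never even defined, so the rejection of the first spelling comes purely from parsing the literal.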

That said, there are other ways of writing this replacement, so that you can code inline dynamic Unicode code points by number. One of those is using the "unicode_escape" codec directly, but your string must be converted to bytes first:

i = "0222"
f"\\u{i}".encode("ASCII", errors="backslashreplace").decode("unicode_escape")

So, we are doing three things here. First, the double backslash ("\\") escapes the backslash itself and produces two characters in the string: "\" and "u". This is unlike "\u", which is an escape sequence that makes the parser require four hex digits right after it.
After that, the {i} inside the f-string works as expected: the four digits are simply rendered there. The resulting string is then encoded to a bytes object using the strict "ASCII" codec, with errors="backslashreplace" telling it to turn any character that can't be represented in ASCII into a backslash escape sequence in the resulting bytes. This transformation doesn't affect the \u0222 sequence itself, but it ensures that any other Unicode characters present in the text are preserved and survive the round trip.

The call to ".decode" will take place on the bytes object, and will "manually" (i.e. at program runtime, rather than at source-code parsing time) do the "\u" substitution you were trying in the first place. This code will "see" the \u0222 sequence and yield the Ȣ character as wanted.
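The three steps can be followed one at a time by keeping the intermediate values. This is just a sketch with illustrative variable names; the second half adds a non-ASCII character to show why backslashreplace is there.

```python
i = "0222"

# Step 1: "\\u" is a literal backslash plus "u", so this is six plain characters.
escaped = f"\\u{i}"

# Step 2: encode to bytes; all six characters are ASCII, so nothing changes here.
as_bytes = escaped.encode("ascii", errors="backslashreplace")

# Step 3: the unicode_escape codec interprets the \u0222 sequence at runtime.
result = as_bytes.decode("unicode_escape")
print(len(escaped), as_bytes, result)

# With other non-ASCII text present, backslashreplace rewrites it as an escape,
# which the decode step then turns back into the original character.
mixed = f"é \\u{i}"
mixed_bytes = mixed.encode("ascii", errors="backslashreplace")
mixed_result = mixed_bytes.decode("unicode_escape")
print(mixed_bytes, mixed_result)
```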

Since this is inconvenient to write, you can create a utility function:

def r(text, character_codes):
    return (text.format(**character_codes)
        .encode("ASCII", errors="backslashreplace")
        .decode("unicode_escape")
    )

...
i = "0222"
text = r("\\u{i}", locals())
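If passing locals() feels too implicit, a variant that takes the codes as keyword arguments is one option. This is a sketch only: render_escapes is a hypothetical name, not something from a library.

```python
def render_escapes(text, **character_codes):
    """Fill {name} fields with hex digits, then decode the backslash escapes."""
    return (
        text.format(**character_codes)
        .encode("ascii", errors="backslashreplace")
        .decode("unicode_escape")
    )

# Each field receives four hex digits that complete a \uXXXX sequence.
text = render_escapes("\\u{i} and \\u{j}", i="0222", j="00e9")
print(text)
```

Keep in mind that the decode step interprets every backslash sequence in the text, not only the ones you built on purpose, so a literal backslash followed by "n" would come out as a newline.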