Why does the following string work without an additional escape?

Time:10-01

In the following:

>>> r'\d ','\d ', '\\d '
('\\d ', '\\d ', '\\d ')

Why does '\d ' not require a doubled backslash? In other words, why does it act like a raw string even though I did not double the \, which I thought was always necessary in a regular string?

And another example:

>>> r'[a-z] \1', '[a-z] \1'
('[a-z] \\1', '[a-z] \x01')

Why does the \1 get converted into a hex escape?

CodePudding user response:

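The "String and Bytes literals" section of the Python documentation has a table showing which backslash combinations are actually escape sequences with a special meaning. Combinations outside that table are not escapes, so the backslash stays in the string as a regular character, just as it would in a raw string. "\d" is two characters, as is r"\d". By contrast, "\n" is a single newline character, so it behaves differently from "\d". Note that recent Python versions warn about unrecognized escapes such as "\d" (a DeprecationWarning since 3.6, a SyntaxWarning since 3.12), so doubling the backslash or using a raw string is still the safer habit.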
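As a minimal sketch of that difference, the snippet below compares the lengths of the literals from the question. The helper `parse_literal` is hypothetical, not part of any library: it evaluates a literal given as source text and silences the warning that newer Python versions emit for unrecognized escapes, so the example runs quietly on any version.

```python
import warnings


def parse_literal(source):
    """Evaluate a string literal given as source text.

    Hypothetical helper for this example only: it hides the warning that
    newer Python versions emit for unrecognized escapes such as \\d.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)


plain = parse_literal(r"'\d '")     # regular literal, \d is not an escape
raw = parse_literal(r"r'\d '")      # raw literal
doubled = parse_literal(r"'\\d '")  # regular literal, backslash escaped
newline = parse_literal(r"'\n '")   # regular literal, \n IS an escape

print(plain == raw == doubled)  # True: all three are backslash, d, space
print(len(plain))               # 3
print(len(newline))             # 2: a newline character and a space
```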

\1 is an \ooo octal escape. When printed, Python shows the same character value as a hex escape. Interestingly, \8 isn't octal, but instead of raising an error, Python just treats it as two characters (because it's not an escape).
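A short sketch of the octal case, under the same assumption as before that warnings about unrecognized escapes should be silenced: '\1' is one character, while '\8' stays as two because 8 is not an octal digit.

```python
import warnings

octal = '\1'  # valid octal escape: the character with code 1

# '\8' is not a valid escape, so evaluate it from source text with
# warnings silenced to keep the example quiet on newer Python versions.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    not_octal = eval(r"'\8'")

print(len(octal), ord(octal))  # 1 1
print(len(not_octal))          # 2: a backslash followed by the digit 8
```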

CodePudding user response:

Because \d is not an escape code. So, however you type it, it is interpreted as a literal \ then a d. If you type \\d, then the \\ is interpreted as an escaped \, followed by a d.

The situation is different if you choose a letter that is part of an escape code.

>>> r'\n ','\n ', '\\n '
('\\n ', '\n ', '\\n ')

The first one (because it is raw) and the last one (because the \ is escaped) are 3-character strings containing a \, an n and a space. The second one is a 2-character string containing a '\n' (a newline) and a space.

Your second example is even more straightforward. Nothing strange here. r'\1' is a backslash followed by a one. '\1' is the character whose ASCII code is 1, and its canonical representation is '\x01'. '\1', '\x01' and '\001' are all the same thing. Python cannot remember which specific syntax you used to type it. All it knows is that it is the character with code 1, so it displays it in the "canonical way".

In exactly the same way, 'A', '\x41' and '\101' are the same thing, and all of them would be printed with the canonical representation, which is 'A'.
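This can be checked directly with repr(), which is what the interactive prompt uses to display values. The variable names below are just for illustration.

```python
# Three spellings of the character with code 1.
code_one = ['\1', '\x01', '\001']
# Three spellings of the character with code 65.
letter_a = ['A', '\x41', '\101']

for s in code_one:
    print(repr(s))  # '\x01' each time

for s in letter_a:
    print(repr(s))  # 'A' each time
```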
