I have a string, and I have to count all elements in this string.
str = '\r\n\r\n\r\n \r\n \xa0\xa0\r\nIntroduction\r\n\r\n\r\nHello\r\n\r\nWorld\r\nProblems...\r\nHow to calculate numbers...\r\nConclusion\r\n\r\n\r\n\xa0\r\n\r\nHello world.'
The string contains numbers, letters, escape sequences, whitespace, commas, etc.
Is there any way to count all elements in this kind of string in Python?
I know that len() and count() cannot help. I also tried some regex methods like re.findall(r'.', str), but it cannot find elements like \n, and it can only find \r instead of \ and r.
Edit:
To be more clear, I want to count \n as 2, not 1, and also \xa0 as 4, not 1.
CodePudding user response:
\ is a special character in Python string literals, so you have to escape it, like str = '\\r\\n ', or use a raw string, like str = r'\r\n '. After that, len() counts \ as an independent character.
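As a small illustration of the point above (the variable names are made up for this sketch), the doubled-backslash literal and the raw literal produce the same five-character string, while the plain literal produces three characters:

```python
# Two spellings of the same five characters: \ r \ n and a space.
escaped = '\\r\\n '   # backslashes doubled in a normal literal
raw = r'\r\n '        # raw literal, backslashes are kept as written

print(escaped == raw)  # True
print(len(raw))        # 5

# Without escaping, the literal holds CR, LF and a space instead.
plain = '\r\n '
print(len(plain))      # 3
```

This only helps if you control how the literal is written in the source code; a string that already exists at runtime has lost that information.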
CodePudding user response:
Python compiles your string literal into a Python string in which escape sequences such as \n are replaced with their Unicode character equivalents (in this case U+000A, the line feed character). len would count this two-character sequence as a single character.
By the time your code sees this string, the original escape sequences from the Python literal are gone. But repr adds escape sequences back, so you could take the length of that.
>>> s = '\r\n\r\n\r\n \r\n \xa0\xa0\r\nIntroduction\r\n\r\n\r\nHello\r\n\r\nWorld\r\nProblems...\r\nHow to calculate numbers...\r\nConclusion\r\n\r\n\r\n\xa0\r\n\r\nHello world.'
>>> print(len(s))
123
>>> print(len(repr(s)))
170
This isn't going to be 100% accurate. First, repr wraps the result in two quote characters, so the count of the escaped text itself is 170 - 2 = 168. Second, there is more than one way to construct a Unicode character in a string literal. For instance, "\n" and "\x0a" both decode to the same newline character, and there is no way to know which form it came from.
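A minimal sketch of the repr approach, using a shorter string than the one in the question. The helper name `escaped_length` is hypothetical, and the sketch assumes the text does not contain both single and double quotes (in that case repr also escapes a quote, which changes the count):

```python
def escaped_length(text):
    # Hypothetical helper: repr() re-escapes characters such as CR, LF
    # and U+00A0, and wraps the result in two quote characters, which
    # are subtracted here.
    return len(repr(text)) - 2

s = '\r\n \xa0Hello'
print(len(s))             # 9
print(escaped_length(s))  # 14

# Both literals produce the same one-character string, so the original
# spelling cannot be recovered from the string itself.
print('\n' == '\x0a')          # True
print(escaped_length('\x0a'))  # 2
```

Here \r and \n each count as 2 and \xa0 counts as 4, which matches what the question asks for, but a newline originally written as "\x0a" would still be counted as 2 rather than 4.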
Alternatively, you could use raw strings, in which backslashes are not treated as the start of an escape sequence. So r"\n" has length 2.