I have been trying to extract the month name from the following string with a regex. Although my regex works on a platform like regex101, I can't extract the word "August" in Python.
import re
s = "word\anyword\2021\August\202108_filename.csv"
re.findall("\d+\\([[:alpha:]]+)\\\d+", s)
Which results in the following error:
error: unbalanced parenthesis at position 17
I also tried using re.compile and re.escape, as suggested in previous posts dealing with the same error, but none of them seem to work.
Any help and also a little explanation on why this isn't working is greatly appreciated.
CodePudding user response:
You can use
import re
s = r"word\anyword\2021\August\202108_filename.csv"
m = re.search(r"\d+\\([a-zA-Z]+)\\\d+", s)
if m:
    print(m.group(1))  # August
See the Python demo.
There are three main problems here:
- The input string should be the same as the one used at regex101.com, i.e. you need to make sure you are using literal backslashes in the Python code, hence the raw string literals for both the input text and the regex.
- POSIX character classes are not supported by Python re, so [[:alpha:]] should be replaced with an equivalent pattern, say [A-Za-z] or [^\W\d_].
- Since it seems you only expect a single match (there is only one month name, August, in the string), you do not need re.findall; you can use re.search. Only use re.findall when you need to extract multiple matches from a string.
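To make the three points concrete, here is a small sketch. The names collapsed, raw_pattern and error_message, and the sample text "Août_2021", are only illustrative and do not come from the question. The collapsed variable spells out, as a raw string, what the regex engine most likely receives once Python has processed the escapes in the question's non-raw pattern: "\\(" turns into an escaped literal parenthesis, so the closing ")" has no opening partner.

```python
import re

# In a normal (non-raw) literal, some backslash sequences are escapes,
# so the path from the question is silently altered:
assert "\a" == "\x07"    # "\a" in "word\anyword" becomes the BEL character
assert "\202" == "\x82"  # "\2021" starts with an octal escape

# What the regex engine sees after Python collapses "\\" to "\" in the
# question's non-raw pattern: "\(" is now a literal "(", not a group.
collapsed = r"\d+\([a-zA-Z]+)\\d+"
try:
    re.compile(collapsed)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# With raw strings, both the text and the pattern keep their backslashes.
s = r"word\anyword\2021\August\202108_filename.csv"
raw_pattern = r"\d+\\([a-zA-Z]+)\\\d+"
month = re.search(raw_pattern, s).group(1)  # first match only
all_months = re.findall(raw_pattern, s)     # list of every group-1 match
print(month, all_months)

# [^\W\d_] also accepts non-ASCII letters, which [A-Za-z] does not.
ascii_only = re.findall(r"[A-Za-z]+", "Août_2021")
any_letter = re.findall(r"[^\W\d_]+", "Août_2021")
print(ascii_only, any_letter)
```

If the month names can contain accented letters, [^\W\d_] is probably the safer choice; for plain English month names, [A-Za-z] is enough.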