Regular expression error: unbalanced parenthesis at position 17

Time:04-06

I have been trying to extract the month name from the following string with a regex. The regex works on a platform like regex101, but in Python I can't extract the word "August".

import re
s = "word\anyword\2021\August\202108_filename.csv"
re.findall("\d+\\([[:alpha:]]+)\\\d+", s)

Which results in the following error:

error: unbalanced parenthesis at position 17

I also tried re.compile and re.escape, as suggested in earlier posts about the same error, but neither of them works.

Any help, and a short explanation of why this isn't working, would be greatly appreciated.

CodePudding user response:

You can use

import re
s = r"word\anyword\2021\August\202108_filename.csv"
m = re.search(r"\d+\\([a-zA-Z]+)\\\d+", s)
if m:
    print(m.group(1))

There are three main problems here:

  • The input string must be the same text you tested at regex101.com, so the backslashes have to be literal in the Python code. In a regular string literal, \a and \202 are escape sequences, so s does not contain the text you expect. Use raw string literals for both the input text and the regex. The same issue causes the error: without the r prefix, \\( in the pattern reaches the regex engine as \(, a literal parenthesis, so no group is opened and the closing ) at position 17 is unbalanced.
  • Python re does not support POSIX character classes, so [[:alpha:]] should be replaced with another pattern, say, [A-Za-z] for ASCII letters or [^\W\d_] for any Unicode letter.
  • Since you only expect a single match (the string contains only one month name), you do not need re.findall; re.search is enough. Use re.findall only when you need to extract multiple matches from a string.
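
The three points above can be checked with a short sketch. The variable names (raw, cooked, pattern, error_message) are illustrative and not from the original post, and the last pattern is the question's regex written the way the regex engine should receive it once Python has processed the non-raw string literal.

```python
import re
import warnings

# Raw literal: every backslash stays a backslash, as in the regex101 test text.
raw = r"word\anyword\2021\August\202108_filename.csv"
# Regular literal: \a becomes BEL and \202 becomes one octal-escaped character.
# (\\A is written out so newer Python versions do not warn about "\A".)
cooked = "word\anyword\2021\\August\202108_filename.csv"

# [^\W\d_] stands in for the unsupported POSIX class [[:alpha:]].
pattern = re.compile(r"\d+\\([^\W\d_]+)\\\d+")

m = pattern.search(raw)
month = m.group(1) if m else None      # the captured month name
cooked_match = pattern.search(cooked)  # None: the second separator is not a backslash
all_months = pattern.findall(raw)      # a list, useful only for several matches

# The question's pattern as the regex engine receives it without the r prefix:
# \\( has collapsed to \( (a literal parenthesis), so the later ) has no opener.
error_message = None
with warnings.catch_warnings():
    warnings.simplefilter("ignore")    # hide the "possible nested set" warning
    try:
        re.compile(r"\d+\([[:alpha:]]+)\\d+")
    except re.error as exc:
        error_message = str(exc)

print(month, cooked_match, all_months, error_message)
```

If the path comes from a file listing or user input rather than a literal in the source code, the backslashes are already literal and only the pattern needs the r prefix.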
