This pattern is interpreted as a regular expression, and has match groups, but there are no capturing groups

Time: 01-25

I'm migrating a script to a new Python environment. I don't like this regex (I'd use \b instead), but I want to change the existing code as little as possible.

I get this warning when running the script:

UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  word_in_data = self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()

This is the line containing the regex:

self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()

The pattern only uses non-capturing groups (?:...), so why do I get this warning?

Thanks!

Answer:

If word contains parentheses (), they are interpreted as a capturing group, and that group is what triggers the warning. Try escaping word with re.escape:
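To see where the group comes from, the sketch below builds the pattern from the question with and without escaping and reads the compiled pattern's groups attribute (the number of capturing groups). It assumes, as the warning text suggests, that pandas warns whenever that count is non-zero; build_pattern is a hypothetical helper, not part of the original script.

```python
import re

# Boundaries copied from the question; both use non-capturing groups.
PREFIX = r"(?:^|[^a-zA-Z0-9])"
SUFFIX = r"(?:$|[^a-zA-Z0-9])"


def build_pattern(word, escape=True):
    """Hypothetical helper: wrap `word` in the question's boundaries."""
    if escape:
        word = re.escape(word)  # turn regex metacharacters into literals
    return PREFIX + word + SUFFIX


# The fixed parts contribute no capturing groups.
print(re.compile(build_pattern("fractured", escape=False)).groups)    # 0
# Unescaped parentheses inside `word` become one capturing group.
print(re.compile(build_pattern("(fractured)", escape=False)).groups)  # 1
# After re.escape the parentheses are literal, so no groups remain.
print(re.compile(build_pattern("(fractured)")).groups)                # 0
```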

# Simple word
word = 'fractured'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

0     True
1    False
2    False
3     True
4    False
5     True
Name: text, dtype: bool
# Simple word with parentheses
word = '(fractured)'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

0     True
1    False
2    False
3     True
4    False
5     True
Name: text, dtype: bool
# Simple word with parentheses, but escaped
import re
word = '(fractured)'
word = re.escape(word)
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)

0    False
1    False
2    False
3    False
4    False
5    False
Name: text, dtype: bool
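Escaping changes what is matched, not just the warning: the unescaped pattern treats (fractured) as a group around plain fractured, while the escaped one requires the literal text "(fractured)", which would explain the all-False result above. The stdlib-only sketch below shows the difference on hypothetical sample strings (not the data from the question); matches is a hypothetical helper. If word is meant to be matched literally, the escaped result is the one you want.

```python
import re

PREFIX = r"(?:^|[^a-zA-Z0-9])"
SUFFIX = r"(?:$|[^a-zA-Z0-9])"

# Hypothetical sample rows, for illustration only.
texts = ["bone fractured badly", "a (fractured) bone", "unfractured"]

# Parentheses act as a capturing group around "fractured".
unescaped = re.compile(PREFIX + "(fractured)" + SUFFIX)
# Parentheses must appear literally in the text.
escaped = re.compile(PREFIX + re.escape("(fractured)") + SUFFIX)


def matches(pattern, rows):
    """Hypothetical helper: like str.contains, one bool per row."""
    return [bool(pattern.search(row)) for row in rows]


print(matches(unescaped, texts))  # [True, True, False]
print(matches(escaped, texts))    # [False, True, False]
```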