I'm migrating a script to a new Python environment. I don't like the regex (I'd use \b instead), but I want to change as little of the existing code as possible.
I get this warning when executing the script:
UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
word_in_data = self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()
This is the line containing the regex:
self.data['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True).copy()
It only uses non-capturing groups, (?:...), so why do I get this warning?
Thanks!
CodePudding user response:
If word contains parentheses, the warning is raised, because unescaped ( and ) in word are interpreted as a capturing group. Try escaping word with re.escape:
# Simple word
word = 'fractured'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
0 True
1 False
2 False
3 True
4 False
5 True
Name: text, dtype: bool
# Simple word with parentheses (interpreted as a capturing group)
word = '(fractured)'
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
0 True
1 False
2 False
3 True
4 False
5 True
Name: text, dtype: bool
# Simple word with parentheses, escaped so they match literally
import re
word = '(fractured)'
word = re.escape(word)
df['text'].str.contains(r"(?:^|[^a-zA-Z0-9])" + word + r"(?:$|[^a-zA-Z0-9])", na=False, regex=True)
0 False
1 False
2 False
3 False
4 False
5 False
Name: text, dtype: bool
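To check which values of word would trigger the warning without running pandas at all, you can compile the pattern with the standard re module and look at its groups attribute. This is only a sketch: it assumes the warning is tied to the compiled pattern having at least one capturing group, and build_pattern, PREFIX and SUFFIX are hypothetical names introduced here for illustration.

```python
import re

# Boundary groups copied from the question; both are non-capturing.
PREFIX = r"(?:^|[^a-zA-Z0-9])"
SUFFIX = r"(?:$|[^a-zA-Z0-9])"


def build_pattern(word, escape=False):
    """Hypothetical helper: wrap `word` in the question's boundary groups."""
    if escape:
        # re.escape turns "(" and ")" into literal characters.
        word = re.escape(word)
    return PREFIX + word + SUFFIX


plain = re.compile(build_pattern("fractured"))
unescaped = re.compile(build_pattern("(fractured)"))
escaped = re.compile(build_pattern("(fractured)", escape=True))

# Number of capturing groups in each compiled pattern.
print(plain.groups)      # 0
print(unescaped.groups)  # 1, the parentheses in `word` form a capturing group
print(escaped.groups)    # 0

# Escaping also changes what is matched: the parentheses must now be in the text.
print(bool(unescaped.search("a fractured bone")))   # True
print(bool(escaped.search("a fractured bone")))     # False
print(bool(escaped.search("a (fractured) bone")))   # True
```

Note that escaping changes the meaning of the search, which is why the last result above is all False: the text now has to contain the literal parentheses.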