Currently, my data has a column that contains transaction descriptions. I want to use str.contains to identify which rows are AW (the fast food store) transactions. However, when I use data['cat_desc'].str.contains('AW', case=False, na=False), it also matches values that merely contain the string 'aw', for example 'awxxxx', which I don't want. How can I match 'AW' only as a whole word rather than as part of a longer string? Thanks!
Answer:
Then use a regex with word boundaries (\b):
data['cat_desc'].str.contains(r'\bAW\b', case=False, na=False, regex=True)
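NB. By default, contains uses regex=True, so passing it explicitly is optional. Because case=False is set, a standalone lowercase 'aw' is matched as well.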
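A small sketch of the word-boundary approach on some hypothetical sample descriptions (only the column name cat_desc comes from the question; the values are made up for illustration). It also shows two edge cases worth knowing: a hyphen counts as a boundary, while an underscore does not, because \b sits between a word character (letter, digit, underscore) and a non-word character.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'cat_desc' is from the question
data = pd.DataFrame({
    "cat_desc": [
        "AW RESTAURANT 123",  # standalone word -> match
        "aw drive-thru",      # also a match, because case=False
        "awxxxx purchase",    # 'aw' is part of a longer word -> no match
        "PAY AW-0042",        # '-' is not a word character, so \b applies -> match
        "AW_STORE",           # '_' IS a word character, so no boundary -> no match
        None,                 # missing value -> na=False reports it as no match
    ]
})

# \b matches between a word character and a non-word character (or string edge)
mask = data["cat_desc"].str.contains(r"\bAW\b", case=False, na=False, regex=True)

print(mask.tolist())  # [True, True, False, True, False, False]
print(data[mask])
```

If 'AW' can be followed by an underscore in your data (e.g. 'AW_STORE'), \b will not treat that as a word boundary, so a custom lookaround pattern would be needed instead.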
Answer:
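import re

x = "awxxx"
y = "aw"
# ^ and $ anchor the pattern to the start and end of the string, so this only
# matches when the ENTIRE value is exactly "aw" (and the match is case-sensitive)
a = bool(re.match(r"^aw$", x))  # False: "aw" is followed by more characters
b = bool(re.match(r"^aw$", y))  # True: the whole string is "aw"
print(a, b)  # False True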
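For comparison, a sketch using only the standard re module on a few hypothetical descriptions. The anchored pattern above only accepts values that are exactly "aw", whereas a word-boundary pattern with re.IGNORECASE (re.search instead of re.match) also finds "AW" as a standalone word inside a longer description, which is closer to what the question asks for.

```python
import re

# Hypothetical sample descriptions for illustration
descriptions = ["aw", "AW", "awxxx", "AW RESTAURANT 123", "paid at aw today"]

# Anchored, case-sensitive: True only when the whole string is exactly "aw"
anchored = [bool(re.match(r"^aw$", s)) for s in descriptions]

# Word boundaries, case-insensitive, searched anywhere in the string
word = [bool(re.search(r"\baw\b", s, flags=re.IGNORECASE)) for s in descriptions]

print(anchored)  # [True, False, False, False, False]
print(word)      # [True, True, False, True, True]
```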