How to use str.contains to identify a word within column values in Python?

My data has a column, cat_desc, that contains transaction descriptions. I want to use str.contains to identify which rows are AW (the fast food store) transactions. However, data['cat_desc'].str.contains('AW', case=False, na=False) also matches values that merely contain the substring 'aw', for example 'awxxxx', which I don't want. How can I match 'AW' only as a whole word rather than as a substring? Thanks!

CodePudding user response：

Use a regex with word boundaries (\b), which match only at the edge of a word:

data['cat_desc'].str.contains(r'\bAW\b', case=False, na=False, regex=True)

NB. str.contains uses regex=True by default, so that argument can be omitted.
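To see the difference between the two patterns, here is a minimal sketch on a made-up frame. Only the column name cat_desc comes from the question; the sample descriptions are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
data = pd.DataFrame({
    "cat_desc": [
        "AW restaurant 123",
        "awxxxx payment",
        "lunch at aw",
        "LAWN CARE",
        None,
    ]
})

# Plain substring match: also flags 'awxxxx' and 'LAWN'.
substring_mask = data["cat_desc"].str.contains("AW", case=False, na=False)

# Word-boundary match: flags only rows where 'AW' stands alone as a word.
word_mask = data["cat_desc"].str.contains(r"\bAW\b", case=False, na=False)

# Show only the rows that the word-boundary pattern keeps.
print(data[word_mask])
```

Note that \b treats letters, digits and underscores as word characters, so a value such as 'AW1' or 'AW_store' would not match, while 'AW-1' would.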

CodePudding user response：

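import re

x = "awxxx"
y = "aw"

# ^ and $ anchor the pattern to the start and end, so the whole string has to be "aw".
a = bool(re.match(r"^aw$", x))  # False: extra characters follow "aw"
b = bool(re.match(r"^aw$", y))  # True: the string is just "aw"

print(a, b)  # False True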
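The pattern above is anchored with ^ and $, so it would reject a longer description that contains 'aw' as one word among others. A possible variant for that case, sketched with re.search and word boundaries; the helper name has_word_aw and the sample strings are hypothetical, not from the thread.

```python
import re

# Compile once; IGNORECASE mirrors case=False in the pandas call.
pattern = re.compile(r"\baw\b", re.IGNORECASE)


def has_word_aw(text):
    """Return True if 'aw' appears as a standalone word anywhere in text."""
    return bool(pattern.search(text))


print(has_word_aw("awxxx"))        # False
print(has_word_aw("aw"))           # True
print(has_word_aw("lunch at AW"))  # True
```

Unlike re.match, re.search scans the whole string, so the word can appear at any position.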