I am trying to select all crispy chicken sandwiches in a dataset. I tried the regex below, but it still returns some grilled chicken sandwiches. Here is the code:
data_sandwich_crispy = data[data['Item'].str.contains(r'^(?=.*crispy)(?=.*sandwich)(?=.*chicken)', regex=True)]
And here is what the dataset looks like:
Any correction or link to an answer is much appreciated. I'm sorry if I made a mistake. Thank you for all your help!
CodePudding user response:
This would be my solution. It looks for strings where the word Crispy is followed by the word Chicken, which in turn is followed by the word Sandwich. There can be an arbitrary number of spaces or any other characters in between, but the three words must appear in that order.
import pandas as pd

# some data
l = ["Crispy Chicken Sandwich",
"Grilled Chicken Sandwich",
"crispy Chicken Sandwich"]
data = pd.DataFrame(l, columns=["A"])
data
# A
# 0 Crispy Chicken Sandwich
# 1 Grilled Chicken Sandwich
# 2 crispy Chicken Sandwich
# `.*` allows any characters between the words; `case=False` ignores case
data[data['A'].str.contains(r'Crispy.*Chicken.*Sandwich', regex=True, case=False)]
# A
# 0 Crispy Chicken Sandwich
# 2 crispy Chicken Sandwich
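If the three words can appear in any order, one option that avoids a single combined regex is to build one boolean mask per word and join the masks with `&`. This is only a minimal sketch: the sample rows are made up for illustration, and the column name `Item` is taken from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name "Item" comes from the question.
data = pd.DataFrame({"Item": [
    "Crispy Chicken Sandwich",
    "Grilled Chicken Sandwich",
    "Sandwich, chicken (crispy)",
]})

items = data["Item"]

# One case-insensitive mask per required word, combined with logical AND.
mask = (
    items.str.contains("crispy", case=False)
    & items.str.contains("chicken", case=False)
    & items.str.contains("sandwich", case=False)
)

data_sandwich_crispy = data[mask]
print(data_sandwich_crispy)
```

With this approach the grilled row is dropped because it fails the `crispy` mask, while the row with the words in a different order is kept.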
CodePudding user response:
If you meant collecting only the rows that contain crispy chicken sandwich, have a look at the alternative solution below. It returns a row only when all three words (crispy, chicken and classic) are present as whole words, in any order:
data_sandwich_crispy = df[df['item'].str.contains(r'^(?=.*?\bcrispy\b)(?=.*?\bchicken\b)(?=.*?\bclassic\b).*$',regex=True)]
I created a simple dataframe as shown below:
item id
premium crispy chicken classic sandwhich 10
premium grilled chicken classic sandwhich 15
premium club chicken classic sandwhich 14
Running the command given above gives the following output:
item id
premium crispy chicken classic sandwhich 10
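To see what the lookahead pattern accepts without building a dataframe, it can be tried directly with the standard `re` module. This is a rough sketch under two assumptions that are not part of the answer above: the `re.IGNORECASE` flag is added so capitalised items also match, and the last string is a hypothetical row included to show that word order does not matter.

```python
import re

# Same lookahead pattern as in the answer; each (?=...) checks for one
# whole word anywhere in the string without consuming characters.
pattern = re.compile(
    r'^(?=.*?\bcrispy\b)(?=.*?\bchicken\b)(?=.*?\bclassic\b).*$',
    re.IGNORECASE,  # assumption: added here, not used in the original answer
)

items = [
    "premium crispy chicken classic sandwhich",
    "premium grilled chicken classic sandwhich",
    "premium club chicken classic sandwhich",
    "Classic Chicken, Crispy",  # hypothetical: different order and case
]

# Keep only the strings for which all three lookaheads succeed.
matched = [s for s in items if pattern.search(s)]
print(matched)
```

Only the first and last strings contain all three words, so the grilled and club items are filtered out.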