I tried to reproduce the result of the code below using str.contains(), but I couldn't get the same rows.
The goal is to filter the "question" column of the DataFrame "data" down to the rows that contain both 'England' and 'King'.
def filter_data(data, words):
    filter = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data["question"].apply(filter)]
answer = filter_data(data, ['England', 'King'])
My code:
re_filter = data[
    (data.question.str.contains("(\w|\W)England(\w|\W)", regex=True, case=False)) &
    (data.question.str.contains("(\w|\W)King(\w|\W)", regex=True, case=False))
]
Was it because the regex is wrong? Thanks so much for all the help!
CodePudding user response:
This is the easiest way:
data[data.question.str.contains(r'(?=.*England)(?=.*King)', case=False)]
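The lookahead pattern can also be built from any list of words. This is a minimal sketch, assuming a hypothetical helper name `contains_all` and made-up sample rows. `re.escape` guards against words that contain regex metacharacters, and `case=False` mirrors the case-insensitive check in the question's `filter_data`:

```python
import re

import pandas as pd


def contains_all(series, words):
    """Return a boolean mask: True where a string contains every word (case-insensitive)."""
    # One lookahead per word; each lookahead scans forward from the same position,
    # so the words may appear in any order.
    pattern = "".join(f"(?=.*{re.escape(word)})" for word in words)
    return series.str.contains(pattern, case=False, regex=True)


# Made-up sample data for illustration only.
data = pd.DataFrame({"question": [
    "The King of England",
    "England has a queen",
    "king and england, lowercase",
]})

result = data[contains_all(data["question"], ["England", "King"])]
print(result)
```

Because `.` does not match a newline by default, this sketch assumes each question is a single line of text.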
CodePudding user response:
You can try:
import pandas as pd
df = pd.DataFrame(data={'question': ['I have both England and King', 'I have just England', 'I have just King']})
print(df[df.question.str.contains('England') & df.question.str.contains('King')])
Output:
                       question
0  I have both England and King
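On the original question of why the `(\w|\W)England(\w|\W)` version returned different rows: `(\w|\W)` matches exactly one character of any kind, so that pattern requires at least one character on both sides of the word and misses rows where the word starts or ends the string. A small sketch with made-up strings shows the difference:

```python
import pandas as pd

# Made-up sample strings: the word sits at the start, in the middle, and at the end.
s = pd.Series(["England is old", "in England today", "born in England"])

# Non-capturing groups (?:...) are used here only to avoid pandas' warning about
# capture groups; they match the same text as (\w|\W).
padded = s.str.contains(r"(?:\w|\W)England(?:\w|\W)", case=False, regex=True)
plain = s.str.contains("England", case=False)

print(padded.tolist())  # [False, True, False]
print(plain.tolist())   # [True, True, True]
```

Dropping the `(\w|\W)` padding, as in the answers above, lets the word match anywhere in the string.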