How to use df.column.str.contains() to get the same result as the code below?

I tried to reproduce the result of the code below using str.contains(), but my output doesn't match.

The goal is to filter the DataFrame "data" down to the rows whose "question" column contains both 'England' and 'King'.

def filter_data(data, words):
  filter = lambda x: all(word.lower() in x.lower() for word in words)
  return data.loc[data["question"].apply(filter)]

answer = filter_data(data, ['England', 'King'])

My code:

re_filter = data[
                (data.question.str.contains("(\w|\W)England(\w|\W)", regex= True, case= False))& 
                (data.question.str.contains("(\w|\W)King(\w|\W)", regex= True, case= False))
                ]

Is it because the regex is wrong? Thanks so much for all the help!

CodePudding user response：

This is the easiest way:

data[data.question.str.contains(r'(?=.*England)(?=.*King)', case=False)]
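On whether the regex in the question was at fault: `(\w|\W)` has to consume exactly one character, so `(\w|\W)England(\w|\W)` only matches when there is at least one character both before and after the word. A question that starts with "England" or ends with "King" is therefore dropped, while the `in` test in `filter_data` keeps it. Here is a small check with the standard `re` module; the sample strings are made up for illustration and are not from the asker's data.

```python
import re

# Pattern from the question vs. a plain substring pattern, both case-insensitive.
asker_pattern = re.compile(r"(\w|\W)England(\w|\W)", flags=re.IGNORECASE)
plain_pattern = re.compile(r"England", flags=re.IGNORECASE)

# Hypothetical sample questions.
samples = [
    "England had a King",            # keyword at the very start
    "The King of England",           # keyword at the very end
    "Was England ruled by a King?",  # keyword in the middle
]

for text in samples:
    # First column: asker's pattern, second column: plain pattern.
    print(bool(asker_pattern.search(text)), bool(plain_pattern.search(text)), text)
```

Only the third string matches the asker's pattern, while the plain pattern matches all three. Dropping the `(\w|\W)` wrappers removes the mismatch.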

CodePudding user response：

You can try:

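import pandas as pd

df = pd.DataFrame(data={'question': ['I have both England and King', 'I have just England', 'I have just King']})
# case=False mirrors the .lower() comparison in the original filter_data
print(df[df.question.str.contains('England', case=False) & df.question.str.contains('King', case=False)])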

Output:

                       question
0  I have both England and King
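Combining one `str.contains()` mask per word with `&` extends to any list of words. Below is a minimal sketch of that idea, checked against the `filter_data` function from the question. The name `filter_with_contains` and the sample rows are made up for illustration. It assumes plain substring matching is wanted (`regex=False`) and that missing values should count as non-matches (`na=False`).

```python
from functools import reduce
import operator

import pandas as pd


def filter_data(data, words):
    # Reference behaviour from the question: every word must occur, ignoring case.
    has_all = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data["question"].apply(has_all)]


def filter_with_contains(data, words):
    # Hypothetical helper: one boolean mask per word, combined with logical AND.
    masks = [
        data["question"].str.contains(word, case=False, regex=False, na=False)
        for word in words
    ]
    # Start from an all-True mask so an empty word list keeps every row.
    all_true = pd.Series(True, index=data.index)
    return data[reduce(operator.and_, masks, all_true)]


# Made-up sample data, including keywords at the start/end and in lower case.
df = pd.DataFrame({"question": [
    "England had a King",
    "the king of england",
    "I have just England",
    "I have just King",
]})

expected = filter_data(df, ["England", "King"])
result = filter_with_contains(df, ["England", "King"])
print(result)
print(result.equals(expected))
```

Because this matches substrings rather than regex patterns, words containing characters such as `.` or `(` need no escaping.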