In my example, I am returning all rows that contain any one of the elements from list1. I want to be more restrictive and return only the rows that contain at least two elements from list1.
Is this possible?
import pandas as pd
data = [
['tom steve orange', 'jane'],
['dave smith green', 'fran'],
['brit dave red', 'terri']
]
cols = ['A', 'B']
df = pd.DataFrame(data, columns=cols)
list1 = ['dave', 'red', 'blue']
df = df[df['A'].str.contains('|'.join(list1))].reset_index(drop=True)
print(df)
Current result:
                  A      B
0  dave smith green   fran
1     brit dave red  terri
Desired result:
               A      B
0  brit dave red  terri
CodePudding user response:
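You can use set operations: split each string in column A into words and keep the rows that share at least two words with list1:
S = set(list1)
# df is the original DataFrame, before the str.contains filter reassigned it
# split each string on whitespace and count the distinct words it shares with S
out = df[[len(set(s.split()) & S) >= 2 for s in df['A']]]
# or
# out = df[[len(S.intersection(s.split())) >= 2 for s in df['A']]]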
Output:
               A      B
2  brit dave red  terri
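For reference, here is a self-contained sketch of the same set-intersection idea wrapped in a small helper. The function name filter_min_matches and its min_matches parameter are made up for illustration and are not part of pandas. Note that this compares whole whitespace-separated words, so unlike str.contains it would not count 'dave' inside a longer token such as 'davenport'.

```python
import pandas as pd


def filter_min_matches(df, column, words, min_matches=2):
    """Keep rows whose `column` shares at least `min_matches` distinct
    whitespace-separated words with `words`.

    Hypothetical helper for illustration; not a pandas API.
    """
    wanted = set(words)
    # Build a boolean mask: one True/False per row
    mask = df[column].map(
        lambda text: len(wanted.intersection(text.split())) >= min_matches
    )
    return df[mask]


data = [
    ['tom steve orange', 'jane'],
    ['dave smith green', 'fran'],
    ['brit dave red', 'terri'],
]
df = pd.DataFrame(data, columns=['A', 'B'])
list1 = ['dave', 'red', 'blue']

out = filter_min_matches(df, 'A', list1)
print(out)
```

Chain .reset_index(drop=True) onto the result if you want the index to start from 0 again, as in the desired result.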