Find rows in a DataFrame that contain at least 2 elements from a list

Time:04-23

In my example, I return all rows whose column A contains any one of the elements of list1. I want to be more restrictive and return only rows that contain at least two elements of list1.

Is this possible?

import pandas as pd
data = [
    ['tom steve orange', 'jane'],
    ['dave smith green', 'fran'],
    ['brit dave red', 'terri']
]
cols = ['A', 'B']
df = pd.DataFrame(data, columns=cols)

list1 = ['dave', 'red', 'blue']

df = df[df['A'].str.contains('|'.join(list1))].reset_index(drop=True)
print(df)

Current result:

                  A      B
0  dave smith green   fran
1     brit dave red  terri

Desired result:

                  A      B
0     brit dave red  terri
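The str.contains attempt above can itself be extended: testing each list1 item separately and adding up the boolean results gives a per-row count of matched items, which can then be compared against 2. This is a minimal sketch, assuming whole-word matching is intended. The \b word boundaries are an addition of this sketch, since the original '|'.join(list1) pattern also matches substrings (for example red inside fred).

```python
import re

import pandas as pd

data = [
    ['tom steve orange', 'jane'],
    ['dave smith green', 'fran'],
    ['brit dave red', 'terri'],
]
df = pd.DataFrame(data, columns=['A', 'B'])
list1 = ['dave', 'red', 'blue']

# Assumption: only whole-word matches count, hence the \b boundaries.
# Each str.contains call returns a boolean Series (one value per row);
# summing them gives how many distinct list1 items each row contains.
counts = sum(
    df['A'].str.contains(rf'\b{re.escape(item)}\b', regex=True)
    for item in list1
)

# Keep only rows that contain at least two of the items.
out = df[counts >= 2].reset_index(drop=True)
print(out)
```

Because each item is tested once per row, a word repeated within a string still counts as a single match.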

Answer:

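You can use set operations on the original df: split each string in column A into words, intersect those words with the set of list1 items, and keep the rows where at least two items are shared: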

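S = set(list1)

# Split each string in column A into words and keep the rows
# whose words share at least two items with S
out = df[[len(set(s.split()) & S) >= 2 for s in df['A']]]

# or, equivalently
# out = df[[len(S.intersection(s.split())) >= 2 for s in df['A']]]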

Output:

               A      B
2  brit dave red  terri
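If the list comprehension becomes a bottleneck on large frames, a vectorized variant may be worth measuring. This is a minimal sketch, assuming the words in column A are separated by whitespace as in the example. It splits each string, explodes to one word per row (the original row label is repeated), and counts distinct matching words per original row. Using nunique means a repeated word such as "dave dave" is counted once, mirroring the set intersection above.

```python
import pandas as pd

data = [
    ['tom steve orange', 'jane'],
    ['dave smith green', 'fran'],
    ['brit dave red', 'terri'],
]
df = pd.DataFrame(data, columns=['A', 'B'])
list1 = ['dave', 'red', 'blue']

# Assumption: words are whitespace-separated.
# explode() yields one word per row, keeping the original row label.
words = df['A'].str.split().explode()

# Keep only words found in list1, then count distinct hits per original row;
# rows with no hits are missing from the groupby, so fill them with 0.
hits = words[words.isin(list1)]
n_hits = hits.groupby(level=0).nunique().reindex(df.index, fill_value=0)

out = df[n_hits >= 2]
print(out)
```

The result keeps the original index (2 here), matching the output of the set-based version.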