Check if pandas column contains all values from any list of lists-CodePudding

I have a data frame like:

df= pd.DataFrame({'A': [1,2,3,4,5], 'B': ['a b c', 'a cf', 'bdf','az c', 'ab f']})

List of lists to be matched:

lsts = [['a','c'], ['b', 'd']]

I need to get all the rows from df where column 'B' contains all the elements from any of the sub-lists. It should contain either both ('a' and 'b') or contain both ('b' and 'd'). So, the output should look like

What will be an optimal way to do this? Thanks.

CodePudding user response：

You can use .str.findall to convert characters in column B to lists then loop the lsts to use set.issubset to check if lst is subset of lists in column B. At last use np.logical_or to reduce the boolean list.

import numpy as np

lsts = [['a','c'], ['b', 'd']]

bs = df['B'].str.findall(r'\w').apply(set)

m = np.logical_or.reduce([bs.apply(set(lst).issubset) for lst in lsts])

print(df[m])

   A      B
0  1  a b c
1  2   a cf
2  3    bdf
3  4   az c

CodePudding user response：

You can use set operations:

lsts = [['a','c'], ['b', 'd']]
sets = [set(l) for l in lsts]

df[[any(S.issubset(x) for S in sets) for x in df['B']]]

Output:

   A      B
0  1  a b c
1  2   a cf
2  3    bdf
3  4   az c