Rows of Pandas DataFrame that contain all values of a list

Time:07-30

I have a DataFrame like so:

   C1  C2  C3  C4
1  A   B   C   E
2  C   D   E   F
3  A   C   A   B
4  A   A   B   G
5  B   nan C   E

And a list:

filt = ['A', 'B', 'C']

What I need is a filter that keeps only the rows that have all the values from filt, in any order or position. So output here would be:

   C1  C2  C3  C4
1  A   B   C   E
3  A   C   A   B  

I've looked at previous questions such as "Check multiple columns for multiple values and return a dataframe". In that case, however, the OP only needs a partial match. In my case, all values must be present, in any order, for the row to match.
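To make the difference between a partial and a full match concrete, here is a minimal sketch that rebuilds the sample data (assuming pandas' default 0-based index rather than the 1-based labels shown above). `df.isin(filt).any(axis=1)` keeps rows containing any value of filt, while checking each value separately with `df.eq(v).any(axis=1)` and then requiring all of those checks to pass keeps only rows containing every value. This per-value approach is an illustration added here, not necessarily what the linked question uses.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt with a default 0-based index
df = pd.DataFrame(
    [["A", "B", "C", "E"],
     ["C", "D", "E", "F"],
     ["A", "C", "A", "B"],
     ["A", "A", "B", "G"],
     ["B", np.nan, "C", "E"]],
    columns=["C1", "C2", "C3", "C4"],
)
filt = ["A", "B", "C"]

# Partial match: True if the row contains ANY value from filt
any_mask = df.isin(filt).any(axis=1)

# Full match: for each value, test whether it appears anywhere in the row,
# then keep only rows where every one of those tests is True
all_mask = pd.concat([df.eq(v).any(axis=1) for v in filt], axis=1).all(axis=1)

print(df[any_mask])
print(df[all_mask])
```

With this data the partial filter keeps all five rows (each contains at least one of A, B, C), while the full filter keeps only the rows at index 0 and 2.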

Answer:

One solution is to use:

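fs_filt = frozenset(filt)
# Convert each row to a frozenset of its values; ">=" is True where the
# row's set is a superset of fs_filt (order and duplicates are ignored)
mask = df.apply(frozenset, axis=1) >= fs_filt
res = df[mask]
print(res)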

Output

  C1 C2 C3 C4
0  A  B  C  E
2  A  C  A  B

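The idea is to convert each row to a frozenset and then check whether the frozenset of filt is a subset of the row's values. For sets, `row_set >= fs_filt` is the superset test, so it is True exactly when every element of filt appears somewhere in the row.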
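The superset semantics can be checked on plain frozensets before involving pandas. The sketch below (an illustration assuming the same sample rows as the question) also spells the pandas check as `fs_filt.issubset(row)`, which is equivalent to `frozenset(row) >= fs_filt` because `issubset` accepts any iterable and iterating a row Series yields its values. A NaN simply becomes one more set element and does not affect the result.

```python
import numpy as np
import pandas as pd

fs_filt = frozenset(["A", "B", "C"])

# Plain-set semantics: >= means "is a superset of"
print(frozenset(["A", "C", "A", "B"]) >= fs_filt)  # → True (duplicates collapse)
print(frozenset(["A", "A", "B", "G"]) >= fs_filt)  # → False (C is missing)

df = pd.DataFrame(
    [["A", "B", "C", "E"],
     ["C", "D", "E", "F"],
     ["A", "C", "A", "B"],
     ["A", "A", "B", "G"],
     ["B", np.nan, "C", "E"]],
    columns=["C1", "C2", "C3", "C4"],
)

# Same test written with issubset; the lambda returns one bool per row
mask = df.apply(lambda row: fs_filt.issubset(row), axis=1)
print(df[mask])
```

Writing the test as `issubset` reads in the same direction as the explanation (filt is a subset of the row), at the cost of a Python-level lambda per row, just like the `frozenset` version.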
