I have a DataFrame like so:
C1 C2 C3 C4
1 A B C E
2 C D E F
3 A C A B
4 A A B G
5 B nan C E
And a list:
filt = ['A', 'B', 'C']
What I need is a filter that keeps only the rows that contain all the values from filt, in any order or position. So the output here would be:
C1 C2 C3 C4
1 A B C E
3 A C A B
I've looked at previous questions like "Check multiple columns for multiple values and return a dataframe". In that case, however, the OP is only partially matching. In my case, all values must be present, in any order, for the row to be matched.
CodePudding user response:
One solution
Use:
fs_filt = frozenset(filt)
mask = df.apply(frozenset, axis=1) >= fs_filt
res = df[mask]
print(res)
Output
C1 C2 C3 C4
0 A B C E
2 A C A B
The idea is to convert each row to a frozenset and then check that it is a superset (>=) of the frozenset built from filt, that is, that every value in filt appears somewhere in the row.
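For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame construction is my reconstruction of the question's example (the 1..5 index and the use of np.nan for the missing cell are assumptions), and the subset test is spelled out with issubset via Series.map, which should be equivalent to the >= comparison used above.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's frame; the 1..5 index is an assumption.
df = pd.DataFrame(
    {
        "C1": ["A", "C", "A", "A", "B"],
        "C2": ["B", "D", "C", "A", np.nan],
        "C3": ["C", "E", "A", "B", "C"],
        "C4": ["E", "F", "B", "G", "E"],
    },
    index=[1, 2, 3, 4, 5],
)
filt = ["A", "B", "C"]

fs_filt = frozenset(filt)

# One frozenset per row, holding the distinct values found in that row.
row_sets = df.apply(frozenset, axis=1)

# True where every value in filt occurs in the row (filt is a subset of the row).
mask = row_sets.map(fs_filt.issubset)

res = df[mask]
print(res)
```

With the 1..5 index assumed here, the rows kept are 1 and 3, matching the output the question asks for. The output printed in the answer shows labels 0 and 2, which suggests it was produced on a frame with the default zero-based index. Because sets discard duplicates and ordering, the repeated A in row 3 and the NaN in row 5 do not affect the test.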