Is there a way to filter all columns of a pandas dataframe against a list?-CodePudding

I have a list of taxonomic classifications that are not uniform across their levels within the list and I want to filter all the columns of a data-frame against each item of the list to produce a singular sub-data-frame.

An example list would be

['Sk1','Sub1','Family 3','Clade C']

my actual list has 23 items.

An example data-frame that looks like this and my actual dataframe has 1100 rows:

Species	Super kingdom	Subkingdom	Clade	Class	Family
Species 1	SK1	Sub1	Clade A	Class I	Family 1
Species 2	SK2	Sub2	Clade B	Class II	Family 2
Species 3	SK3	Sub3	Clade C	Class III	Family 3
Species 4	Sk4	Sub4	Clade D	Class IV	Family 4

So filtering the frame against the list against the list should give:

Species	Super kingdom	Subkingdom	Clade	Class	Family
Species 1	SK1	Sub1	Clade A	Class I	Family 1
Species 3	SK3	Sub3	Clade C	Class III	Family 3

The closest I have come to some solutions is to use np and pandas as:

mask = np.column_stack([taxonframe[col].str.contains(Query, na=False) for col in taxonframe])
taxonframe.loc[mask.any(axis=1)]

But this one would need me to loop though the list of queries and make a dataframe for each item in the list and somehow merge 23 dataframes (This one I am trying to avoid).

I could use itertuples() on each row and do a string search on each row as a list and then convert them back to a new dataframe-if there is nothing else I guess I can work with this. Is there an elegant pandas method to do this?

CodePudding user response：

Assuming other columns cannot contain the same strings from other columns (e.g. Clade column cannot contain Family 3, etc.), you can use isin any to create a boolean mask to filter df:

out = df[df.isin(['Sk1','Sub1','Family 3','Clade C']).any(axis=1)]

Output:

     Species Super kingdom Subkingdom    Clade      Class    Family
0  Species 1           SK1       Sub1  Clade A    Class I  Family 1
2  Species 3           SK3       Sub3  Clade C  Class III  Family 3