Home > Net >  Is there a way to filter all columns of a pandas dataframe against a list?
Is there a way to filter all columns of a pandas dataframe against a list?

Time:04-23

I have a list of taxonomic classifications that are not uniform across their levels within the list and I want to filter all the columns of a data-frame against each item of the list to produce a singular sub-data-frame.

An example list would be

['Sk1','Sub1','Family 3','Clade C']

my actual list has 23 items.

An example data-frame that looks like this and my actual dataframe has 1100 rows:

Species Super kingdom Subkingdom Clade Class Family
Species 1 SK1 Sub1 Clade A Class I Family 1
Species 2 SK2 Sub2 Clade B Class II Family 2
Species 3 SK3 Sub3 Clade C Class III Family 3
Species 4 Sk4 Sub4 Clade D Class IV Family 4

So filtering the frame against the list against the list should give:

Species Super kingdom Subkingdom Clade Class Family
Species 1 SK1 Sub1 Clade A Class I Family 1
Species 3 SK3 Sub3 Clade C Class III Family 3

The closest I have come to some solutions is to use np and pandas as:

mask = np.column_stack([taxonframe[col].str.contains(Query, na=False) for col in taxonframe])
taxonframe.loc[mask.any(axis=1)]

But this one would need me to loop though the list of queries and make a dataframe for each item in the list and somehow merge 23 dataframes (This one I am trying to avoid).

OR

I could use itertuples() on each row and do a string search on each row as a list and then convert them back to a new dataframe-if there is nothing else I guess I can work with this. Is there an elegant pandas method to do this?

CodePudding user response:

Assuming other columns cannot contain the same strings from other columns (e.g. Clade column cannot contain Family 3, etc.), you can use isin any to create a boolean mask to filter df:

out = df[df.isin(['Sk1','Sub1','Family 3','Clade C']).any(axis=1)]

Output:

     Species Super kingdom Subkingdom    Clade      Class    Family
0  Species 1           SK1       Sub1  Clade A    Class I  Family 1
2  Species 3           SK3       Sub3  Clade C  Class III  Family 3
  • Related