How to filter a column of a dataframe?

Time:06-01

I saw a question recently that was very intriguing. I tried to find a solution, but I couldn't get it to work. Basically, I'm trying to filter a specific column in a dataframe. Here's the setup.

import pandas as pd
import numpy as np

df = pd.DataFrame({'cd1' : ['PFE1', 'PFE25', np.nan, np.nan], 
                   'cd2' : [np.nan, 'PFE28', 'PFE23', 'PFE14'], 
                   'cd3' : ['PFE15', 'PFE2', 'PFE83', np.nan], 
                   'cd4' : ['PFE25', np.nan, 'PFE39', 'PFE47'], 
                   'cd5' : [np.nan, 'PFE21', 'PFE53', 'PFE15']})

df

df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1)
spec_list = ['PFE15', 'PFE25']
df

That gives me this.

[Image: the dataframe with the new 'combined' column]

How can I filter each list in 'combined' so that it keeps only the values that appear in 'spec_list'? The final result would look like this.

[Image: the desired result]

CodePudding user response:

If you don't mind having an empty list where there is no match, you can do it like this:

spec_set = set(spec_list)
df.combined.map(lambda x: list(spec_set.intersection(x)))

Result:

0    [PFE15, PFE25]
1           [PFE25]
2                []
3           [PFE15]
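
One caveat: a set has no defined order, so the elements inside each resulting list may come out in a different order than they had in 'combined'. If order matters, a list comprehension is a possible alternative. The following is only a sketch: the helper name keep_listed is made up for illustration, and the 'combined' lists are typed out by hand from the setup in the question rather than rebuilt from df.

```python
import pandas as pd

# The 'combined' lists from the question, written out by hand
# (assumption: these match what the setup code produces).
combined = pd.Series([
    ['PFE1', 'PFE15', 'PFE25'],
    ['PFE25', 'PFE28', 'PFE2', 'PFE21'],
    ['PFE23', 'PFE83', 'PFE39', 'PFE53'],
    ['PFE14', 'PFE47', 'PFE15'],
])

spec_list = ['PFE15', 'PFE25']
spec_set = set(spec_list)  # set membership checks are fast


def keep_listed(values, allowed):
    """Hypothetical helper: keep only the values found in `allowed`,
    in the order they appear in `values`."""
    return [v for v in values if v in allowed]


filtered = combined.map(lambda values: keep_listed(values, spec_set))

# Optional: drop the rows where nothing matched.
non_empty = filtered[filtered.map(len) > 0]

print(filtered)
print(non_empty)
```

With the dataframe from the question, the same idea would be df['combined'].map(lambda values: keep_listed(values, spec_set)). The non_empty step is only needed if rows without a match should disappear instead of holding an empty list.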