I recently came across an intriguing question. I tried to find a solution but couldn't get one to work. I'm trying to filter the lists in one column of a DataFrame so that each list keeps only certain values. Here's the setup:
import pandas as pd
import numpy as np
df = pd.DataFrame({'cd1': ['PFE1', 'PFE25', np.nan, np.nan],
                   'cd2': [np.nan, 'PFE28', 'PFE23', 'PFE14'],
                   'cd3': ['PFE15', 'PFE2', 'PFE83', np.nan],
                   'cd4': ['PFE25', np.nan, 'PFE39', 'PFE47'],
                   'cd5': [np.nan, 'PFE21', 'PFE53', 'PFE15']})
df
# collect each row's non-null codes into a list
df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1)
# the codes each list should be filtered down to
spec_list = ['PFE15', 'PFE25']
df
That adds a 'combined' column holding each row's non-null codes as a list.
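The rendered frame isn't reproduced here, so as a small sketch, this re-runs the setup above and prints what the agg call stores in 'combined'. Only the printing loop is added; the rest is the question's own code.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({'cd1': ['PFE1', 'PFE25', np.nan, np.nan],
                   'cd2': [np.nan, 'PFE28', 'PFE23', 'PFE14'],
                   'cd3': ['PFE15', 'PFE2', 'PFE83', np.nan],
                   'cd4': ['PFE25', np.nan, 'PFE39', 'PFE47'],
                   'cd5': [np.nan, 'PFE21', 'PFE53', 'PFE15']})

# For each row (axis=1), drop the NaN cells and keep the rest, in column order, as a list.
df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1)

for idx, codes in df['combined'].items():
    print(idx, codes)
# 0 ['PFE1', 'PFE15', 'PFE25']
# 1 ['PFE25', 'PFE28', 'PFE2', 'PFE21']
# 2 ['PFE23', 'PFE83', 'PFE39', 'PFE53']
# 3 ['PFE14', 'PFE47', 'PFE15']
```

Because the lists have different lengths, pandas returns a Series of lists here rather than expanding them into columns.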
How can I filter each list in 'combined' so that it keeps only the codes that appear in spec_list?
Answer:
If you don't mind having an empty list where there is no match, you can intersect each row's list with a set built from spec_list:
spec_set = set(spec_list)  # build the set once, outside the per-row lambda
df.combined.map(lambda x: list(spec_set.intersection(x)))
Result (sets are unordered, so the order of items within each list is not guaranteed):
0 [PFE15, PFE25]
1 [PFE25]
2 []
3 [PFE15]
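A few hedged variations on the same idea, as a sketch that assumes 'combined' holds the lists the question's agg call produces (rebuilt directly below so the block runs on its own). The list comprehension keeps each row's original order, which set intersection does not; the NaN step is one option for readers who do mind the empty lists; the explode/isin/groupby version is a swapped-in vectorized alternative. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# The lists produced by df.agg(lambda x: list(x.dropna()), axis=1) in the question.
combined = pd.Series([['PFE1', 'PFE15', 'PFE25'],
                      ['PFE25', 'PFE28', 'PFE2', 'PFE21'],
                      ['PFE23', 'PFE83', 'PFE39', 'PFE53'],
                      ['PFE14', 'PFE47', 'PFE15']], name='combined')

spec_list = ['PFE15', 'PFE25']
spec_set = set(spec_list)  # O(1) membership checks

# Order-preserving filter: keep matching codes in the order they appear in each row.
ordered = combined.map(lambda codes: [c for c in codes if c in spec_set])

# Optional: replace empty lists with NaN instead of keeping [].
ordered_or_nan = ordered.map(lambda codes: codes if codes else np.nan)

# Vectorized alternative: one code per row, keep matches, regroup by the original index.
exploded = combined.explode()
matches_only = exploded[exploded.isin(spec_list)].groupby(level=0).agg(list)

print(ordered.tolist())       # [['PFE15', 'PFE25'], ['PFE25'], [], ['PFE15']]
print(matches_only.tolist())  # [['PFE15', 'PFE25'], ['PFE25'], ['PFE15']]
```

The explode version drops row 2 entirely because it has no matches; calling `.reindex(combined.index)` on the result would bring that row back as NaN.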