I need to filter a data frame with a dictionary, constructed with the key being the column name and the list of value being the value that I want to filter:
dict_filter = {'A':'high', 'B':'medium', 'C':['bottom', 'high']}
# the dataframe is like
df = pd.DataFrame({'id':[1,2], 'A':['high', 'high'], 'B':['high','medium'],'C':['high','bottom']})
the dataframe is like
id A B C
0 1 'high' 'high' 'high'
1 1 'high' 'medium' 'bottom'
I would like to get a dataframe filtered as following:
id A B C
1 1 'high' 'medium' 'bottom'
I tried the following method but it doesnt work with the fact that the last value of the dictionary is a list
df.loc[(df[list(dict_filter)] == pd.Series(dict_filter )).all(axis=1)]
Any suggestions ?
CodePudding user response:
bool_arr = []
for k, v in dict_filter.items():
bool_arr.append(df.loc[:, k].isin(pd.Series(v)))
df.loc[pd.concat(bool_arr, axis=1).all(axis=1)]
# id A B C
# 1 2 high medium bottom
CodePudding user response:
Solution
We can use isin
to create a boolean mask, but before that you have to make sure that all the values in the dict_filter
are list of strings
d = {k: np.atleast_1d(v) for k, v in dict_filter.items()}
df[df[list(d)].isin(d).all(1)]
id A B C
1 2 high medium bottom
CodePudding user response:
One-liner:
filtered = df[df.apply(lambda col: col.isin(pd.Series(dict_filter.get(col.name, [])))).all(axis=1)]
Output:
>>> filtered
A B C
id
2 high medium bottom
CodePudding user response:
You can use:
d = {k:v if isinstance (v, list) else [v]
for k,v in dict_filter.items()}
mask = (df[list(dict_filter)]
.apply(lambda c: c.isin(d[c.name]))
.all(1)
)
df2 = df[mask]
Output:
id A B C
1 2 high medium bottom