I have a json in this format
{
"userobject": ["Nike", "NY", "", "Actor", "Artist"]
}
I have a dataframe where all the columns have values except the Country
column which is empty and is denoted by""
in the json above.
I want to fetch all rows from the dataframe that has similar values from the json. My code:
org = userobject[0]
region = userobject[1]
country = userobject[2]
title = userobject[3]
dept = userobject[4]
newdf = df[(df['Organization'] == org) & (df['Region'] == region) & (df['Country']== country)) & (df['Title'] == title) & (df['Department'] == dept)]
But this breaks my code and it only works if i remove this (df['Country']== country))
part. But I need to also use this clause as my dataframe should be filtered dynamically from the json and for some dataframes, Country value will be there. Can someone please help me with this
CodePudding user response:
You could use reduce
to build the condition dynamically:
from functools import reduce
cols = ['Organization', 'Region', 'Country', 'Title', 'Department']
json_dict = {"userobject": ["Nike", "NY", "", "Actor", "Artist"]}
cond = reduce(
lambda r, p: r & (df[p[0]] == p[1]),
((c, v) for c, v in zip(cols, json_dict["userobject"]) if v != ""),
pd.Series(True, index=df.index)
)
newdf = df[cond]
reduce
will start out with the initial value as r
: The last argument, here a series that is filled with True
(pd.Series(True, index=df.index)
). Then it will successively "reduce" the given iterable, the second argument, via the function given as the first argument. The pairs in the iterable are (column name, corresponding value from userobject) ((c, v)
) filtered by the value: if the value is ""
it is filtered out. The lambda
function takes the already build condition r
and adds & (df[c] == v)
to it.
CodePudding user response:
You have to first fill the null
values with empty strings before creating the mask..further you can simplify your code by using eq
to compare the columns with userobject
list followed by all
for reduction of boolean mask along the columns axis:
cols = ['Organization', 'Region', 'Country', 'Title', 'Department']
df[df[cols].fillna('').eq(userobject).all(1)]