Home > database >  Filter dataframe rows based on multiple column values that can contain one or more null values in Pa
Filter dataframe rows based on multiple column values that can contain one or more null values in Pa

Time:03-12

I have a json in this format

{
    "userobject": ["Nike", "NY", "", "Actor", "Artist"]
}

I have a dataframe where all the columns have values except the Country column which is empty and is denoted by"" in the json above.

I want to fetch all rows from the dataframe that has similar values from the json. My code:

org = userobject[0]
region = userobject[1]
country = userobject[2]
title = userobject[3]
dept = userobject[4]
newdf = df[(df['Organization'] == org) & (df['Region'] == region) & (df['Country']== country)) & (df['Title'] == title) & (df['Department'] == dept)]

But this breaks my code and it only works if i remove this (df['Country']== country)) part. But I need to also use this clause as my dataframe should be filtered dynamically from the json and for some dataframes, Country value will be there. Can someone please help me with this

CodePudding user response:

You could use reduce to build the condition dynamically:

from functools import reduce

cols = ['Organization', 'Region', 'Country', 'Title', 'Department']
json_dict = {"userobject": ["Nike", "NY", "", "Actor", "Artist"]}

cond = reduce(
    lambda r, p: r & (df[p[0]] == p[1]),
    ((c, v) for c, v in zip(cols, json_dict["userobject"]) if v != ""),
    pd.Series(True, index=df.index)
)
newdf = df[cond]

reduce will start out with the initial value as r: The last argument, here a series that is filled with True (pd.Series(True, index=df.index)). Then it will successively "reduce" the given iterable, the second argument, via the function given as the first argument. The pairs in the iterable are (column name, corresponding value from userobject) ((c, v)) filtered by the value: if the value is "" it is filtered out. The lambda function takes the already build condition r and adds & (df[c] == v) to it.

CodePudding user response:

You have to first fill the null values with empty strings before creating the mask..further you can simplify your code by using eq to compare the columns with userobject list followed by all for reduction of boolean mask along the columns axis:

cols = ['Organization', 'Region', 'Country', 'Title', 'Department']
df[df[cols].fillna('').eq(userobject).all(1)]
  • Related