Home > Net >  How to specify filter criteria for pandas data frame using dictionary
How to specify filter criteria for pandas data frame using dictionary

Time:12-01

I have a codebase with a bash script that runs a job that includes filtering the rows of a pandas data frame with multiple criteria, i.e.:

df.loc[(df['A'] > 5) & 
        df['B'].isin([1,5,9]) &
       (df['C'] == "blue")] 

Suppose I want this to be generalizable so that anyone using this codebase can specify their own list of filter criteria when running the bash script. I can imagine this would be relatively straightforward if we were just selecting specific values using a json string that is converted to a python dictionary:

filter_criteria={"B": [1,5,9], "C": ["blue"]}

for col, values in filter_criteria.items():
    df = df.loc[df[col].isin(values)]

The one thing I can't think of how to accommodate is > and < or potentially down the road, the ability to do more complex filtering (e.g. df.loc[(df['A']<5) | (df['A']>10)). Does anyone have any ideas on how the operator can be specified as part of the json dictionary?

CodePudding user response:

Create a query string and use query to filter your dataframe.

Suppose this dataframe:

>>> df
   A  B       C
0  1  2    blue
1  3  3     red
2  5  5  yellow
3  7  6    blue
4  9  9    blue
qs = "A > 5 & B.isin([1,5,9]) & C == 'blue'"
out = df.query(qs)
print(out)

# Output:
   A  B     C
4  9  9  blue
  • Related