Pandas: Using df.eval with string variables as conditional filtering

Time:05-12

I have a simulated dataframe with one column created with:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(1,201)})

which is just a dataframe with a single column, "A", holding the numbers 1 to 200. I would like to filter the dataframe with a conditional statement like

df[df["A"] > 20]

but the column name, the comparison operator (>), and the value (20) all have to be passed in as string variables. I believe the DataFrame.eval function in pandas should be used for this, so I wrote a function called select_twenty. Here it is:

def select_twenty(input_df, column_name, boolean_arg, value):
    evaluated = input_df.eval(input_df[input_df[column_name] + boolean_arg + value])
    return evaluated

In the function above, input_df is the simulated dataframe, column_name is the name of the chosen column, boolean_arg is the comparison operator (>), and value is the value 20. The last three arguments are passed as strings in the function call:

select_twenty(df, "A", ">", "20")

When I call the function, it keeps raising a UFuncTypeError. I have searched all over Google and cannot work out how to resolve it, and I have not seen an example where eval in pandas was used this way. Can someone please help me with the filter? Thank you.

CodePudding user response:

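The error comes from the argument you pass to eval: input_df[column_name] + boolean_arg tries to add the numeric column values to the string ">", and that fails before eval ever runs. eval expects the expression itself as a string, so concatenate the three strings into "A>20", let eval turn that into a boolean mask, and index the dataframe with the mask. What you are looking for is: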

def select_twenty(input_df, column_name, boolean_arg, value):
    evaluated = input_df[input_df.eval(column_name + boolean_arg + value)]
    return evaluated
print(select_twenty(df, "A", ">", "20"))
       A
20    21
21    22
22    23
23    24
24    25
..   ...
195  196
196  197
197  198
198  199
199  200

[180 rows x 1 columns]
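
As a side note, here are two other ways the same filter could be written; treat this as a rough sketch rather than the only way to do it. The first uses DataFrame.query, which takes the same kind of expression string as eval but returns the filtered rows directly. The second avoids evaluating a string at all by looking the operator up in a small dictionary. The names select_with_query, select_with_operator and COMPARISONS are made up for this illustration, and the second function assumes the column is numeric so that the value string can be converted with float.

```python
import operator

import numpy as np
import pandas as pd

# Hypothetical whitelist of operator strings (not part of the original post).
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def select_with_query(input_df, column_name, boolean_arg, value):
    # Build an expression string such as "A > 20"; query() evaluates it
    # and returns only the rows where it is True.
    return input_df.query(f"{column_name} {boolean_arg} {value}")


def select_with_operator(input_df, column_name, boolean_arg, value):
    # Look up the comparison function; an unknown operator raises KeyError.
    compare = COMPARISONS[boolean_arg]
    # value arrives as a string, so convert it first (assumes a numeric column).
    mask = compare(input_df[column_name], float(value))
    return input_df[mask]


df = pd.DataFrame({"A": np.arange(1, 201)})
by_query = select_with_query(df, "A", ">", "20")
by_operator = select_with_operator(df, "A", ">", "20")
```

Both calls should return the same rows as the eval version above. The dictionary lookup is worth considering when the strings come from user input, since only the listed operators are accepted and nothing is parsed as an expression.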