I have a simulated dataframe with one column created with:
df = pd.DataFrame({'A': np.arange(1,201)})
This is a one-column dataframe, "A", holding the numbers 1 to 200. I would like to filter it with a conditional statement like
df[df["A"] > 20]
but the column name, the comparison operator (>), and the value (20) all have to be passed in as string variables. I believe the DataFrame.eval function in pandas is the right tool for this, so I wrote a function called select_twenty. Here it is:
def select_twenty(input_df, column_name, boolean_arg, value):
    evaluated = input_df.eval(input_df[input_df[column_name] + boolean_arg + value])
    return evaluated
In the function above, input_df is the simulated dataframe, column_name is the name of the chosen column, boolean_arg is the comparison operator (>), and value is the value to compare against (20). The last three arguments are passed as strings in the function call:
select_twenty(df, "A", ">", "20")
When I call the function, it raises a UFuncTypeError. I have searched all over Google and cannot work out how to resolve it, and I have not found an example where eval in pandas is used this way. Can someone please help me with the filter? Thank you.
CodePudding user response:
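The error comes from the + inside the eval argument: input_df[column_name] + boolean_arg tries to add the DataFrame column values to the string boolean_arg, which fails before eval is ever called. eval expects the expression as a single string, so concatenate the three strings first and use the boolean result to index the DataFrame. What you are looking for is: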
def select_twenty(input_df, column_name, boolean_arg, value):
    # Build the expression string, e.g. "A>20", and let eval return a boolean mask
    evaluated = input_df[input_df.eval(column_name + boolean_arg + value)]
    return evaluated
print(select_twenty(df, "A", ">", "20"))
       A
20    21
21    22
22    23
23    24
24    25
..   ...
195  196
196  197
197  198
198  199
199  200

[180 rows x 1 columns]
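If the strings come from user input, building an expression for eval may be more permissive than you want. As an alternative sketch (not part of the answer above), the operator string can be looked up in a small table of functions from Python's operator module. The names COMPARISONS and select_rows are hypothetical, and the code assumes the column is numeric so that the string value can be converted with pd.to_numeric.

```python
import operator

import numpy as np
import pandas as pd

# Hypothetical lookup table: maps each allowed operator string to a function.
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def select_rows(input_df, column_name, comparison, value):
    """Return the rows of input_df where `column_name <comparison> value` holds."""
    if comparison not in COMPARISONS:
        raise ValueError(f"unsupported comparison: {comparison!r}")
    # Assumption: the column is numeric, so the string "20" is converted to 20.
    numeric_value = pd.to_numeric(value)
    mask = COMPARISONS[comparison](input_df[column_name], numeric_value)
    return input_df[mask]


df = pd.DataFrame({"A": np.arange(1, 201)})
result = select_rows(df, "A", ">", "20")
print(len(result))  # 180
```

This keeps the same call shape as select_twenty(df, "A", ">", "20") but rejects any operator string that is not in the table instead of evaluating it.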