Pandas apply with eval not giving NaN as result when there is a NaN in the column it's calculating on

Time:05-20

I need to let users run an arbitrary formula against a DataFrame to produce a new column.

I may have a frame that looks like this:

    dim01    dim02   msr01
0   A        25      1.0
1   B        26      5.3
2   C        53      NaN

I interpret the user's input so that a formula can use supported functions, standard operators, and other columns.

So a formula might look like SQRT([msr01]*100+7)

I convert the user input to Python syntax, so this would evaluate to something like

formula_str = '(math.sqrt((row.msr01*100)+7))'

I then apply it to my pandas DataFrame like this:

data_frame['msr02'] = data_frame.apply(lambda row: eval(formula_str), axis=1)

This worked well until I hit data with a NaN in a column used in the calculation. In that case I get a frame like this in return:

    dim01    dim02   msr01   msr02
0   A        25      1.0     10.344
1   B        26      5.3     23.173
2   C        53      NaN     7.342

So it appears that eval is not handling the NaN correctly.

I am using a lexer/parser to ensure that the user-supplied formula isn't dangerous, and to convert it from everyday user syntax into Python functions that work against pandas columns. Any advice on how to fix this?

Perhaps I should add a check in the lambda that tests whether any required column is NaN and returns NaN directly in that case? But that doesn't seem like the best solution to me.
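For reference, the guard described above could be sketched as follows. This is only an illustration under assumptions: `required_columns` and `eval_row` are hypothetical names, and the list of columns is assumed to come from the parser that already processes the formula. The `+` in the formula is taken from the corrected expression SQRT([msr01]*100+7).

```python
import math

import numpy as np
import pandas as pd

data_frame = pd.DataFrame({
    "dim01": ["A", "B", "C"],
    "dim02": [25, 26, 53],
    "msr01": [1.0, 5.3, np.nan],
})

formula_str = "(math.sqrt((row.msr01*100)+7))"
# Hypothetical: the columns the parser found in the formula.
required_columns = ["msr01"]


def eval_row(row):
    # Return NaN straight away when any input column is missing in this row.
    if row[required_columns].isna().any():
        return np.nan
    # formula_str is assumed to have been validated by the lexer/parser.
    # Passing explicit namespaces limits which names the formula can see.
    return eval(formula_str, {"math": math}, {"row": row})


data_frame["msr02"] = data_frame.apply(eval_row, axis=1)
print(data_frame)
```

Passing `math` and `row` explicitly to `eval` also makes the call independent of whatever names happen to be in scope where the lambda or function is defined.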

I did see this question, which is similar, but I didn't think it answered my exact need.

CodePudding user response:

You can compute this with vectorized pandas operations, which propagate NaN:

df.msr01.mul(100).add(7)**0.5
Out[716]: 
0    10.34408
1    23.17326
2         NaN
Name: msr01, dtype: float64

Your original code also returns NaN for that row:

df.apply(lambda row: eval(formula_str), axis=1)
Out[714]: 
0    10.34408
1    23.17326
2         NaN
dtype: float64
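To check both results side by side, here is a minimal self-contained sketch. It assumes the frame from the question and the formula with the `+` restored; the `np.sqrt` line is an additional vectorized variant that is not part of the original answer, and the explicit namespaces passed to `eval` are a choice made here so the snippet runs the same way in any scope.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dim01": ["A", "B", "C"],
    "dim02": [25, 26, 53],
    "msr01": [1.0, 5.3, np.nan],
})

formula_str = "(math.sqrt((row.msr01*100)+7))"

# Row-wise evaluation, as in the question. math.sqrt(nan) returns nan,
# so the missing value carries through to the result.
row_wise = df.apply(
    lambda row: eval(formula_str, {"math": math}, {"row": row}),
    axis=1,
)

# Vectorized evaluation, as in the answer above.
vectorized = df.msr01.mul(100).add(7) ** 0.5

# The same calculation written with NumPy's element-wise sqrt.
with_numpy = np.sqrt(df["msr01"] * 100 + 7)

print(row_wise)
print(vectorized)
print(with_numpy)
```

All three produce NaN for the row where msr01 is NaN, so if a real frame yields a number there instead, it is worth checking what value that row's msr01 actually holds at the time the formula runs.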