Performing operations on column with nan's without removing them

Time:12-12

I currently have a data frame like so:

treated control
9.5 9.6
10 5
6 0
6 6

I want to compute the log 2 ratio between treated and control, i.e. log2(treated/control). However, math.log2() breaks because of the 0 values in the control column (a division by zero). Ideally, I would like to get the log 2 ratio using method chaining, e.g. with df.assign(), and simply put NaNs where it is not possible, like so:

treated control log_2_ratio
9.5 9.6 -0.0151
10 5 1
6 0 nan
6 6 0

I have managed to do this in an extremely round-about way, where I have:

  • made a column ratio which is treated/control
  • done new_df = df.dropna() on this dataframe
  • applied the log 2 ratio to this.
  • left-joined the result back to the original df.

As always, any help is very much appreciated :)

CodePudding user response:

Dividing by zero yields inf, so you need to replace the inf with NaN after taking the log:

df.assign(log_2_ratio=np.log2(df['treated'].div(df['control'])).replace(np.inf, np.nan))

Output:

   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
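
Note that replacing only np.inf leaves -inf in place, which shows up when treated is 0 and control is not. Here is a minimal sketch that handles both signs, assuming the same treated/control columns as in the question plus one hypothetical extra row with a zero in treated (that row is not in the original data):

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus a hypothetical fifth row (treated == 0)
df = pd.DataFrame({
    "treated": [9.5, 10.0, 6.0, 6.0, 0.0],
    "control": [9.6, 5.0, 0.0, 6.0, 4.0],
})

# log2(0) emits a RuntimeWarning in NumPy; silence it, since -inf is replaced below
with np.errstate(divide="ignore"):
    result = df.assign(
        log_2_ratio=lambda x: np.log2(x["treated"].div(x["control"]))
        .replace([np.inf, -np.inf], np.nan)
    )
```

Passing a list to replace() covers both infinities in one call, so the chain stays a single expression.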

CodePudding user response:

Stick with the numpy log functions and you'll get an inf in the cells where the divide doesn't work. That seems like a better choice than nan anyway.

>>> df["log_2_ratio"] = np.log2(df.treated/df.control)
>>> df
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          inf
3      6.0      6.0     0.000000
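
One thing to keep in mind if you keep inf: pandas skips NaN by default in aggregations such as mean(), but it does not skip inf. A rough sketch of filtering to finite rows afterwards, assuming the same data as in the question (the names finite and mean_finite are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "treated": [9.5, 10.0, 6.0, 6.0],
    "control": [9.6, 5.0, 0.0, 6.0],
})
df["log_2_ratio"] = np.log2(df.treated / df.control)

# Boolean mask: True where the ratio is a finite number (not inf, -inf, or NaN)
finite = np.isfinite(df["log_2_ratio"])

# Aggregate over the finite rows only
mean_finite = df.loc[finite, "log_2_ratio"].mean()
```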

CodePudding user response:

To avoid a subsequent replacement, you can use an explicit condition (bearing in mind that a product is 0, and therefore falsy, whenever either factor is 0).

df.assign(log_2_ratio=lambda x: np.where(x.treated * x.control, np.log2(x.treated/x.control), np.nan))

Out[22]: 
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
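
np.where still evaluates the log for every row and only then picks between the two results. A possible variant, sketched here under the assumption that zeros are the only problematic values, masks the zeros before dividing so that inf is never produced:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "treated": [9.5, 10.0, 6.0, 6.0],
    "control": [9.6, 5.0, 0.0, 6.0],
})

result = df.assign(
    # Series.where keeps values where the condition holds and puts NaN elsewhere,
    # so any row with a zero in either column yields NaN instead of +/-inf
    log_2_ratio=lambda x: np.log2(
        x["treated"].where(x["treated"] != 0) / x["control"].where(x["control"] != 0)
    )
)
```

NaN propagates through both the division and np.log2, so no replacement step is needed afterwards.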