Performing operations on a column with NaNs without removing them

I currently have a data frame like so:

treated	control
9.5	9.6
10	5
6	0
6	6

I want to compute a log2 ratio between treated and control, i.e. log2(treated/control). However, math.log2() breaks because of the 0 values in the control column (a division by zero). Ideally, I would like to compute the ratio with method chaining, e.g. with df.assign(), and simply put NaN where the calculation is not possible, like so:

treated	control	log_2_ratio
9.5	9.6	-0.0151
10	5	1
6	0	nan
6	6	0

I have managed to do this in an extremely roundabout way, where I have:

made a column ratio, which is treated/control
done new_df = df.dropna() on this dataframe
applied log2 to the ratio
left-joined the result back to the original df.

As always, any help is very much appreciated :)

CodePudding user response：

You need to replace the inf with nan:

df.assign(log_2_ratio=np.log2(df['treated'].div(df['control'])).replace(np.inf, np.nan))

Output:

   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
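
One caveat with the one-liner above: a zero in treated (rather than control) gives -inf, which replace(np.inf, np.nan) does not catch. A minimal sketch that replaces both infinities is below. The fifth row with treated = 0 is a hypothetical addition to the question's data, included only to show the -inf case, and the np.errstate block is only there to silence NumPy's divide-by-zero warning from log2(0).

```python
import numpy as np
import pandas as pd

# Data from the question, plus one hypothetical row (treated = 0)
# added only to demonstrate the -inf case.
df = pd.DataFrame({
    "treated": [9.5, 10, 6, 6, 0],
    "control": [9.6, 5, 0, 6, 4],
})

# log2(6/0) is +inf and log2(0/4) is -inf; replace both with NaN.
with np.errstate(divide="ignore"):
    result = df.assign(
        log_2_ratio=lambda x: np.log2(x["treated"].div(x["control"]))
        .replace([np.inf, -np.inf], np.nan)
    )

print(result)
```

Rows 2 and 4 end up as NaN here, while the other rows keep their finite log2 ratio.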

CodePudding user response：

Stick with the numpy log functions and you'll get an inf in the cells where the division is by zero. That seems like a better choice than nan anyway.

>>> df["log_2_ratio"] = np.log2(df.treated/df.control)
>>> df
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          inf
3      6.0      6.0     0.000000
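
If you do keep the inf values, be aware that they propagate through aggregations such as mean(), whereas NaN is skipped by default. The sketch below, using the question's data, shows one way to flag the infinite rows with np.isfinite and aggregate over the finite ones only. The names is_finite and finite_mean are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"treated": [9.5, 10, 6, 6], "control": [9.6, 5, 0, 6]})
df["log_2_ratio"] = np.log2(df["treated"] / df["control"])

# A single inf makes the plain mean infinite.
print(df["log_2_ratio"].mean())

# Boolean mask of rows whose ratio is a finite number.
is_finite = np.isfinite(df["log_2_ratio"])

# Mean over the finite rows only.
finite_mean = df.loc[is_finite, "log_2_ratio"].mean()
print(finite_mean)
```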

CodePudding user response：

To avoid the subsequent replacement, you can use an explicit condition instead (bearing in mind that a product with a zero factor is always 0, which np.where treats as False).

df.assign(log_2_ratio=lambda x: np.where(x.treated * x.control, np.log2(x.treated/x.control), np.nan))

Out[22]: 
   treated  control  log_2_ratio
0      9.5      9.6    -0.015107
1     10.0      5.0     1.000000
2      6.0      0.0          NaN
3      6.0      6.0     0.000000
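
A variation on the same idea, sketched here as one possible alternative rather than a better answer: mask the ratio with Series.where before taking the logarithm, so that log2 only ever sees positive ratios or NaN. The condition (both columns strictly positive) is my assumption about which rows should count as valid.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"treated": [9.5, 10, 6, 6], "control": [9.6, 5, 0, 6]})

# Series.where keeps values where the condition holds and puts NaN elsewhere,
# so the zero-control row becomes NaN before log2 is applied.
result = df.assign(
    log_2_ratio=lambda x: np.log2(
        (x["treated"] / x["control"]).where((x["treated"] > 0) & (x["control"] > 0))
    )
)

print(result)
```

This stays inside a single df.assign() call, so it fits the method-chaining style asked for in the question.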