I currently have a data frame like so:
treated | control |
---|---|
9.5 | 9.6 |
10 | 5 |
6 | 0 |
6 | 6 |
I want to get the log 2 ratio between treated and control, i.e. `log2(treated/control)`. However, the `math.log2()` ratio breaks due to 0 values in the control column (a zero division). Ideally, I would like to get the log 2 ratio using method chaining, e.g. a `df.assign()`, and simply put NaNs where it is not possible, like so:
treated | control | log_2_ratio |
---|---|---|
9.5 | 9.6 | -0.0151 |
10 | 5 | 1 |
6 | 0 | nan |
6 | 6 | 0 |
I have managed to do this in an extremely roundabout way, where I have:
- made a column `ratio`, which is `treated/control`
- done `new_df = df.dropna()` on this dataframe
- applied the log 2 ratio to this
- left joined it back to the original df.
As always, any help is very much appreciated :)
CodePudding user response:
You need to replace the `inf` with `nan`:
df.assign(log_2_ratio=np.log2(df['treated'].div(df['control'])).replace(np.inf, np.nan))
Output:
treated control log_2_ratio
0 9.5 9.6 -0.015107
1 10.0 5.0 1.000000
2 6.0 0.0 NaN
3 6.0 6.0 0.000000
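The one-liner above refers to `df` directly and only replaces positive infinity. A minimal sketch of a variant that stays inside a method chain by using a lambda, and that also maps `-inf` (which would appear if `treated` were 0) to NaN, might look like this. The sample frame is rebuilt from the question's data, and the name `result` is just for illustration:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"treated": [9.5, 10, 6, 6], "control": [9.6, 5, 0, 6]})

result = df.assign(
    # The lambda receives the frame at this point in the chain,
    # so it also works after earlier chained steps.
    log_2_ratio=lambda d: np.log2(d["treated"].div(d["control"]))
    # Map both +inf (x / 0) and -inf (log2 of 0) to NaN.
    .replace([np.inf, -np.inf], np.nan)
)
print(result)
```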
CodePudding user response:
Stick with the numpy log functions and you'll get an `inf` in the cells where the divide doesn't work. That seems like a better choice than `nan` anyway.
>>> df["log_2_ratio"] = np.log2(df.treated/df.control)
>>> df
treated control log_2_ratio
0 9.5 9.6 -0.015107
1 10.0 5.0 1.000000
2 6.0 0.0 inf
3 6.0 6.0 0.000000
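For context on that choice, here is a small sketch of the three different values that can come out when zeros are involved. The `cases` frame is made up for illustration and is not part of the question's data; `np.errstate` is used only to silence the floating-point warnings:

```python
import numpy as np
import pandas as pd

# Hypothetical edge cases: x / 0, 0 / x, and 0 / 0.
cases = pd.DataFrame({"treated": [6.0, 0.0, 0.0], "control": [0.0, 6.0, 0.0]})

# Suppress "divide by zero" / "invalid value" warnings for this computation.
with np.errstate(divide="ignore", invalid="ignore"):
    cases["log_2_ratio"] = np.log2(cases["treated"] / cases["control"])

print(cases)
```

Keeping the infinities preserves the direction of the effect (treated-only versus control-only), whereas the 0 / 0 row is undefined either way.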
CodePudding user response:
To avoid the subsequent replacement, you can use an explicit condition (bearing in mind that multiplying by zero always gives 0, so the product `x.treated * x.control` is falsy whenever either value is 0):
df.assign(log_2_ratio=lambda x: np.where(x.treated * x.control, np.log2(x.treated/x.control), np.nan))
Out[22]:
treated control log_2_ratio
0 9.5 9.6 -0.015107
1 10.0 5.0 1.000000
2 6.0 0.0 NaN
3 6.0 6.0 0.000000
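Note that `np.where` still evaluates the logarithm for every row before it selects between the two branches. As an alternative sketch (not taken from the answer above), masking non-positive values with `Series.where` before dividing means the infinite ratio is never formed. This assumes only strictly positive values are meaningful for the ratio, and `result` is an illustrative name:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"treated": [9.5, 10, 6, 6], "control": [9.6, 5, 0, 6]})

result = df.assign(
    log_2_ratio=lambda d: np.log2(
        # Series.where keeps values where the condition holds, NaN elsewhere,
        # so zeros become NaN before the division happens.
        d["treated"].where(d["treated"] > 0)
        / d["control"].where(d["control"] > 0)
    )
)
print(result)
```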