I have some -np.inf and np.inf values in my dataframe.
I would like to replace them with the respective minimum and maximum values of the dataframe.
I thought it should be possible with something like this:
df.replace([np.inf, -np.inf], [df.max, df.min], axis=1, inplace = True)
But it didn't work. I had the idea because I can use something similar to replace NaNs with fillna().
What is an effective way to go about it?
Is there a numpy version?
Thanks for any tips!
CodePudding user response:
You can use .replace(), as follows:
df = df.replace({np.inf: df[np.isfinite(df)].max().max(),
-np.inf: df[np.isfinite(df)].min().min()})
Here, df[np.isfinite(df)].max().max() and df[np.isfinite(df)].min().min() are the finite maximum and minimum of the whole dataframe. We replace np.inf and -np.inf with them, respectively.
Demo
Data Input
df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0], 'Col2': [1234.0, -np.inf, 890.0]})
Col1 Col2
0 inf 1234.0
1 -2000.0 -inf
2 345.0 890.0
Output:
print(df)
Col1 Col2
0 1234.0 1234.0
1 -2000.0 -2000.0
2 345.0 890.0
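Since the question also asks about a NumPy version, here is a minimal sketch of the same global replacement on a plain array. It assumes an all-float array; the names arr, finite and result are only illustrative. Clipping to the finite bounds pulls inf and -inf in to the finite max and min and leaves NaN untouched.

```python
import numpy as np

# Illustrative data: same shape of problem as the demo, plus one NaN
arr = np.array([[np.inf, 1234.0],
                [-2000.0, -np.inf],
                [345.0, np.nan]])

# Keep only the finite entries (drops inf, -inf and NaN) to find the bounds
finite = arr[np.isfinite(arr)]

# Clip to the finite range: inf -> finite max, -inf -> finite min, NaN stays NaN
result = np.clip(arr, finite.min(), finite.max())
print(result)
```

With a dataframe you could get the array via df.to_numpy() and wrap the result back into a DataFrame with the original index and columns.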
Edit
If you want to replace with the min and max of each particular column instead of the min and max over the whole dataframe, you can use a nested dict in .replace(), as follows:
min_max_dict = {np.inf: df[np.isfinite(df)].max(), -np.inf: df[np.isfinite(df)].min()}
df = df.replace({col: min_max_dict for col in df.columns})
Demo
Data Input
df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0], 'Col2': [1234.0, -np.inf, 890.0]})
Col1 Col2
0 inf 1234.0
1 -2000.0 -inf
2 345.0 890.0
Output:
print(df)
Col1 Col2
0 345.0 1234.0
1 -2000.0 890.0
2 345.0 890.0
inf and -inf are replaced by the max and min of their own column, respectively.
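As an alternative sketch for the per-column case (not the nested-dict approach above), you could compute the finite bounds per column and pass them to DataFrame.clip with axis=1, so each bound is aligned with its column. The names finite and clipped are only illustrative, and this assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0],
                   'Col2': [1234.0, -np.inf, 890.0]})

# Turn inf/-inf into NaN so that min()/max() only see finite values
finite = df.where(np.isfinite(df))

# finite.min() and finite.max() are Series indexed by column name;
# axis=1 aligns them with the columns of df
clipped = df.clip(lower=finite.min(), upper=finite.max(), axis=1)
print(clipped)
```

This should give the same result as the demo output above: Col1 becomes 345.0, -2000.0, 345.0 and Col2 becomes 1234.0, 890.0, 890.0.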
CodePudding user response:
You can use df.mask(), which takes a boolean Series or DataFrame as its condition. You can build that condition with np.isinf, for example.
>>> df
0
0 0.0
1 1.0
2 inf
3 2.0
4 -inf
5 3.0
>>> posinf = df.gt(0) & df.transform(np.isinf)
>>> neginf = df.lt(0) & df.transform(np.isinf)
>>> df = df.mask(posinf, df.mask(posinf).max().max())
>>> df = df.mask(neginf, df.mask(neginf).min().min())
>>> df
0
0 0.0
1 1.0
2 3.0
3 2.0
4 0.0
5 3.0
I'm masking twice: the inner call has no replacement value, so it replaces the masked entries with NaN. Since max() and min() skip NaN, this lets me compute the finite bounds, which the outer call then uses as the replacement.
CodePudding user response:
You can compute masks for inf/-inf and replace with the values you want:
import numpy as np
m1 = df.eq(np.inf)
m2 = df.eq(-np.inf)
df.mask(m1, df[~m1].max().max()).mask(m2, df[~m2].min().min())
NB. This replaces the inf values with the min/max of the whole dataframe. If you want the min/max per column instead:
df.mask(m1, df[~m1].max(), axis=1).mask(m2, df[~m2].min(), axis=1)
input:
col
0 inf
1 1.0
2 -inf
3 2.0
4 NaN
output:
col
0 2.0
1 1.0
2 1.0
3 2.0
4 NaN
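The example above has a single column, so the global and per-column variants give the same result. Here is a small sketch of the per-column variant on a two-column frame; the column names a and b and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with inf, -inf and NaN
df = pd.DataFrame({'a': [np.inf, 1.0, -np.inf, 2.0, np.nan],
                   'b': [10.0, -np.inf, np.inf, np.nan, 30.0]})

m1 = df.eq(np.inf)    # True where the value is +inf
m2 = df.eq(-np.inf)   # True where the value is -inf

# df[~m1] hides +inf as NaN, so max() returns each column's finite max;
# likewise df[~m2] hides -inf for min(). axis=1 aligns the Series with the columns.
result = df.mask(m1, df[~m1].max(), axis=1).mask(m2, df[~m2].min(), axis=1)
print(result)
```

Each column is filled from its own bounds (2.0 and 1.0 for a, 30.0 and 10.0 for b), and the NaN entries are left as they are.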