Replacing np.inf and -np.inf values with maximum and minimum of a column in pandas dataframe?

Time:09-23

I have some -np.inf and np.inf values in my dataframe. I would like to replace them with the respective minimum and maximum values of the dataframe.

I thought it should be possible with something like this:

df.replace([np.inf, -np.inf], [df.max, df.min], axis=1, inplace = True)

But it didn't work. I had the idea because I can use something similar to replace nans with fillna().

What is an effective way to go about it?

Is there a numpy version?

Thanks for any tips!

CodePudding user response:

You can use .replace(), as follows:

df = df.replace({np.inf: df[np.isfinite(df)].max().max(), 
                -np.inf: df[np.isfinite(df)].min().min()})

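Here, df[np.isfinite(df)] keeps the finite entries and turns inf and -inf into NaN, which max() and min() skip. So df[np.isfinite(df)].max().max() and df[np.isfinite(df)].min().min() are the finite maximum and minimum of the whole dataframe, and np.inf and -np.inf are replaced with them respectively.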

Demo

Data Input

df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0], 'Col2': [1234.0, -np.inf, 890.0]})


     Col1    Col2
0     inf  1234.0
1 -2000.0    -inf
2   345.0   890.0

Output:

print(df)

     Col1    Col2
0  1234.0  1234.0
1 -2000.0 -2000.0
2   345.0   890.0

Edit

If you want to replace with the minimum and maximum of each column rather than of the whole dataframe, you can pass a nested dict to .replace(), as follows:

finite = df[np.isfinite(df)]  # inf and -inf become NaN, which max()/min() skip

df = df.replace({col: {np.inf: finite[col].max(), -np.inf: finite[col].min()} for col in df.columns})

Demo

Data Input

df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0], 'Col2': [1234.0, -np.inf, 890.0]})


     Col1    Col2
0     inf  1234.0
1 -2000.0    -inf
2   345.0   890.0

Output:

print(df)

     Col1    Col2
0   345.0  1234.0
1 -2000.0   890.0
2   345.0   890.0

inf and -inf are replaced by the respective max, min of the column accordingly.
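A possible alternative for the per-column case, not part of the answer above, is to clip each column to its own finite range. This is a minimal sketch that assumes every column holds at least one finite value; the names finite and clipped are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [np.inf, -2000.0, 345.0],
                   'Col2': [1234.0, -np.inf, 890.0]})

# Hide the infinities so they do not take part in min()/max().
finite = df.where(np.isfinite(df))

# Clip every column to its own finite range; axis=1 aligns the
# bounds (indexed by column name) with the columns of df.
clipped = df.clip(lower=finite.min(), upper=finite.max(), axis=1)
print(clipped)
```

Since every finite value already lies inside its column's bounds, only inf and -inf are changed.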

CodePudding user response:

You can use df.mask(), which takes a boolean Series or DataFrame as its condition; np.isinf is one way to build that condition.

>>> df
     0
0  0.0
1  1.0
2  inf
3  2.0
4 -inf
5  3.0
>>> posinf = df.gt(0) & df.transform(np.isinf)
>>> neginf = df.lt(0) & df.transform(np.isinf)
>>> df = df.mask(posinf, df.mask(posinf).max().max())
>>> df = df.mask(neginf, df.mask(neginf).min().min())
>>> df
     0
0  0.0
1  1.0
2  3.0
3  2.0
4  0.0
5  3.0

I'm masking twice: the inner call has no replacement value, so it replaces the matched entries with NaN. That makes it possible to compute the min/max bounds without the infinity getting in the way.
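To make the two steps visible, here is the positive-infinity half split into separate statements. It is only a sketch on the same data as the session above; hidden, upper and result are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [0.0, 1.0, np.inf, 2.0, -np.inf, 3.0]})
posinf = df.gt(0) & df.transform(np.isinf)

# Inner call: no replacement value, so +inf becomes NaN ...
hidden = df.mask(posinf)
# ... and max() skips NaN, giving the largest remaining value.
upper = hidden.max().max()
# Outer call: write that bound back where +inf used to be.
result = df.mask(posinf, upper)
print(result)
```

The -inf entry is still present at this point; the second mask with neginf handles it in the same way.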

CodePudding user response:

You can compute masks for inf/-inf and replace with the values you want:

import numpy as np

m1 = df.eq(np.inf)
m2 = df.eq(-np.inf)

df.mask(m1, df[~m1].max().max()).mask(m2, df[~m2].min().min())

NB. This replaces inf with the min/max of the whole dataframe. If you want the min/max per column instead:

df.mask(m1, df[~m1].max(), axis=1).mask(m2, df[~m2].min(), axis=1)

input:

   col
0  inf
1  1.0
2 -inf
3  2.0
4  NaN

output:

   col
0  2.0
1  1.0
2  1.0
3  2.0
4  NaN
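The question also asks for a NumPy version. A minimal sketch on a plain array follows, assuming the data is available as a 2-D float array (for example via df.to_numpy()) and that every column has at least one finite value. The names arr, finite, global_fixed and col_fixed are illustrative.

```python
import numpy as np

arr = np.array([[np.inf, 1234.0],
                [-2000.0, -np.inf],
                [345.0, 890.0]])

# Replace non-finite entries by NaN so nanmin/nanmax skip them.
finite = np.where(np.isfinite(arr), arr, np.nan)

# Whole-array bounds; nan=np.nan keeps existing NaNs unchanged.
global_fixed = np.nan_to_num(arr, nan=np.nan,
                             posinf=np.nanmax(finite),
                             neginf=np.nanmin(finite))

# Per-column bounds (axis=0); they broadcast across the rows.
col_fixed = np.where(np.isposinf(arr), np.nanmax(finite, axis=0), arr)
col_fixed = np.where(np.isneginf(arr), np.nanmin(finite, axis=0), col_fixed)

print(global_fixed)
print(col_fixed)
```

A column with no finite values at all would make nanmin/nanmax emit a warning and return NaN, so that case needs separate handling.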