How to replace a value in a pandas df with an interpolation

Time:09-17

I have a dataframe df that looks something like this:

print(df)
x     outlier_flag
10    1
NaN   1
30    1
543  -1
50    1

I would like to replace the values flagged with outlier_flag==-1 with the value interpolated between df['x'][i-1] and df['x'][i+1]. In other words, I want to replace the wrong value 543 with 40.

What I could do is

df['x'] = df.apply(lambda row: np.nan if row['outlier_flag']==-1 else row['x'], axis=1)
df.interpolate(method='polynomial', order=3, inplace=True)

But I don't want to do this, because it would also interpolate NaN values in df['x'] that are not marked with outlier_flag==-1 (see the second row). Plain NaN values that are not marked by the flag should stay NaN, because I need them for a later task.

So, is there a way to do the interpolation in place, even for a value like 543 that is not nan?

I tried doing

df['x'] = df.apply(lambda row: row['x'].interpolate(method='polynomial', order=3) if row['outlier_flag']==-1 else row['x'], axis=1)

But this throws an error, because row['x'] is a single scalar value rather than a Series, so it has no interpolate method. Do you have a suggestion for me? Thanks.

CodePudding user response:

Use np.where to replace each flagged value with the mean of its two neighbours:

df['x'] = np.where(df['outlier_flag'] == -1, (df['x'].shift(1) + df['x'].shift(-1)) / 2, df['x'])
print(df)

      x  outlier_flag
0  10.0             1
1   NaN             1
2  30.0             1
3  40.0            -1
4  50.0             1
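
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question, and the helper name replace_with_neighbour_mean is made up for illustration. One caveat to keep in mind: if a neighbour of a flagged row is itself NaN, the mean is NaN too, so the flagged value would be replaced by NaN.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})


def replace_with_neighbour_mean(frame, value_col="x", flag_col="outlier_flag", flag=-1):
    """Return a copy where flagged values are the mean of the previous and next row.

    Hypothetical helper, not part of pandas.
    """
    out = frame.copy()
    is_outlier = out[flag_col] == flag
    # shift(1) is the previous row, shift(-1) is the next row.
    neighbour_mean = (out[value_col].shift(1) + out[value_col].shift(-1)) / 2
    # Keep the original value everywhere except on flagged rows.
    out[value_col] = np.where(is_outlier, neighbour_mean, out[value_col])
    return out


result = replace_with_neighbour_mean(df)
print(result)
```

The unflagged NaN in the second row is left alone, because np.where only swaps in the neighbour mean on rows where the flag matches.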

CodePudding user response:

This is a way that you can use interpolate() as you intend to.

You can first create a list containing the index of the rows which have -1 in outlier flag, and replace the values in x to be np.nan using loc:

incl = df.index[df['outlier_flag'] == -1].tolist()
df.loc[df.index.isin(incl), 'x'] = np.nan

>>> df
      x  outlier_flag
0  10.0             1
1   NaN             1
2  30.0             1
3   NaN            -1
4  50.0             1

Then, you can use np.where to check whether x isnull() and whether that particular index is in the list you created, and apply your interpolation:

df['x']= np.where( (df['x'].isnull()) & (df.index.isin(incl)), df['x'].interpolate(),df['x'])

Which prints:

      x  outlier_flag
0  10.0             1
1   NaN             1
2  30.0             1
3  40.0            -1
4  50.0             1
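
The two steps above can also be folded into one pass without building an index list. This is only a sketch under my own assumptions: interpolate_flagged is a made-up helper name, and extra keyword arguments are simply forwarded to Series.interpolate (the default is linear; method='polynomial' additionally needs SciPy and enough valid points).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})


def interpolate_flagged(frame, value_col="x", flag_col="outlier_flag", flag=-1, **kwargs):
    """Return a copy where only flagged rows are replaced by interpolated values.

    Hypothetical helper, not part of pandas.
    """
    flagged = frame[flag_col] == flag
    # Blank out flagged values, then interpolate every gap in a temporary Series.
    filled = frame[value_col].mask(flagged).interpolate(**kwargs)
    out = frame.copy()
    # Take the interpolated value only on flagged rows; other rows keep their
    # original value, including any NaN that was not flagged.
    out[value_col] = frame[value_col].where(~flagged, filled)
    return out


result = interpolate_flagged(df)
print(result)
```

Because the interpolation runs on a temporary Series, the unflagged NaN in the second row is filled there but never copied back into the result.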