I have a dataframe df that looks something like this:
print(df)
x outlier_flag
10 1
NaN 1
30 1
543 -1
50 1
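For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the column names and values are taken from the printout, and the default integer index is an assumption.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printout; the default RangeIndex (0..4) is assumed.
df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})
print(df)
```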
I would like to replace the values flagged with outlier_flag == -1 with the value interpolated between the neighboring rows, df['x'][i-1] and df['x'][i+1]. In the example, that means replacing the wrong value 543 with 40.
What I could do is
df['x'] = df.apply(lambda row: np.nan if row['outlier_flag']==-1 else row['x'], axis=1)
df.interpolate(method='polynomial', order=3, inplace=True)
But I don't want to do this, because it would also interpolate NaN values in df['x'] that are not marked with outlier_flag == -1 (see the second row). Plain NaN values that are not marked by the flag should stay NaN for a later task.
So, is there a way to do the interpolation in place, even for a value like 543 that is not NaN?
I tried doing
df['x'] = df.apply(lambda row: row['x'].interpolate(method='polynomial', order=3) if row['outlier_flag']==-1 else row['x'], axis=1)
But this throws an error, because row['x'] is a single number (543) rather than a Series with NaN values, so it cannot be interpolated. Do you have a suggestion for me? Thanks.
CodePudding user response:
Use np.where to replace each flagged value with the mean of its two neighbors:
df['x'] = np.where(df['outlier_flag'] == -1, (df['x'].shift(1) + df['x'].shift(-1)) / 2, df['x'])
print(df)
x outlier_flag
0 10.0 1
1 NaN 1
2 30.0 1
3 40.0 -1
4 50.0 1
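As a self-contained sketch of the same idea, here is the neighbor-average replacement run on the sample data from the question. The names is_outlier and neighbor_mean are just illustrative. Note the assumption baked in: this only gives a sensible result when a flagged row sits between two valid, unflagged neighbors; if a neighbor is NaN or is itself an outlier, the result will be NaN or skewed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})

# Boolean mask of the rows to repair.
is_outlier = df["outlier_flag"] == -1

# Mean of the previous and next value for every row.
neighbor_mean = (df["x"].shift(1) + df["x"].shift(-1)) / 2

# Take the neighbor mean only where flagged; keep everything else (including NaN).
df["x"] = np.where(is_outlier, neighbor_mean, df["x"])
print(df)
```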
CodePudding user response:
Here is a way to use interpolate() as you intended.
First, collect the index labels of the rows that have -1 in outlier_flag, and set x to np.nan for those rows using loc:
incl = df.index[df['outlier_flag'] == -1].tolist()
df.loc[df.index.isin(incl), 'x'] = np.nan
>>> df
x outlier_flag
0 10.0 1
1 NaN 1
2 30.0 1
3 NaN -1
4 50.0 1
Then use np.where to check whether x is null and whether that index is in the list you created, and take the interpolated value only for those rows:
df['x'] = np.where(df['x'].isnull() & df.index.isin(incl), df['x'].interpolate(), df['x'])
Which prints:
x outlier_flag
0 10.0 1
1 NaN 1
2 30.0 1
3 40.0 -1
4 50.0 1
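The two steps above can also be condensed with a boolean mask instead of an index list. This is a minimal sketch on the question's sample data, not a drop-in for every case: it uses the default linear interpolation rather than the polynomial method from the question (which needs SciPy and more data points than this tiny sample has), and the names is_outlier and interpolated are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [10, np.nan, 30, 543, 50],
    "outlier_flag": [1, 1, 1, -1, 1],
})

is_outlier = df["outlier_flag"] == -1

# Blank out the flagged values on a copy, then interpolate that copy.
# This fills every NaN in the copy, including the unflagged one in row 1.
interpolated = df["x"].mask(is_outlier).interpolate(method="linear")

# Write back only the flagged rows, so unflagged NaN values stay NaN.
df.loc[is_outlier, "x"] = interpolated[is_outlier]
print(df)
```

Because the interpolation runs on a separate Series and only the flagged positions are copied back, the original NaN in the second row is left untouched.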