Home > Software design >  Fill only last among of consecutive NaN in Pandas by mean of previous and next valid values
Fill only last among of consecutive NaN in Pandas by mean of previous and next valid values

Time:06-02

Fill only last among of consecutive NaN in Pandas by mean of previous and next valid values. If one NaN, then fill with mean of next and previous. If two consecutive NaN, impute second one with mean of next and previous valid values.

Series:

enter image description here

expected output:

enter image description here

CodePudding user response:

Idea is remove consecutive missing values without last, then use interpolate and assign back last missing value by condition:

m =  df['header'].isna()
mask = m & ~m.shift(-1, fill_value=False)

df.loc[mask, 'header'] = df.loc[mask | ~m, 'header'].interpolate()
print (df)
    header
0     10.0
1     20.0
2     20.0
3     20.0
4     30.0
5      NaN
6     35.0
7     40.0
8     10.0
9      NaN
10     NaN
11    30.0
12    50.0

Details:

print (df.assign(m=m, mask=mask))
    header      m   mask
0     10.0  False  False
1     20.0  False  False
2     20.0   True   True
3     20.0  False  False
4     30.0  False  False
5      NaN   True  False
6     35.0   True   True
7     40.0  False  False
8     10.0  False  False
9      NaN   True  False
10     NaN   True  False
11    30.0   True   True
12    50.0  False  False


print (df.loc[mask | ~m, 'header'])
0     10.0
1     20.0
2      NaN
3     20.0
4     30.0
6      NaN
7     40.0
8     10.0
11     NaN
12    50.0
Name: header, dtype: float64

Solution for interpolate per groups is:

df.loc[mask, 'header'] = df.loc[mask | ~m, 'header'].groupby(df['groups'])
                                                    .transform(lambda x: x.interpolate())

CodePudding user response:

You can try:

s = df['header']
m = s.isna()
df['header'] = s.ffill().add(s.bfill()).div(2).mask(m&m.shift(-1, fill_value=False))

output and intermediates:

    header  output  ffill  bfill      m  m&m.shift(-1)
0     10.0    10.0   10.0   10.0  False          False
1     20.0    20.0   20.0   20.0  False          False
2      NaN    20.0   20.0   20.0   True          False
3     20.0    20.0   20.0   20.0  False          False
4     30.0    30.0   30.0   30.0  False          False
5      NaN     NaN   30.0   40.0   True           True
6      NaN    35.0   30.0   40.0   True          False
7     40.0    40.0   40.0   40.0  False          False
8     10.0    10.0   10.0   10.0  False          False
9      NaN     NaN   10.0   50.0   True           True
10     NaN     NaN   10.0   50.0   True           True
11     NaN    30.0   10.0   50.0   True          False
12    50.0    50.0   50.0   50.0  False          False
  • Related