In a very large dataset, I want to fill in the NaN values that fall between two specific values. The dataset looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'col2': [100, np.nan, 100, np.nan, np.nan, 999, 999, np.nan, 100, np.nan, np.nan, 100]})
I only want to fill the NaN values that lie between a 999 and a 100 (in either order), using 500 as the fill value. NaN values between two 100s should stay as they are. The expected output is:
df['col2'] = [100,np.nan,100,500,500,999,999,500,100,np.nan,np.nan,100]
CodePudding user response:
You can forward-fill and back-fill the column, then test whether the nearest non-missing values on either side of each NaN are 999 and 100 (in either order). Use:
s1 = df['col2'].ffill()
s2 = df['col2'].bfill()
df.loc[(s1.eq(999) & s2.eq(100)) | (s1.eq(100) & s2.eq(999)), 'col2'] = 500
print(df)
    col1   col2
0      1  100.0
1      2    NaN
2      3  100.0
3      4  500.0
4      5  500.0
5      6  999.0
6      7  999.0
7      8  500.0
8      9  100.0
9     10    NaN
10    11    NaN
11    12  100.0
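If you need this for other boundary values, the same ffill/bfill mask can be wrapped in a small helper. This is only a sketch: the function name fill_between and its parameters are made up for illustration, and it assumes the boundary values never appear as ordinary data elsewhere in the column.

```python
import numpy as np
import pandas as pd


def fill_between(s, a, b, fill_value):
    """Fill NaNs whose nearest non-missing neighbours are a and b (either order).

    Hypothetical helper, not part of pandas.
    """
    prev = s.ffill()  # nearest non-missing value above each row
    nxt = s.bfill()   # nearest non-missing value below each row
    between = (prev.eq(a) & nxt.eq(b)) | (prev.eq(b) & nxt.eq(a))
    # Restrict to missing rows so existing values are never overwritten
    # (this only matters when a == b).
    return s.mask(s.isna() & between, fill_value)


df = pd.DataFrame({
    'col1': range(1, 13),
    'col2': [100, np.nan, 100, np.nan, np.nan, 999, 999, np.nan, 100, np.nan, np.nan, 100],
})
result = fill_between(df['col2'], 999, 100, 500)
df['col2'] = result
```

Both ffill and bfill are vectorized, so this should scale reasonably to a large column. Leading or trailing NaNs have no neighbour on one side, so the comparison is False there and they are left alone.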