I have a column of chemistries, and at certain points the chemistry changes. I want to ffill up to each change point and bfill after it.
I have several change points, like this:
date0 NaN
date1 chem1
date2 NaN
date3 NaN
date4 change
date5 NaN
date6 chem2
date7 chem2
date8 NaN
date9 change
date10 NaN
date11 chem3
date12 NaN
I would like the result to be:
date0 chem1
date1 chem1
date2 chem1
date3 chem1
date4 change
date5 chem2
date6 chem2
date7 chem2
date8 chem2
date9 change
date10 chem3
date11 chem3
date12 chem3
Any suggestions on the best approach here?
CodePudding user response:
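Assuming 'col' is your second column, use a custom group that starts a new group at each 'change' row (group_keys=False keeps the original index on pandas 2.0 and later):
df['col'] = df.groupby(df['col'].eq('change').cumsum(), group_keys=False)['col'].apply(lambda s: s.bfill().ffill())
Output: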
date col
0 date0 chem1
1 date1 chem1
2 date2 chem1
3 date3 chem1
4 date4 change
5 date5 chem2
6 date6 chem2
7 date7 chem2
8 date8 chem2
9 date9 change
10 date10 chem3
11 date11 chem3
12 date12 chem3
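To make the grouping above easier to inspect, here is a minimal self-contained sketch. It rebuilds the sample data by hand (the column names date and col, and the variable names group_id and filled, are assumptions) and uses transform in place of apply, a swapped-in variant whose result is aligned to the original index.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; column names are assumptions.
df = pd.DataFrame({
    "date": [f"date{i}" for i in range(13)],
    "col": [np.nan, "chem1", np.nan, np.nan, "change", np.nan, "chem2",
            "chem2", np.nan, "change", np.nan, "chem3", np.nan],
})

# Every 'change' row opens a new group: the cumulative sum of the boolean
# mask goes up by one at each change point.
group_id = df["col"].eq("change").cumsum()

# Inside each group, back-fill first, then forward-fill whatever is still
# missing. transform returns a Series aligned to df's index.
filled = df.groupby(group_id)["col"].transform(lambda s: s.bfill().ffill())

print(df.assign(group=group_id, col=filled))
```

Because each 'change' row is the first row of its own group and is not missing, neither fill overwrites it; the fills only spread the chemistry name within the segment between two change points.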
CodePudding user response:
df = pd.read_clipboard(header = None, names = ['a', 'b'])
Run an ffill on the second column (I relabelled the columns as a and b):
step1 = df.b.ffill()
Mask the cells in step1 that were originally missing but picked up 'change' from the ffill, then do a bfill:
step1 = step1.mask(step1.eq('change') & df.b.isna()).bfill()
df.assign(b = step1)
a b
0 date0 chem1
1 date1 chem1
2 date2 chem1
3 date3 chem1
4 date4 change
5 date5 chem2
6 date6 chem2
7 date7 chem2
8 date8 chem2
9 date9 change
10 date10 chem3
11 date11 chem3
12 date12 chem3
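Since read_clipboard depends on whatever is currently copied, the same steps can be sketched reproducibly as follows. The hand-built Series b and the names overfilled and result are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the clipboard data (second column only).
b = pd.Series([np.nan, "chem1", np.nan, np.nan, "change", np.nan, "chem2",
               "chem2", np.nan, "change", np.nan, "chem3", np.nan])

# Step 1: forward-fill. Gaps right after a change point now read 'change'.
step1 = b.ffill()

# Step 2: find cells that were originally missing and inherited 'change',
# blank them out again, then back-fill from the next chemistry.
overfilled = step1.eq("change") & b.isna()
result = step1.mask(overfilled).bfill()

print(result)
```

The final bfill also takes care of the leading gap at date0, which the initial ffill cannot reach because nothing precedes it.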