My dataframe looks like this:
Name Date Condition Status
A 17-12-2021
A 18-12-2022
A 19-12-2023
A 20-12-2023 0 Attack
A 21-12-2023
A 22-12-2024
B 17-12-2021
B 18-12-2022
B 19-12-2023
B 20-12-2023 2 Sprain
B 21-12-2023
B 22-12-2024
C 18-12-2022
C 19-12-2023
C 20-12-2023 1 Nausea
C 21-12-2023
C 22-12-2024
Modeling assumption is the status starts before actual showing the effect so what to backfill based on Condition using for map: dict_map = {0:2, 1:1, 2:2, 3:2, 4:2}... i.e if condition is 0, need to backfill by 2, where as if condition is 1 need to backfill by 1.
Ideal output is below. But, when I do bfill it fills all the way through. Anyway to condition bfill()?:
Name Date Condition Status
A 17-12-2021
A 18-12-2022 Attack
A 19-12-2023 Attack
A 20-12-2023 0 Attack
A 21-12-2023
A 22-12-2024
B 17-12-2021
B 18-12-2022 Sprain
B 19-12-2023 Sprain
B 20-12-2023 2 Sprain
B 21-12-2023
B 22-12-2024
C 18-12-2022
C 19-12-2023 Nausea
C 20-12-2023 1 Nausea
C 21-12-2023
C 22-12-2024
CodePudding user response:
You can compute a mask, using a reversed groupby.cumcount
and comparing to the bfill
"Condition" after mapping the values from the dictionary with map
. Use this mask with where
to keep only the backfilled Status within the desired limit:
dict_map = {0:2, 1:1, 2:2, 3:2, 4:2}
mask = (df[::-1].groupby(df.loc[::-1, 'Condition'].notna().cumsum())
.cumcount()
.le(df['Condition'].map(dict_map).bfill())
)
df['Status'] = df['Status'].bfill().where(mask)
Output:
Name Date Condition Status
0 A 17-12-2021 NaN NaN
1 A 18-12-2022 NaN Attack
2 A 19-12-2023 NaN Attack
3 A 20-12-2023 0.0 Attack
4 A 21-12-2023 NaN NaN
5 A 22-12-2024 NaN NaN
6 B 17-12-2021 NaN NaN
7 B 18-12-2022 NaN Sprain
8 B 19-12-2023 NaN Sprain
9 B 20-12-2023 2.0 Sprain
10 B 21-12-2023 NaN NaN
11 B 22-12-2024 NaN NaN
12 C 18-12-2022 NaN NaN
13 C 19-12-2023 NaN Nausea
14 C 20-12-2023 1.0 Nausea
15 C 21-12-2023 NaN NaN
16 C 22-12-2024 NaN NaN
Intermediates:
Name Date Condition Status cumsum/group cumcount condition_bfill mask status_bfill
0 A 17-12-2021 NaN NaN 3 3 2.0 False NaN
1 A 18-12-2022 NaN None 3 2 2.0 True Attack
2 A 19-12-2023 NaN None 3 1 2.0 True Attack
3 A 20-12-2023 0.0 Attack 3 0 2.0 True Attack
4 A 21-12-2023 NaN None 2 5 2.0 False NaN
5 A 22-12-2024 NaN None 2 4 2.0 False NaN
6 B 17-12-2021 NaN None 2 3 2.0 False NaN
7 B 18-12-2022 NaN None 2 2 2.0 True Sprain
8 B 19-12-2023 NaN None 2 1 2.0 True Sprain
9 B 20-12-2023 2.0 Sprain 2 0 2.0 True Sprain
10 B 21-12-2023 NaN None 1 4 1.0 False NaN
11 B 22-12-2024 NaN None 1 3 1.0 False NaN
12 C 18-12-2022 NaN None 1 2 1.0 False NaN
13 C 19-12-2023 NaN None 1 1 1.0 True Nausea
14 C 20-12-2023 1.0 Nausea 1 0 1.0 True Nausea
15 C 21-12-2023 NaN None 0 1 NaN False NaN
16 C 22-12-2024 NaN None 0 0 NaN False NaN
CodePudding user response:
Create helper column new
by Series.map
with DataFrame.iloc
for swap order, then create mask by helper groups by compare non missing values with cumulative sum and compare counter by GroupBy.cumcount
with forward filling missing values (because swapped order). Last back filling missing values and set NaN
s in Series.where
:
df1 = df.assign(new=df['Condition'].map(dict_map)).iloc[::-1]
m = df1.groupby(df1['new'].notna().cumsum()).cumcount().le(df1['new'].ffill()).iloc[::-1]
df['Status'] = df['Status'].bfill().where(m)
print (df)
Name Date Condition Status
0 A 17-12-2021 NaN NaN
1 A 18-12-2022 NaN Attack
2 A 19-12-2023 NaN Attack
3 A 20-12-2023 0.0 Attack
4 A 21-12-2023 NaN NaN
5 A 22-12-2024 NaN NaN
6 B 17-12-2021 NaN NaN
7 B 18-12-2022 NaN Sprain
8 B 19-12-2023 NaN Sprain
9 B 20-12-2023 2.0 Sprain
10 B 21-12-2023 NaN NaN
11 B 22-12-2024 NaN NaN
12 C 18-12-2022 NaN NaN
13 C 19-12-2023 NaN Nausea
14 C 20-12-2023 1.0 Nausea
15 C 21-12-2023 NaN NaN
16 C 22-12-2024 NaN NaN