Home > Software design >  Pandas condition backfill in pandas
Pandas condition backfill in pandas

Time:12-09

My dataframe looks like this:

Name    Date    Condition   Status
A   17-12-2021      
A   18-12-2022      
A   19-12-2023      
A   20-12-2023  0   Attack
A   21-12-2023      
A   22-12-2024      
B   17-12-2021      
B   18-12-2022      
B   19-12-2023      
B   20-12-2023  2   Sprain
B   21-12-2023      
B   22-12-2024      
C   18-12-2022      
C   19-12-2023      
C   20-12-2023  1   Nausea
C   21-12-2023      
C   22-12-2024  

Modeling assumption is the status starts before actual showing the effect so what to backfill based on Condition using for map: dict_map = {0:2, 1:1, 2:2, 3:2, 4:2}... i.e if condition is 0, need to backfill by 2, where as if condition is 1 need to backfill by 1.

Ideal output is below. But, when I do bfill it fills all the way through. Anyway to condition bfill()?:

Name    Date    Condition   Status
A   17-12-2021      
A   18-12-2022      Attack
A   19-12-2023      Attack
A   20-12-2023  0   Attack
A   21-12-2023      
A   22-12-2024      
B   17-12-2021      
B   18-12-2022      Sprain
B   19-12-2023      Sprain
B   20-12-2023  2   Sprain
B   21-12-2023      
B   22-12-2024      
C   18-12-2022      
C   19-12-2023      Nausea
C   20-12-2023  1   Nausea
C   21-12-2023      
C   22-12-2024      

CodePudding user response:

You can compute a mask, using a reversed groupby.cumcount and comparing to the bfill "Condition" after mapping the values from the dictionary with map. Use this mask with where to keep only the backfilled Status within the desired limit:

dict_map = {0:2, 1:1, 2:2, 3:2, 4:2}

mask = (df[::-1].groupby(df.loc[::-1, 'Condition'].notna().cumsum())
        .cumcount()
        .le(df['Condition'].map(dict_map).bfill())
       )

df['Status'] = df['Status'].bfill().where(mask)

Output:

   Name        Date  Condition  Status
0     A  17-12-2021        NaN     NaN
1     A  18-12-2022        NaN  Attack
2     A  19-12-2023        NaN  Attack
3     A  20-12-2023        0.0  Attack
4     A  21-12-2023        NaN     NaN
5     A  22-12-2024        NaN     NaN
6     B  17-12-2021        NaN     NaN
7     B  18-12-2022        NaN  Sprain
8     B  19-12-2023        NaN  Sprain
9     B  20-12-2023        2.0  Sprain
10    B  21-12-2023        NaN     NaN
11    B  22-12-2024        NaN     NaN
12    C  18-12-2022        NaN     NaN
13    C  19-12-2023        NaN  Nausea
14    C  20-12-2023        1.0  Nausea
15    C  21-12-2023        NaN     NaN
16    C  22-12-2024        NaN     NaN

Intermediates:

   Name        Date  Condition  Status  cumsum/group  cumcount  condition_bfill   mask status_bfill
0     A  17-12-2021        NaN     NaN             3         3              2.0  False          NaN
1     A  18-12-2022        NaN    None             3         2              2.0   True       Attack
2     A  19-12-2023        NaN    None             3         1              2.0   True       Attack
3     A  20-12-2023        0.0  Attack             3         0              2.0   True       Attack
4     A  21-12-2023        NaN    None             2         5              2.0  False          NaN
5     A  22-12-2024        NaN    None             2         4              2.0  False          NaN
6     B  17-12-2021        NaN    None             2         3              2.0  False          NaN
7     B  18-12-2022        NaN    None             2         2              2.0   True       Sprain
8     B  19-12-2023        NaN    None             2         1              2.0   True       Sprain
9     B  20-12-2023        2.0  Sprain             2         0              2.0   True       Sprain
10    B  21-12-2023        NaN    None             1         4              1.0  False          NaN
11    B  22-12-2024        NaN    None             1         3              1.0  False          NaN
12    C  18-12-2022        NaN    None             1         2              1.0  False          NaN
13    C  19-12-2023        NaN    None             1         1              1.0   True       Nausea
14    C  20-12-2023        1.0  Nausea             1         0              1.0   True       Nausea
15    C  21-12-2023        NaN    None             0         1              NaN  False          NaN
16    C  22-12-2024        NaN    None             0         0              NaN  False          NaN

CodePudding user response:

Create helper column new by Series.map with DataFrame.iloc for swap order, then create mask by helper groups by compare non missing values with cumulative sum and compare counter by GroupBy.cumcount with forward filling missing values (because swapped order). Last back filling missing values and set NaNs in Series.where:

df1 = df.assign(new=df['Condition'].map(dict_map)).iloc[::-1]

m = df1.groupby(df1['new'].notna().cumsum()).cumcount().le(df1['new'].ffill()).iloc[::-1]
    
df['Status'] = df['Status'].bfill().where(m)
print (df)
   Name        Date  Condition  Status
0     A  17-12-2021        NaN     NaN
1     A  18-12-2022        NaN  Attack
2     A  19-12-2023        NaN  Attack
3     A  20-12-2023        0.0  Attack
4     A  21-12-2023        NaN     NaN
5     A  22-12-2024        NaN     NaN
6     B  17-12-2021        NaN     NaN
7     B  18-12-2022        NaN  Sprain
8     B  19-12-2023        NaN  Sprain
9     B  20-12-2023        2.0  Sprain
10    B  21-12-2023        NaN     NaN
11    B  22-12-2024        NaN     NaN
12    C  18-12-2022        NaN     NaN
13    C  19-12-2023        NaN  Nausea
14    C  20-12-2023        1.0  Nausea
15    C  21-12-2023        NaN     NaN
16    C  22-12-2024        NaN     NaN
  • Related