Home > Net >  Forward fill ones for x days in python binary DataFrame
Forward fill ones for x days in python binary DataFrame

Time:12-05

I have a binary DataFrame df1 with ones and zeros and I would like to forward fill ones for 2 additional days for each column as in df2:

df1
    Date        ID1 ID2 ID3
0   2021-01-01  0   1   0
1   2021-01-02  0   0   0
2   2021-01-03  1   0   1
3   2021-01-04  0   0   0
4   2021-01-05  0   0   1
5   2021-01-06  0   0   0
6   2021-01-07  0   0   0
7   2021-01-08  0   0   0

df2
    Date        ID1 ID2 ID3
0   2021-01-01  0   1   0
1   2021-01-02  0   1   0
2   2021-01-03  1   1   1
3   2021-01-04  1   0   1
4   2021-01-05  1   0   1
5   2021-01-06  0   0   1
6   2021-01-07  0   0   1
7   2021-01-08  0   0   0

For reproducibility:

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08'],
    'ID1':[0,0,1,0,0,0,0,0], 
    'ID2':[1,0,0,0,0,0,0,0],
    'ID3':[0,0,1,0,1,0,0,0]})

Is there an efficient and easy way to deal with that? Thanks a lot.

CodePudding user response:

As you have no missing days, use shift:

out = df1.set_index('Date').astype(bool)
out = (out | out.shift(1, fill_value=False) | out.shift(2, fill_value=False)) \
          .astype(int).reset_index()
print(out)

# Output:
        Date  ID1  ID2  ID3
0 2021-01-01    0    1    0
1 2021-01-02    0    1    0
2 2021-01-03    1    1    1
3 2021-01-04    1    0    1
4 2021-01-05    1    0    1
5 2021-01-06    0    0    1
6 2021-01-07    0    0    1
7 2021-01-08    0    0    0

Update

In case I also want to assign a one before, how would I need to adjust?

>>> df1.set_index('Date').rolling(4, min_periods=1, center=True).max() \
       .astype(int).reset_index()

        Date  ID1  ID2  ID3
0 2021-01-01    0    1    0
1 2021-01-02    1    1    1
2 2021-01-03    1    1    1
3 2021-01-04    1    0    1
4 2021-01-05    1    0    1
5 2021-01-06    0    0    1
6 2021-01-07    0    0    1
7 2021-01-08    0    0    0

CodePudding user response:

Just apply a rolling with max:

df1.set_index('Date').rolling(3, min_periods=1).max().convert_dtypes().reset_index()

If there can be missing dates:

(df1.assign(Date=pd.to_datetime(df1['Date']))
    .set_index('Date')
    .rolling('3d')
    .max()
    .convert_dtypes()
    .reset_index()
)

output:

        Date  ID1  ID2  ID3
0 2021-01-01    0    1    0
1 2021-01-02    0    1    0
2 2021-01-03    1    1    1
3 2021-01-04    1    0    1
4 2021-01-05    1    0    1
5 2021-01-06    0    0    1
6 2021-01-07    0    0    1
7 2021-01-08    0    0    0
  • Related