How to copy the status to the previous two dates in pandas

Time:07-13

In a DataFrame I have ids and statuses on different dates. For a given day and (id1, id2) pair, if the status is 1, I need to set the status of the previous two days to 1 as well. The DataFrame is provided below; I need to derive status_new from the status column.

Code to recreate the data:

import pandas as pd
df = pd.DataFrame(data=[["xxx", 7, "7-1-2022", 0], ["xxx", 4, "7-1-2022", 1], ["xxx", 7, "8-1-2022", 0],
                        ["xxx", 7, "9-1-2022", 1], ["yyy", 4, "10-1-2022", 0], ["yyy", 7, "10-1-2022", 0]],
                  columns=["id1", "id2", "date", "status"])
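To pin down the rule before looking at vectorized answers, here is a slow, row-by-row sketch. It assumes the dates are day-first (`7-1-2022` is 7 January 2022) and that "previous two days" means calendar days within the same (`id1`, `id2`) pair; `reference_status_new` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data from the question (assumed day-first: "7-1-2022" is 7 January 2022)
df = pd.DataFrame(
    data=[["xxx", 7, "7-1-2022", 0], ["xxx", 4, "7-1-2022", 1],
          ["xxx", 7, "8-1-2022", 0], ["xxx", 7, "9-1-2022", 1],
          ["yyy", 4, "10-1-2022", 0], ["yyy", 7, "10-1-2022", 0]],
    columns=["id1", "id2", "date", "status"],
)
df["date"] = pd.to_datetime(df["date"], dayfirst=True)


def reference_status_new(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical, deliberately slow statement of the rule: a row becomes 1
    if the same (id1, id2) pair has status 1 on the same day or on one of the
    next two calendar days."""
    flagged = frame[frame["status"] == 1]
    result = []
    for row in frame.itertuples(index=False):
        same_pair = flagged[(flagged["id1"] == row.id1) & (flagged["id2"] == row.id2)]
        # How many days after this row's date each flagged date falls
        days_ahead = (same_pair["date"] - row.date).dt.days
        result.append(int(days_ahead.between(0, 2).any()))
    return pd.Series(result, index=frame.index, name="status_new")


df["status_new"] = reference_status_new(df)
print(df)
```

On the sample data it yields `status_new` values of 1, 1, 1, 1, 0, 0, which makes it a handy check for faster versions.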


I tried the code below, but it doesn't work:

df['date']=pd.to_datetime(df['date']).dt.date

grouped = df.groupby(['id1', 'id2'])['date']

df['status']= grouped.shift().dt.date.where(grouped.diff().dt.days<3,df['status'])
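Two details in this attempt are likely to trip it up, sketched below under the assumption that the dates are day-first. Without `dayfirst=True`, `pd.to_datetime` reads `7-1-2022` as 1 July; and `.dt.date` converts the column to Python `date` objects with `object` dtype, so later `.dt.date` and `.dt.days` calls on the shifted or differenced values no longer have a datetime accessor to work with.

```python
import pandas as pd

dates = pd.Series(["7-1-2022", "8-1-2022", "10-1-2022"])

# Without dayfirst=True these strings are read month-first (July, August, October)
month_first = pd.to_datetime(dates)
# With dayfirst=True they become 7, 8 and 10 January 2022
day_first = pd.to_datetime(dates, dayfirst=True)
print(month_first.dt.month.tolist())  # [7, 8, 10]
print(day_first.dt.month.tolist())    # [1, 1, 1]

# .dt.date yields Python date objects, so the Series dtype becomes object
# and the .dt accessor is no longer available on it
as_dates = day_first.dt.date
print(as_dates.dtype)  # object
```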

Answer:

If the dates are consecutive, without duplicates within each (id1, id2) group, use:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df = df.sort_values(['id1','id2','date'], ascending=[True, True, False])
df['status'] = df['status'].mask(df['status'].eq(0))

df['status'] = df.groupby(['id1','id2'])['status'].ffill(limit=2).fillna(0).astype(int)
df = df.sort_index()
print (df)

   id1  id2       date  status
0  xxx    7 2022-01-07       1
1  xxx    4 2022-01-07       1
2  xxx    7 2022-01-08       1
3  xxx    7 2022-01-09       1
4  yyy    4 2022-01-10       0
5  yyy    7 2022-01-10       0
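The approach above overwrites `status`. A minimal sketch that keeps the original column and writes the result to `status_new`, as the question asks, can wrap the same sort/mask/ffill steps in a function. `spread_status_rows` is a hypothetical name, and like the answer it assumes one row per date per group with no gaps, because `limit=2` counts rows, not days.

```python
import pandas as pd


def spread_status_rows(frame: pd.DataFrame, days: int = 2) -> pd.Series:
    """Hypothetical helper wrapping the sort/mask/ffill approach: within each
    (id1, id2) group, copy a status of 1 onto the `days` preceding rows.
    Assumes consecutive dates with one row per date per group."""
    # Newest dates first, so a forward fill runs backwards in time
    ordered = frame.sort_values(["id1", "id2", "date"], ascending=[True, True, False])
    # Zeros become NaN so that only the 1s are carried forward
    ones_only = ordered["status"].mask(ordered["status"].eq(0))
    filled = ones_only.groupby([ordered["id1"], ordered["id2"]]).ffill(limit=days)
    # Remaining gaps become 0 again; reindex restores the original row order
    return filled.fillna(0).astype(int).reindex(frame.index)


df = pd.DataFrame(
    data=[["xxx", 7, "7-1-2022", 0], ["xxx", 4, "7-1-2022", 1],
          ["xxx", 7, "8-1-2022", 0], ["xxx", 7, "9-1-2022", 1],
          ["yyy", 4, "10-1-2022", 0], ["yyy", 7, "10-1-2022", 0]],
    columns=["id1", "id2", "date", "status"],
)
df["date"] = pd.to_datetime(df["date"], dayfirst=True)
df["status_new"] = spread_status_rows(df)
print(df)
```

Because `limit` counts rows, a missing date inside a group would let a 1 reach further back than two calendar days; the resample-based approach in the answer is meant for that case.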
    

If some dates can be missing (non-consecutive), one solution is to resample each group to daily frequency, taking the maximum status per day:

print (df)
   id1  id2       date  status
0  xxx    7   6-1-2022       0 <- (xxx, 7) has no row for 7-1-2022
1  xxx    4   7-1-2022       1
2  xxx    7   8-1-2022       0
3  xxx    7   9-1-2022       1
4  yyy    4  10-1-2022       0
5  yyy    7  10-1-2022       0

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df1 = df.set_index('date').groupby(['id1','id2'])['status'].resample('D').max().reset_index()
df1['status'] = df1['status'].mask(df1['status'].eq(0))
df1['new status'] = df1.groupby(['id1','id2'])['status'].bfill(limit=2).fillna(0).astype(int)

print (df1)
   id1  id2       date  status  new status
0  xxx    4 2022-01-07     1.0           1
1  xxx    7 2022-01-06     NaN           0
2  xxx    7 2022-01-07     NaN           1
3  xxx    7 2022-01-08     NaN           1
4  xxx    7 2022-01-09     1.0           1
5  yyy    4 2022-01-10     NaN           0
6  yyy    7 2022-01-10     NaN           0

df = df.merge(df1.drop('status',axis=1), how='left', on=['id1','id2','date'])
print (df)
   id1  id2       date  status  new status
0  xxx    7 2022-01-06       0           0
1  xxx    4 2022-01-07       1           1
2  xxx    7 2022-01-08       0           1
3  xxx    7 2022-01-09       1           1
4  yyy    4 2022-01-10       0           0
5  yyy    7 2022-01-10       0           0
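A different technique from the resampling above, offered only as an alternative sketch: instead of building a daily grid, shift every date that has status 1 back by 0, 1 and 2 days, then left-merge those (`id1`, `id2`, `date`) keys onto the original rows. `spread_status_dates` is a hypothetical helper, and it assumes `date` is already a datetime column.

```python
import pandas as pd


def spread_status_dates(frame: pd.DataFrame, days: int = 2) -> pd.Series:
    """Hypothetical helper: mark a row 1 when its (id1, id2) pair has status 1
    on the same calendar date or on one of the next `days` dates.
    Handles gaps in the dates without building a full daily grid."""
    flagged = frame.loc[frame["status"].eq(1), ["id1", "id2", "date"]]
    # One copy of the flagged keys per offset 0..days, shifted back in time
    shifted = [flagged.assign(date=flagged["date"] - pd.Timedelta(days=k))
               for k in range(days + 1)]
    targets = pd.concat(shifted).drop_duplicates().assign(status_new=1)
    # Left merge keeps every original row, in order; unmatched rows get NaN
    merged = frame.merge(targets, how="left", on=["id1", "id2", "date"])
    return merged["status_new"].fillna(0).astype(int).set_axis(frame.index)


# Sample data with a gap: (xxx, 7) has no row for 7-1-2022
df = pd.DataFrame(
    data=[["xxx", 7, "6-1-2022", 0], ["xxx", 4, "7-1-2022", 1],
          ["xxx", 7, "8-1-2022", 0], ["xxx", 7, "9-1-2022", 1],
          ["yyy", 4, "10-1-2022", 0], ["yyy", 7, "10-1-2022", 0]],
    columns=["id1", "id2", "date", "status"],
)
df["date"] = pd.to_datetime(df["date"], dayfirst=True)
df["status_new"] = spread_status_dates(df)
print(df)
```

On this gapped sample it gives 0, 1, 1, 1, 0, 0, matching the `new status` column above. Since only the flagged dates are expanded, this can be cheaper than resampling when the date ranges are long and the 1s are sparse.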