Get values for a column for a historic day in a pandas DataFrame

Time:03-04

I have a pandas DataFrame with a date index. The df is grouped by date and looks like this:

date            ID        time_spent      
22-04-2021      ABC           60
22-04-2021      XYZ           50
22-04-2021      PQR           55
23-04-2021      XYZ           40
23-04-2021      ABC           70
23-04-2021      XYZ           95
24-04-2021      ABC           15
24-04-2021      DEF           45
24-04-2021      PQR           65

What I need is a new column that tracks the time spent by each user a given number of days earlier (e.g. 2 days; this number needs to be flexible). Desired result:

date            ID        time_spent    New_col     
22-04-2021      ABC           60          NaN
22-04-2021      XYZ           50          NaN
22-04-2021      PQR           55          NaN
23-04-2021      XYZ           40          NaN
23-04-2021      ABC           70          NaN
23-04-2021      XYZ           95          NaN
24-04-2021      ABC           15          60
24-04-2021      DEF           45          NaN
24-04-2021      PQR           65          55

Could someone please help me? Please bear with me if this is an elementary question, I am new to Pandas.

Answer:

Convert the dates to datetimes, create a MultiIndex from date and ID with DataFrame.set_index, add N days to the date level of the time_spent Series, and join that Series back to the original with DataFrame.join:

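import pandas as pd

N = 2  # number of days to look back

# the dates are day-first strings such as 22-04-2021
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

df = df.set_index(['date','ID'])
# shift the date level forward by N days so each old value lines up with the later row
df = (df.join(df['time_spent']
        .rename(lambda x: x + pd.Timedelta(N, 'days'), level=0)
        .rename('new'))
        .reset_index()
        )

print(df)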
        date   ID  time_spent   new
0 2021-04-22  ABC          60   NaN
1 2021-04-22  PQR          55   NaN
2 2021-04-22  XYZ          50   NaN
3 2021-04-23  ABC          70   NaN
4 2021-04-23  XYZ          40   NaN
5 2021-04-23  XYZ          95   NaN
6 2021-04-24  ABC          15  60.0
7 2021-04-24  DEF          45   NaN
8 2021-04-24  PQR          65  55.0
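
Since the number of days has to be flexible, the join approach above can be wrapped in a small function. The following is only a sketch: the function name add_lookback and its parameters are hypothetical, and the frame is rebuilt from the sample in the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["22-04-2021"] * 3 + ["23-04-2021"] * 3 + ["24-04-2021"] * 3,
    "ID": ["ABC", "XYZ", "PQR", "XYZ", "ABC", "XYZ", "ABC", "DEF", "PQR"],
    "time_spent": [60, 50, 55, 40, 70, 95, 15, 45, 65],
})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")


def add_lookback(frame, n_days, new_col="new"):
    """Hypothetical helper: attach each ID's time_spent from n_days earlier."""
    indexed = frame.set_index(["date", "ID"])
    # Move every date forward by n_days so an old value shares its
    # (date, ID) key with the later row that should receive it.
    shifted = (indexed["time_spent"]
               .rename(lambda d: d + pd.Timedelta(days=n_days), level=0)
               .rename(new_col))
    # DataFrame.join is a left join by default, so rows without a
    # match n_days earlier get NaN.
    return indexed.join(shifted).reset_index()


result = add_lookback(df, n_days=2)
print(result)
```

Calling add_lookback(df, n_days=3) instead would look three days back; with this sample no row has a match three days earlier, so the whole new column would be NaN.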

Or:

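N = 2  # number of days to look back

df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
# merge the frame with a copy of itself whose dates are moved forward by N days
df1 = (df.merge(df.assign(date = lambda x: x['date'] + pd.Timedelta(N, 'days')),
                on=['date','ID'],
                how='left', suffixes=('','_'))
         .rename(columns={'time_spent_':'new'}))
print(df1)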
        date   ID  time_spent   new
0 2021-04-22  ABC          60   NaN
1 2021-04-22  XYZ          50   NaN
2 2021-04-22  PQR          55   NaN
3 2021-04-23  XYZ          40   NaN
4 2021-04-23  ABC          70   NaN
5 2021-04-23  XYZ          95   NaN
6 2021-04-24  ABC          15  60.0
7 2021-04-24  DEF          45   NaN
8 2021-04-24  PQR          65  55.0
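
One thing to watch with either approach: the sample contains two XYZ rows on 23-04-2021. If an ID has several rows on the look-back date, a left merge returns one row per match, so the result grows. The sketch below shows one possible way around that, under the assumption (not stated in the question) that summing the earlier rows is acceptable. The extra XYZ row on 24-04-2021 is hypothetical and was added only to trigger the case.

```python
import pandas as pd

# Sample data from the question plus ONE hypothetical row
# (24-04-2021, XYZ, 30) so that a one-day look-back hits the
# two XYZ rows recorded on 23-04-2021.
df = pd.DataFrame({
    "date": ["22-04-2021"] * 3 + ["23-04-2021"] * 3 + ["24-04-2021"] * 4,
    "ID": ["ABC", "XYZ", "PQR", "XYZ", "ABC", "XYZ",
           "ABC", "DEF", "PQR", "XYZ"],
    "time_spent": [60, 50, 55, 40, 70, 95, 15, 45, 65, 30],
})
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

N = 1  # look-back in days

# Copy of the frame with every date moved forward by N days.
past = df.assign(date=df["date"] + pd.Timedelta(days=N))

# Plain left merge: the 24-04-2021 XYZ row matches two earlier rows
# and therefore appears twice in the output.
naive = df.merge(past, on=["date", "ID"], how="left", suffixes=("", "_"))

# Collapse the shifted side to one row per (date, ID) first.
# Summing is an assumption; mean, max or last may fit better.
past_sum = (past.groupby(["date", "ID"])["time_spent"]
                .sum()
                .rename("new")
                .reset_index())
safe = df.merge(past_sum, on=["date", "ID"], how="left")

print(len(df), len(naive), len(safe))
```

Aggregating before the merge keeps the output at one row per input row, at the cost of having to decide how duplicate earlier rows should be combined.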