I have a pandas DataFrame with a date index. The df is grouped by date and looks like this:
date ID time_spent
22-04-2021 ABC 60
22-04-2021 XYZ 50
22-04-2021 PQR 55
23-04-2021 XYZ 40
23-04-2021 ABC 70
23-04-2021 XYZ 95
24-04-2021 ABC 15
24-04-2021 DEF 45
24-04-2021 PQR 65
What I need is to create a new column tracking the time spent by each user some number of days ago (e.g. 2, as in the result below; this part needs to be flexible, please). Desired result:
date ID time_spent New_col
22-04-2021 ABC 60 NaN
22-04-2021 XYZ 50 NaN
22-04-2021 PQR 55 NaN
23-04-2021 XYZ 40 NaN
23-04-2021 ABC 70 NaN
23-04-2021 XYZ 95 NaN
24-04-2021 ABC 15 60
24-04-2021 DEF 45 NaN
24-04-2021 PQR 65 55
Could someone please help me? Please bear with me if this is an elementary question, I am new to Pandas.
Answer:
Create a MultiIndex by appending ID to the DatetimeIndex with DataFrame.set_index, add N days to the dates, and join the new Series back with DataFrame.join:
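import pandas as pd

N = 2
# dayfirst=True because the dates are written as DD-MM-YYYY
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df = df.set_index(['date','ID'])
# shift the date level forward by N days, then join the shifted values back
df = (df.join(df['time_spent']
                .rename(lambda x: x + pd.Timedelta(N, 'days'), level=0)
                .rename('new'))
        .reset_index()
     )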
print (df)
date ID time_spent new
0 2021-04-22 ABC 60 NaN
1 2021-04-22 PQR 55 NaN
2 2021-04-22 XYZ 50 NaN
3 2021-04-23 ABC 70 NaN
4 2021-04-23 XYZ 40 NaN
5 2021-04-23 XYZ 95 NaN
6 2021-04-24 ABC 15 60.0
7 2021-04-24 DEF 45 NaN
8 2021-04-24 PQR 65 55.0
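To see what the level-wise rename does on its own, here is a minimal sketch on a two-row Series. The data and the names idx, s and shifted are made up for illustration; only the rename call mirrors the code above.

```python
import pandas as pd

# Hypothetical two-row Series indexed by (date, ID)
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2021-04-22"), "ABC"), (pd.Timestamp("2021-04-22"), "PQR")],
    names=["date", "ID"],
)
s = pd.Series([60, 55], index=idx, name="time_spent")

N = 2
# The first rename applies the function to the date level only (level=0),
# so the ID level is left untouched; the second rename sets the Series name.
shifted = s.rename(lambda d: d + pd.Timedelta(N, "days"), level=0).rename("new")
print(shifted)
```

The values stay the same but are now labelled with a date N days later, so a join on (date, ID) pairs each row with the value recorded N days before it.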
Or:
N = 2
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
# shift a copy of the frame N days forward and left-merge it on date and ID
df1 = (df.merge(df.assign(date = lambda x: x['date'] + pd.Timedelta(N, 'days')),
                on=['date','ID'],
                how='left', suffixes=('','_'))
         .rename(columns={'time_spent_':'new'}))
print (df1)
date ID time_spent new
0 2021-04-22 ABC 60 NaN
1 2021-04-22 XYZ 50 NaN
2 2021-04-22 PQR 55 NaN
3 2021-04-23 XYZ 40 NaN
4 2021-04-23 ABC 70 NaN
5 2021-04-23 XYZ 95 NaN
6 2021-04-24 ABC 15 60.0
7 2021-04-24 DEF 45 NaN
8 2021-04-24 PQR 65 55.0
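Since the number of days needs to be flexible, the merge approach could be wrapped in a small helper. This is only a sketch: the function name add_lagged_time and its parameters n_days, value_col and new_col are hypothetical, and it assumes date is a regular column holding DD-MM-YYYY strings (or datetimes).

```python
import pandas as pd

def add_lagged_time(df, n_days, value_col="time_spent", new_col="new"):
    """Return a copy of df with value_col from n_days earlier, matched per ID."""
    out = df.copy()
    # Assumption: string dates are day-first (DD-MM-YYYY), as in the question
    out["date"] = pd.to_datetime(out["date"], dayfirst=True)
    # Move each row's date forward so it lines up with the row n_days later
    lagged = (out[["date", "ID", value_col]]
              .assign(date=lambda x: x["date"] + pd.Timedelta(n_days, "days"))
              .rename(columns={value_col: new_col}))
    return out.merge(lagged, on=["date", "ID"], how="left")

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["22-04-2021"] * 3 + ["23-04-2021"] * 3 + ["24-04-2021"] * 3,
    "ID": ["ABC", "XYZ", "PQR", "XYZ", "ABC", "XYZ", "ABC", "DEF", "PQR"],
    "time_spent": [60, 50, 55, 40, 70, 95, 15, 45, 65],
})
result = add_lagged_time(df, 2)
print(result)
```

One thing to keep in mind: if an ID has several rows on the earlier date (like XYZ on 23-04-2021), the left merge returns one output row per match, so the result can have more rows than the input.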