I'm trying to transform a dataframe without but haven't achieved my desired output, would appreciate some help:
Input data:
date name value
2022-07-01 Anna 5
2022-07-01 Jim 3
2022-04-29 Anna 4
2022-04-29 Jim 2
2022-04-21 Anna 1
2021-07-29 Jim 5
2021-07-29 Anna 5
Desired output:
Name Last_30_days Last_365_days
Anna 5 15
Jim 3 10
My closest attempt :
df_new = (df.set_index('date')
.groupby('name', sort=False)
.resample('MS')['value'].sum()
.groupby(level=0)
.cumsum()
.reset_index(name='value'))
CodePudding user response:
Try this:
daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0, 30, 365, np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
print(df_out)
Output:
period Last_30_days Last_365_days Greater than 365 days
name
Anna 5 10 0
Jim 3 7 0
And, you can add a cumsum to get cumulative totals along the rows:
daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0,30,365,np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
df_out = df_out.cumsum(axis=1)
print(df_out)
Output:
period Last_30_days Last_365_days Greater than 365 days
name
Anna 5 15 15
Jim 3 10 10
CodePudding user response:
Calculate new binary column to check whether the day is included, then pivot, then concat all selected day differences.
days_select = [30, 60, 365]
pd.concat([df.assign(in_last_n_days=(pd.Timestamp.today() - df.date) < pd.Timedelta(f"{ndays}d"))
.query("in_last_n_days")
.pivot_table(index="name", values="value", aggfunc=sum)
.rename(columns={"value": f"last_{ndays}_days"})
for ndays in days_select], axis=1)
last_30_days last_60_days last_365_days
name
Anna 5 5 15
Jim 3 3 10
CodePudding user response:
df_pivot = pd.pivot_table(df, values='value',index='name',columns='date')
df_pivot.fillna(0,inplace=True)
expected_df = pd.DataFrame(df_pivot['2021-07-29'] df_pivot['2022-04-21'] df_pivot['2022-04-29'] df_pivot['2022-07-01'])
expected_df['last_365_days'] = df_pivot['2021-07-29'] df_pivot['2022-04-21'] df_pivot['2022-04-29'] df_pivot['2022-07-01']
del expected_df[0]
expected_df['last_30_days'] = df_pivot['2022-07-01']