How to calculate Minus using groupby and by time series?-CodePudding

I have a df like this:

lst_1 = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
lst_2 = [500, 600, 800, 900,700, 800,1000, 1200]
lst_3 = ['10/31/2022', '11/02/2022','11/07/2022', '11/14/2022', '10/31/2022', '11/02/2022','11/07/2022', '11/14/2022']
df1 = pd.DataFrame(list(zip(lst_1 , lst_2, lst_3)),
              columns =['SKU', 'Sum_Qty_Sold', 'Date_Updated'])

The df I expect:

lst_1 = ['A', 'A', 'B', 'B']
lst_2 = [300, 100, 300, 200]
lst_3 = ['10/31/2022-11/07/2022', '11/07/2022-11/14/2022', '10/31/2022-11/07/2022', '11/07/2022-11/14/2022']

result = pd.DataFrame(list(zip(lst_1 , lst_2, lst_3)),
              columns =['SKU', 'Qty_Sold_By_Week', 'Time_Series'])

How can calculate the 'Qty_Sold_By_Week' for a period of 7 days ? 'Qty_Sold_By_Week' = -('Sum_Qty_Sold' of 'Date_Updated' in df - 'Sum_Qty_Sold' after 7 days (if have))

CodePudding user response：

grouper = pd.PeriodIndex(df1['Date_Updated'], freq='w').to_timestamp().strftime('%m/%d/%Y')
df = (df1.groupby(['SKU', grouper])['Sum_Qty_Sold']
      .first().reset_index().sort_values('SKU').iloc[:, [0, -1, 1]])
df['Sum_Qty_Sold'] = df.groupby('SKU')['Sum_Qty_Sold'].shift(-1) - df['Sum_Qty_Sold']
df['Date_Updated'] = df['Date_Updated']   '-'    df.groupby('SKU')['Date_Updated'].shift(-1)
result = df.dropna().rename(columns={'Sum_Qty_Sold':'Qty_Sold_By_Week', 'Date_Updated':'Time_Series'})

output(result):

    SKU Qty_Sold_By_Week    Time_Series
0   A   300.0               10/31/2022-11/07/2022
1   A   100.0               11/07/2022-11/14/2022
3   B   300.0               10/31/2022-11/07/2022
4   B   200.0               11/07/2022-11/14/2022

Next time, don't make the example's column name like this. too long name don need to solve problem.