I have a df like this:
lst_1 = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
lst_2 = [500, 600, 800, 900,700, 800,1000, 1200]
lst_3 = ['10/31/2022', '11/02/2022','11/07/2022', '11/14/2022', '10/31/2022', '11/02/2022','11/07/2022', '11/14/2022']
df1 = pd.DataFrame(list(zip(lst_1 , lst_2, lst_3)),
columns =['SKU', 'Sum_Qty_Sold', 'Date_Updated'])
The df I expect:
lst_1 = ['A', 'A', 'B', 'B']
lst_2 = [300, 100, 300, 200]
lst_3 = ['10/31/2022-11/07/2022', '11/07/2022-11/14/2022', '10/31/2022-11/07/2022', '11/07/2022-11/14/2022']
result = pd.DataFrame(list(zip(lst_1 , lst_2, lst_3)),
columns =['SKU', 'Qty_Sold_By_Week', 'Time_Series'])
How can calculate the 'Qty_Sold_By_Week' for a period of 7 days ? 'Qty_Sold_By_Week' = -('Sum_Qty_Sold' of 'Date_Updated' in df - 'Sum_Qty_Sold' after 7 days (if have))
CodePudding user response:
grouper = pd.PeriodIndex(df1['Date_Updated'], freq='w').to_timestamp().strftime('%m/%d/%Y')
df = (df1.groupby(['SKU', grouper])['Sum_Qty_Sold']
.first().reset_index().sort_values('SKU').iloc[:, [0, -1, 1]])
df['Sum_Qty_Sold'] = df.groupby('SKU')['Sum_Qty_Sold'].shift(-1) - df['Sum_Qty_Sold']
df['Date_Updated'] = df['Date_Updated'] '-' df.groupby('SKU')['Date_Updated'].shift(-1)
result = df.dropna().rename(columns={'Sum_Qty_Sold':'Qty_Sold_By_Week', 'Date_Updated':'Time_Series'})
output(result
):
SKU Qty_Sold_By_Week Time_Series
0 A 300.0 10/31/2022-11/07/2022
1 A 100.0 11/07/2022-11/14/2022
3 B 300.0 10/31/2022-11/07/2022
4 B 200.0 11/07/2022-11/14/2022
Next time, don't make the example's column name like this. too long name don need to solve problem.