I have a quite big data-frame with different products, locations, sold_value and the according date. I want to manipulate the DF in the way that each product for its according location gets dates values according to buisness day freq. As next step i want the sold_value if sold on same date (still according to product and its location) gets added to each other. So that basically each product with its according location has a buisness day freq with sold_value.
I tried multiple things already, at first i tied just to manipulate the DF with groupby.... as next i tried to make new dataframes for each product with its according location add the bday frq and .sum the sold_value according to date after that merging all the DFs but it wont quite work.
as a side fact there are hundreds of locations and products so i cant hard code their values
PS.: Thanks for Help
The DF:
product | location | date | value |
---|---|---|---|
1 | 1 | 01-01-2022 | 1 |
1 | 1 | 04-01-2022 | 1 |
1 | 2 | 01-01-2022 | 5 |
2 | 1 | 01-01-2022 | 4 |
2 | 1 | 01-01-2022 | 3 |
2 | 2 | 01-01-2022 | 1 |
My wanted output:
product | location | date | value |
---|---|---|---|
1 | 1 | 01-01-2022 | 1 |
1 | 1 | 02-01-2022 | 0 |
1 | 1 | 03-01-2022 | 0 |
1 | 1 | 04-01-2022 | 1 |
1 | 2 | 01-01-2022 | 5 |
2 | 1 | 01-01-2022 | 7 |
2 | 2 | 01-01-2022 | 1 |
CodePudding user response:
First aggregate sum
for column value
:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df = df.groupby(['ct','location','date'], as_index=False)['value'].sum()
print (df)
ct location date value
0 1 1 2022-01-01 1
1 1 1 2022-01-04 1
2 1 2 2022-01-01 5
3 2 1 2022-01-01 7
4 2 2 2022-01-01 1
Then add 0
days per days in lambda function with Series.asfreq
:
f = lambda x: x.asfreq('D', fill_value=0)
df1 = df.set_index('date').groupby(['ct','location'])['value'].apply(f).reset_index()
print (df1)
ct location date value
0 1 1 2022-01-01 1
1 1 1 2022-01-02 0
2 1 1 2022-01-03 0
3 1 1 2022-01-04 1
4 1 2 2022-01-01 5
5 2 1 2022-01-01 7
6 2 2 2022-01-01 1
If want use business days ouput is different:
f = lambda x: x.asfreq('B', fill_value=0)
df2 = df.set_index('date').groupby(['ct','location'])['value'].apply(f).reset_index()
print (df2)
ct location date value
0 1 1 2022-01-03 0
1 1 1 2022-01-04 1