Home > Mobile >  Dataframe add buisness days according to multiple features
Dataframe add buisness days according to multiple features

Time:12-09

I have a quite big data-frame with different products, locations, sold_value and the according date. I want to manipulate the DF in the way that each product for its according location gets dates values according to buisness day freq. As next step i want the sold_value if sold on same date (still according to product and its location) gets added to each other. So that basically each product with its according location has a buisness day freq with sold_value.

I tried multiple things already, at first i tied just to manipulate the DF with groupby.... as next i tried to make new dataframes for each product with its according location add the bday frq and .sum the sold_value according to date after that merging all the DFs but it wont quite work.

as a side fact there are hundreds of locations and products so i cant hard code their values

PS.: Thanks for Help

The DF:

product location date value
1 1 01-01-2022 1
1 1 04-01-2022 1
1 2 01-01-2022 5
2 1 01-01-2022 4
2 1 01-01-2022 3
2 2 01-01-2022 1

My wanted output:

product location date value
1 1 01-01-2022 1
1 1 02-01-2022 0
1 1 03-01-2022 0
1 1 04-01-2022 1
1 2 01-01-2022 5
2 1 01-01-2022 7
2 2 01-01-2022 1

CodePudding user response:

First aggregate sum for column value:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df = df.groupby(['ct','location','date'], as_index=False)['value'].sum()
print (df)
   ct  location       date  value
0   1         1 2022-01-01      1
1   1         1 2022-01-04      1
2   1         2 2022-01-01      5
3   2         1 2022-01-01      7
4   2         2 2022-01-01      1

Then add 0 days per days in lambda function with Series.asfreq:

f = lambda x: x.asfreq('D', fill_value=0)
df1 = df.set_index('date').groupby(['ct','location'])['value'].apply(f).reset_index()
print (df1)
   ct  location       date  value
0   1         1 2022-01-01      1
1   1         1 2022-01-02      0
2   1         1 2022-01-03      0
3   1         1 2022-01-04      1
4   1         2 2022-01-01      5
5   2         1 2022-01-01      7
6   2         2 2022-01-01      1

If want use business days ouput is different:

f = lambda x: x.asfreq('B', fill_value=0)
df2 = df.set_index('date').groupby(['ct','location'])['value'].apply(f).reset_index()
print (df2)
   ct  location       date  value
0   1         1 2022-01-03      0
1   1         1 2022-01-04      1
  • Related