I would like to change my period data to daily data. I have following data:
d = {'id_article': [1, 1, 2],
'sale': [103963.00, 30000.00, 15000.00],
'date_from': ['04-01-2022', '03-01-2022', '03-01-2022'],
'date_to': ['15-01-2022', '28-02-2022', '23-01-2022']}
df = pd.DataFrame(data=d)
id_article sale date_from date_to days_in_between
1 103963.00 04-01-2022 15-01-2022 12
1 30000.00 03-01-2022 28-02-2022 57
2 15000.00 03-01-2022 23-01-2022 21
In my final result I want to get total sum for every month. So something like that:
id_article sale month
1 119226,16 1
1 14736,84 2
2 15000 1
119226,16 = (103963/12)*12 (30000/57)*29
14736,84 = (30000/57)*28
(total_sale/days_in_between) * days_from_period_in_month
I figured I could change this df to daily records. I know that I could use pd.date_range(start=start_date, end=end_date, freq="D") to get days in between however I don't know how to apply it effectively to the dataframe. I would like to get following df:
id_article sale_daily date
1 8663,58 04-01-2022
1 8663,58 05-01-2022
1 8663,58 06-01-2022
1 8663,58 07-01-2022
1 8663,58 08-01-2022
1 8663,58 09-01-2022
1 8663,58 10-01-2022
1 8663,58 11-01-2022
1 8663,58 12-01-2022
1 8663,58 13-01-2022
1 8663,58 14-01-2022
1 8663,58 15-01-2022
1 526,32 03-01-2022
1 526,32 04-01-2022
1 526,32 05-01-2022
1 526,32 06-01-2022
1 526,32 07-01-2022
1 526,32 08-01-2022
... ... ...
1 526,32 24-02-2022
1 526,32 25-02-2022
1 526,32 26-02-2022
1 526,32 27-02-2022
1 526,32 28-02-2022
2 714,29 03-01-2022
2 ... ...
Where sale_daily is sale divided by number of days in between. Later I would extract month from the date and aggregate is by id_article and month. Could you help in getting the daily data or is there another way to obtain monthly sales from the given period?
CodePudding user response:
Try:
#convert if necessary:
#df["date_from"] = pd.to_datetime(df["date_from"], format="%d-%m-%Y")
#df["date_to"] = pd.to_datetime(df["date_to"], format="%d-%m-%Y")
df["month"] = df.apply(
lambda x: pd.date_range(x["date_from"], x["date_to"]),
axis=1,
)
df["sale"] = df.apply(lambda x: x["sale"] / x["month"].size, axis=1)
df = df.explode("month")
out = (
df.groupby([df["id_article"], df.month.dt.month])["sale"]
.sum()
.reset_index()
)
print(out)
Prints:
id_article month sale
0 1 1 119226.157895
1 1 2 14736.842105
2 2 1 15000.000000