Home > Enterprise >  Python: change periods to daily records
Python: change periods to daily records

Time:04-07

I would like to change my period data to daily data. I have following data:

d = {'id_article': [1, 1, 2],
     'sale': [103963.00, 30000.00, 15000.00],
     'date_from': ['04-01-2022', '03-01-2022', '03-01-2022'],
     'date_to': ['15-01-2022', '28-02-2022', '23-01-2022']}

df = pd.DataFrame(data=d)

id_article  sale        date_from       date_to       days_in_between
1           103963.00   04-01-2022      15-01-2022    12
1           30000.00    03-01-2022      28-02-2022    57
2           15000.00    03-01-2022      23-01-2022    21

In my final result I want to get total sum for every month. So something like that:

id_article  sale          month     
1           119226,16     1
1           14736,84      2
2           15000         1

119226,16 = (103963/12)*12   (30000/57)*29
14736,84 = (30000/57)*28
(total_sale/days_in_between) * days_from_period_in_month

I figured I could change this df to daily records. I know that I could use pd.date_range(start=start_date, end=end_date, freq="D") to get days in between however I don't know how to apply it effectively to the dataframe. I would like to get following df:

id_article  sale_daily    date      
1           8663,58       04-01-2022
1           8663,58       05-01-2022
1           8663,58       06-01-2022
1           8663,58       07-01-2022
1           8663,58       08-01-2022
1           8663,58       09-01-2022
1           8663,58       10-01-2022
1           8663,58       11-01-2022
1           8663,58       12-01-2022
1           8663,58       13-01-2022
1           8663,58       14-01-2022
1           8663,58       15-01-2022
1           526,32        03-01-2022
1           526,32        04-01-2022
1           526,32        05-01-2022
1           526,32        06-01-2022
1           526,32        07-01-2022
1           526,32        08-01-2022
...         ...           ...      
1           526,32        24-02-2022
1           526,32        25-02-2022
1           526,32        26-02-2022
1           526,32        27-02-2022
1           526,32        28-02-2022
2           714,29        03-01-2022
2           ...           ...

Where sale_daily is sale divided by number of days in between. Later I would extract month from the date and aggregate is by id_article and month. Could you help in getting the daily data or is there another way to obtain monthly sales from the given period?

CodePudding user response:

Try:

#convert if necessary:
#df["date_from"] = pd.to_datetime(df["date_from"], format="%d-%m-%Y")
#df["date_to"] = pd.to_datetime(df["date_to"], format="%d-%m-%Y")

df["month"] = df.apply(
    lambda x: pd.date_range(x["date_from"], x["date_to"]),
    axis=1,
)
df["sale"] = df.apply(lambda x: x["sale"] / x["month"].size, axis=1)

df = df.explode("month")
out = (
    df.groupby([df["id_article"], df.month.dt.month])["sale"]
    .sum()
    .reset_index()
)
print(out)

Prints:

   id_article  month           sale
0           1      1  119226.157895
1           1      2   14736.842105
2           2      1   15000.000000
  • Related