I am trying to reshape a dataframe at monthly level without much success. I have a dataframe that contains data that spans a given period: either monthly, quarterly or yearly. Essentially I would like to reshape the dataframe as follows: once all the data available at monthly level is exhausted, use the quarterly values, then once all quarterly values are exhausted, use the yearly values. Do you have any idea how I could do that?
Many thanks for your help!
Inputs:
var_name begin_delivery_date end_delivery_date value
Monthly 2022 2022-01-01T06:00:00 2022-02-01T05:59:59 5
Monthly 2022 2022-02-01T06:00:00 2022-03-01T05:59:59 7
... ... ... ...
Quarterly 2022 2022-01-01T06:00:00 2022-04-01T06:00:00 10
... ... ... ...
Yearly 2022 2022-01-01T06:00:00 2023-01-01T06:00:00 49
expected output:
date var_name value
2022-01-01 Monthly 2022 5
2022-02-01 Monthly 2022 7
2022-03-01 Quarterly 2022 10
2022-04-01 Yearly 2022 49
2022-05-01 Yearly 2022 49
2022-06-01 Yearly 2022 49
2022-07-01 Yearly 2022 49
2022-08-01 Yearly 2022 49
2022-09-01 Yearly 2022 49
2022-10-01 Yearly 2022 49
2022-11-01 Yearly 2022 49
2022-12-01 Yearly 2022 49
input data to play with:
{ {
"begin_delivery_date": "2022-01-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-02-01T05:59:59",
"value": 5
},
{
"begin_delivery_date": "2022-02-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-03-01T05:59:59",
"value": 7
},
{
"begin_delivery_date": "2022-03-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-04-01T05:59:59",
"value": 8
},
{
"begin_delivery_date": "2022-04-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-05-01T05:59:59",
"value": 9
},
{
"begin_delivery_date": "2022-04-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2022-07-01T05:59:59",
"value": 10
},
{
"begin_delivery_date": "2022-07-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2022-10-01T05:59:59",
"value": 11
},
{
"begin_delivery_date": "2022-09-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2023-01-01T05:59:59",
"value": 12
},
{
"begin_delivery_date": "2023-01-01T06:00:00",
"var name": "Yearly 2023",
"end_delivery_date": "2024-01-01T05:59:59",
"value": 50
},
{
"begin_delivery_date": "2024-01-01T06:00:00",
"var name": "Yearly 2024",
"end_delivery_date": "2025-01-01T05:59:59",
"value": 60
}
}
CodePudding user response:
IIUC,
import pandas as pd
import numpy as np
data = [ {
"begin_delivery_date": "2022-01-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-02-01T05:59:59",
"value": 5
},
{
"begin_delivery_date": "2022-02-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-03-01T05:59:59",
"value": 7
},
{
"begin_delivery_date": "2022-03-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-04-01T05:59:59",
"value": 8
},
{
"begin_delivery_date": "2022-04-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-05-01T05:59:59",
"value": 9
},
{
"begin_delivery_date": "2022-04-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2022-07-01T05:59:59",
"value": 10
},
{
"begin_delivery_date": "2022-07-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2022-10-01T05:59:59",
"value": 11
},
{
"begin_delivery_date": "2022-09-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2023-01-01T05:59:59",
"value": 12
},
{
"begin_delivery_date": "2023-01-01T06:00:00",
"var name": "Yearly 2023",
"end_delivery_date": "2024-01-01T05:59:59",
"value": 50
},
{
"begin_delivery_date": "2024-01-01T06:00:00",
"var name": "Yearly 2024",
"end_delivery_date": "2025-01-01T05:59:59",
"value": 60
}
]
df = pd.DataFrame(data)
Create list of dates from date range and explode dataframe.
df['dates'] = [pd.date_range(s, e, freq='M') for s, e in zip(df['begin_delivery_date'], df['end_delivery_date'])]
df_out = df.explode('dates')
print(df_out)
Output:
begin_delivery_date var name end_delivery_date value dates
0 2022-01-01T06:00:00 Monthly 2022 2022-02-01T05:59:59 5 2022-01-31 06:00:00
1 2022-02-01T06:00:00 Monthly 2022 2022-03-01T05:59:59 7 2022-02-28 06:00:00
2 2022-03-01T06:00:00 Monthly 2022 2022-04-01T05:59:59 8 2022-03-31 06:00:00
3 2022-04-01T06:00:00 Monthly 2022 2022-05-01T05:59:59 9 2022-04-30 06:00:00
4 2022-04-01T06:00:00 Quarterly 2022 2022-07-01T05:59:59 10 2022-04-30 06:00:00
4 2022-04-01T06:00:00 Quarterly 2022 2022-07-01T05:59:59 10 2022-05-31 06:00:00
4 2022-04-01T06:00:00 Quarterly 2022 2022-07-01T05:59:59 10 2022-06-30 06:00:00
5 2022-07-01T06:00:00 Quarterly 2022 2022-10-01T05:59:59 11 2022-07-31 06:00:00
5 2022-07-01T06:00:00 Quarterly 2022 2022-10-01T05:59:59 11 2022-08-31 06:00:00
5 2022-07-01T06:00:00 Quarterly 2022 2022-10-01T05:59:59 11 2022-09-30 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-09-30 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-10-31 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-11-30 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-12-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-01-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-02-28 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-03-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-04-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-05-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-06-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-07-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-08-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-09-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-10-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-11-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-12-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-01-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-02-29 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-03-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-04-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-05-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-06-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-07-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-08-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-09-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-10-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-11-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-12-31 06:00:00
CodePudding user response:
Create a df and shuffle it(data are the data you wrote above)
df = pd.DataFrame(data)
df = df.sample(frac=1).reset_index(drop=True)
Split each value from var name into 2 separate columns,var_name_pediod and var_name_year
df["var_name_pediod"] = df["var name"].str.split(" ").str[0]
df["var_name_year"] = df["var name"].str.split(" ").str[1]
Create a dictionary for sorting the periods and replace the "var_name_pediod" column with the dictionary
sort_dic = {"Monthly":1,"Quarterly":2,"Yearly":3}
df["var_name_pediod"] = df["var_name_pediod"].replace(sort_dic)
Sort values by "var_name_pediod" column
df.sort_values(by=['var_name_pediod'], inplace=True)
Groupby var_name_pediod and sort by "var_name_year
df.groupby(['var_name_pediod']).apply(lambda x: x.sort_values(by=['var_name_year'])).reset_index(drop=True)
Done. Drop extra columns if you don't need them
df.drop(columns=["var_name_pediod","var_name_year"],inplace=True)