I have a multi-index dataframe of timeseries data which looks like the following;
A B C
1 1 21 32 4
2 4 2 23
3 12 9 10
4 1 56 37
.
.
.
.
30 63 1 27
31 32 2 32
.
.
.
12 1 2 3 23
2 23 1 12
3 32 3 23
.
.
.
31 23 2 32
It is essentially a multi-index of month and dates with three columns.
I need to turn this into daily data and essentially have a dataframe whereby there is a single index where value in the above dataframe responds to its' respective date over 10 years.
For exmaple;
Desired output;
A B C
01/01/2017 21 32 4
.
.
31/12/2017 23 2 32
.
.
01/01/2022 21 32 4
.
.
31/12/2022 23 2 32
I hope this is clear! Its essentially turning daily/monthly data into daily/monthly/yearly data.
Any help much appreciated!
CodePudding user response:
You can use:
df.index = pd.to_datetime(df.index.rename(['month', 'day']).to_frame().assign(year=2022))
Output:
A B C
2022-01-01 21 32 4
2022-01-02 4 2 23
2022-01-03 12 9 10
2022-01-04 1 56 37
2022-01-30 63 1 27
2022-01-31 32 2 32
2022-12-01 2 3 23
2022-12-02 23 1 12
2022-12-03 32 3 23
2022-12-31 23 2 32
spanning several years
There is no absolute fool proof way to handle years if those are missing. What we can do it to infer the year change when a date goes back in the past and add 1 year in this case:
# let's assume the starting year is 2017
date = pd.to_datetime(df.index.rename(['month', 'day']).to_frame().assign(year=2017))
df.index = date date.diff().lt('0').cumsum().mul(pd.DateOffset(years=1))
output:
A B C
2017-01-01 21 32 4
2017-01-02 4 2 23
2017-06-03 12 9 10
2017-06-04 1 56 37
2018-01-30 63 1 27 # added 1 year
2018-01-31 32 2 32
2018-12-01 2 3 23
2018-12-02 23 1 12
2018-12-03 32 3 23
2018-12-31 23 2 32
used input:
A B C
1 1 21 32 4
2 4 2 23
6 3 12 9 10
4 1 56 37
1 30 63 1 27 # here we go back from month 1 after month 6
31 32 2 32
12 1 2 3 23
2 23 1 12
3 32 3 23
31 23 2 32