I have a dataframe which is monthly averages which looks like the following;
A B C D E
1 3 21 3 22 3
2 4 32 3 24 0
3 5 1 12 3 12
.
.
11 5 4 9 85 85 3
12 43 4 48 3 84 4
I'm looking to convert this data to a daily timeframe so that the dataframe would be a ten year timeseries and each value would correspond to its' monthly value. For example;
A B C D E
01/01/2010 3 21 3 22 3
02/01/2010 3 21 3 22 3
.
.
31/01/2010 3 21 3 22 3
.
.
.
30/12/2020 43 4 48 84 4
31/12/2020 43 4 48 84 4
Any help much appreciated!
Thanks
CodePudding user response:
You can reindex
:
idx = pd.date_range('2010-01-01', '2020-12-30', freq='D')
out = df.reindex(idx.month).set_axis(idx)
Output:
A B C D E
2010-01-01 3.0 21.0 3.0 22.0 3.0
2010-01-02 3.0 21.0 3.0 22.0 3.0
2010-01-03 3.0 21.0 3.0 22.0 3.0
2010-01-04 3.0 21.0 3.0 22.0 3.0
2010-01-05 3.0 21.0 3.0 22.0 3.0
... ... ... ... ... ...
2020-12-26 43.0 4.0 48.0 84.0 4.0
2020-12-27 43.0 4.0 48.0 84.0 4.0
2020-12-28 43.0 4.0 48.0 84.0 4.0
2020-12-29 43.0 4.0 48.0 84.0 4.0
2020-12-30 43.0 4.0 48.0 84.0 4.0
[4017 rows x 5 columns]
CodePudding user response:
I would make two new indexes. First, repeat your data for the number of years that you want, then replace it with a proper datetime index. Finally, reindex by a new index spanning the same time interval with a day frequency and fill the nans:
import numpy as np
import pandas as pd
# Some example data
df = pd.DataFrame(np.random.randint(10, size=(12, 5)), index=range(1, 13), columns=list("ABCDE"))
idx1 = pd.date_range("2010", "2021", freq="M")
idx2 = pd.date_range("2010-01-01", "2020-12-31") # Implicit day frequency
out = (
pd.concat([df] * 11) # Repeat for 11 years (2010-2020 inclusive)
.set_index(idx1) # Overwrite with actual dates
.reindex(idx2) # Add all missing days
.bfill() # Backfill nans introduced for missing days
)