I have a pandas dataframe with monthly date index till the current month. I would like to impute NA values n periods into the future (in my case 1 year). I tried adding future dates into the existing index in the following manner:
recentDate = inputFileDf.index[-1]
outputFileDf.index = outputFileDf.index.append(pd.date_range(recentDate , periods=12, freq="M"))
This throws ValueError: Length mismatch: Expected axis has 396 elements, new values have 408 elements
.
Would appreciate any help to "extend" the dataframe by adding the dates and NA values.
CodePudding user response:
You can use df.reindex
here.
Example data:
df = pd.DataFrame(
{'num': [*range(5)]},
index=pd.date_range('2022-10-10', periods=5, freq='D'))
print(df)
num
2022-10-10 0
2022-10-11 1
2022-10-12 2
2022-10-13 3
2022-10-14 4
recentDate = df.index[-1]
new_data = pd.date_range(recentDate , periods=4, freq="M")
new_idx = df.index.append(new_data)
new_df = df.reindex(new_idx)
print(new_df)
num
2022-10-10 0.0
2022-10-11 1.0
2022-10-12 2.0
2022-10-13 3.0
2022-10-14 4.0
2022-10-31 NaN
2022-11-30 NaN
2022-12-31 NaN
2023-01-31 NaN
CodePudding user response:
Use concat
- if indices are unique or not:
recentDate = inputFileDf.index[-1]
df = pd.DataFrame(index=pd.date_range(recentDate, periods=12, freq="M"))
outputFileDf = pd.concat([inputFileDf, df])
If indices are unique in idx
use DataFrame.reindex
:
recentDate = inputFileDf.index[-1]
idx = inputFileDf.index.append(pd.date_range(recentDate , periods=12, freq="M"))
outputFileDf = outputFileDf.reindex(idx)
EDIT: If original DataFrame has months and need append new one need add 1 month to last index for avoid double last indices from original DataFrame:
inputFileDf = pd.DataFrame(columns=['col'],
index=pd.date_range('2022-10-31', periods=4, freq='M'))
print(inputFileDf)
col
2022-10-31 NaN
2022-11-30 NaN
2022-12-31 NaN
2023-01-31 NaN
recentDate = inputFileDf.index[-1]
idx = inputFileDf.index.append(pd.date_range(recentDate pd.DateOffset(months=1) , periods=12, freq="M"))
outputFileDf = inputFileDf.reindex(idx)
print (outputFileDf)
col
2022-10-31 NaN
2022-11-30 NaN
2022-12-31 NaN
2023-01-31 NaN
2023-02-28 NaN
2023-03-31 NaN
2023-04-30 NaN
2023-05-31 NaN
2023-06-30 NaN
2023-07-31 NaN
2023-08-31 NaN
2023-09-30 NaN
2023-10-31 NaN
2023-11-30 NaN
2023-12-31 NaN
2024-01-31 NaN
Or use first index value:
firstDate = inputFileDf.index[0]
idx = pd.date_range(firstDate, periods=12 len(inputFileDf), freq="M")
outputFileDf = inputFileDf.reindex(idx)
print (outputFileDf)
col
2022-10-31 NaN
2022-11-30 NaN
2022-12-31 NaN
2023-01-31 NaN
2023-02-28 NaN
2023-03-31 NaN
2023-04-30 NaN
2023-05-31 NaN
2023-06-30 NaN
2023-07-31 NaN
2023-08-31 NaN
2023-09-30 NaN
2023-10-31 NaN
2023-11-30 NaN
2023-12-31 NaN
2024-01-31 NaN