Home > Back-end >  Adding NA data for future dates in pandas dataframe
Adding NA data for future dates in pandas dataframe

Time:10-27

I have a pandas dataframe with monthly date index till the current month. I would like to impute NA values n periods into the future (in my case 1 year). I tried adding future dates into the existing index in the following manner:

recentDate = inputFileDf.index[-1]
outputFileDf.index = outputFileDf.index.append(pd.date_range(recentDate , periods=12, freq="M"))

This throws ValueError: Length mismatch: Expected axis has 396 elements, new values have 408 elements.

Would appreciate any help to "extend" the dataframe by adding the dates and NA values.

CodePudding user response:

You can use df.reindex here.

Example data:

df = pd.DataFrame(
    {'num': [*range(5)]},
    index=pd.date_range('2022-10-10', periods=5, freq='D'))
print(df)
            num
2022-10-10    0
2022-10-11    1
2022-10-12    2
2022-10-13    3
2022-10-14    4
recentDate = df.index[-1]
new_data = pd.date_range(recentDate , periods=4, freq="M")
new_idx = df.index.append(new_data)

new_df = df.reindex(new_idx)
print(new_df)
            num
2022-10-10  0.0
2022-10-11  1.0
2022-10-12  2.0
2022-10-13  3.0
2022-10-14  4.0
2022-10-31  NaN
2022-11-30  NaN
2022-12-31  NaN
2023-01-31  NaN

CodePudding user response:

Use concat - if indices are unique or not:

recentDate = inputFileDf.index[-1]
df = pd.DataFrame(index=pd.date_range(recentDate, periods=12, freq="M"))
outputFileDf = pd.concat([inputFileDf, df])

If indices are unique in idx use DataFrame.reindex:

recentDate = inputFileDf.index[-1]
idx = inputFileDf.index.append(pd.date_range(recentDate , periods=12, freq="M"))

outputFileDf = outputFileDf.reindex(idx)

EDIT: If original DataFrame has months and need append new one need add 1 month to last index for avoid double last indices from original DataFrame:

inputFileDf = pd.DataFrame(columns=['col'], 
                           index=pd.date_range('2022-10-31', periods=4, freq='M'))
print(inputFileDf)
            col
2022-10-31  NaN
2022-11-30  NaN
2022-12-31  NaN
2023-01-31  NaN

recentDate = inputFileDf.index[-1]

idx = inputFileDf.index.append(pd.date_range(recentDate   pd.DateOffset(months=1) , periods=12, freq="M"))

outputFileDf = inputFileDf.reindex(idx)
print (outputFileDf)
            col
2022-10-31  NaN
2022-11-30  NaN
2022-12-31  NaN
2023-01-31  NaN
2023-02-28  NaN
2023-03-31  NaN
2023-04-30  NaN
2023-05-31  NaN
2023-06-30  NaN
2023-07-31  NaN
2023-08-31  NaN
2023-09-30  NaN
2023-10-31  NaN
2023-11-30  NaN
2023-12-31  NaN
2024-01-31  NaN

Or use first index value:

firstDate = inputFileDf.index[0]

idx = pd.date_range(firstDate, periods=12   len(inputFileDf), freq="M")

outputFileDf = inputFileDf.reindex(idx)
print (outputFileDf)
            col
2022-10-31  NaN
2022-11-30  NaN
2022-12-31  NaN
2023-01-31  NaN
2023-02-28  NaN
2023-03-31  NaN
2023-04-30  NaN
2023-05-31  NaN
2023-06-30  NaN
2023-07-31  NaN
2023-08-31  NaN
2023-09-30  NaN
2023-10-31  NaN
2023-11-30  NaN
2023-12-31  NaN
2024-01-31  NaN
  • Related