Extrapolating time series data into the future by repeating/scaling existing values-CodePudding

I have hourly data on electricity consumption for a specific day. I would like to use this data to "predict" the hourly electricity consumption for the following days. The value for the following day should be the value from the same hour the day before, multiplied by a scaling factor f (e.g. 2).

The dataframe df that I have looks something like this:

                       load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0

And I would like the output dataframe df_ex to look something like this:

                       load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
2021-01-01 05:00:00   1.0
2021-01-01 06:00:00   1.0
2021-01-01 07:00:00   3.0
2021-01-01 08:00:00   3.0
2021-01-01 09:00:00   3.0
2021-01-01 10:00:00   3.0
2021-01-01 11:00:00   3.0
2021-01-01 12:00:00   3.0
2021-01-01 13:00:00   3.0
2021-01-01 14:00:00   3.0
2021-01-01 15:00:00   3.0
2021-01-01 16:00:00   3.0
2021-01-01 17:00:00   3.0
2021-01-01 18:00:00   3.0
2021-01-01 19:00:00   3.0
2021-01-01 20:00:00   1.0
2021-01-01 21:00:00   1.0
2021-01-01 22:00:00   1.0
2021-01-01 23:00:00   1.0
2021-01-02 00:00:00   2.0
2021-01-02 01:00:00   2.0
2021-01-02 02:00:00   2.0
2021-01-02 03:00:00   2.0
2021-01-02 04:00:00   2.0
2021-01-02 05:00:00   2.0
2021-01-02 06:00:00   2.0
2021-01-02 07:00:00   6.0
2021-01-02 08:00:00   6.0
2021-01-02 09:00:00   6.0
2021-01-02 10:00:00   6.0
2021-01-02 11:00:00   6.0
2021-01-02 12:00:00   6.0
2021-01-02 13:00:00   6.0
2021-01-02 14:00:00   6.0
2021-01-02 15:00:00   6.0
2021-01-02 16:00:00   6.0
2021-01-02 17:00:00   6.0
2021-01-02 18:00:00   6.0
2021-01-02 19:00:00   6.0
2021-01-02 20:00:00   2.0
2021-01-02 21:00:00   2.0
2021-01-02 22:00:00   2.0
2021-01-02 23:00:00   2.0
2021-01-03 00:00:00   4.0
2021-01-03 01:00:00   4.0
2021-01-03 02:00:00   4.0
2021-01-03 03:00:00   4.0
2021-01-03 04:00:00   4.0
2021-01-03 05:00:00   4.0
2021-01-03 06:00:00   4.0
2021-01-03 07:00:00   12.0
2021-01-03 08:00:00   12.0
2021-01-03 09:00:00   12.0
2021-01-03 10:00:00   12.0
2021-01-03 11:00:00   12.0
2021-01-03 12:00:00   12.0
2021-01-03 13:00:00   12.0
2021-01-03 14:00:00   12.0
2021-01-03 15:00:00   12.0
2021-01-03 16:00:00   4.0
2021-01-03 17:00:00   4.0
2021-01-03 18:00:00   4.0
2021-01-03 19:00:00   4.0
2021-01-03 20:00:00   4.0
2021-01-03 21:00:00   4.0
2021-01-03 22:00:00   4.0
2021-01-03 23:00:00   4.0

I have tried the following solution (df as defined above):

import pandas as pd
import datetime

start = '2021-01-01 00:00'
end = '2021-01-03 23:00'
freq = 'H'

index = pd.date_range(start,
                      end,
                      freq=freq)

df_ex = df.reindex(index)

i = df_ex.index[0].day
f = 2.0
df_ex.loc[df_ex.index.day == i 1] = df_ex.loc[df_ex.index.day == i] * f

print(df_ex)

Which results in:

                      load_kWh
2021-01-01 00:00:00   1.0
2021-01-01 01:00:00   1.0
2021-01-01 02:00:00   1.0
2021-01-01 03:00:00   1.0
2021-01-01 04:00:00   1.0
...                   ...
2021-01-03 19:00:00   NaN
2021-01-03 20:00:00   NaN
2021-01-03 21:00:00   NaN
2021-01-03 22:00:00   NaN
2021-01-03 23:00:00   NaN

It appears that my attempt to populate the rows after the first day with values did not succeed. The index is a DateTimeIndex.

Any suggestions on how to solve this would be greatly appreciated!

CodePudding user response：

To create the data you need to iterate one day at a time.

Assuming that the original data has at least a full day of data then you can do:

import pandas as pd
import itertools
import datetime as dt

start = "2021-01-01 00:00"
end = "2021-01-01 23:00"
freq = "H"

df = pd.DataFrame(
    {"load_kWh": itertools.chain([1.0] * 7, [3.0] * 13, [1.0] * 4)},
    index=pd.date_range(start, end, freq=freq),
)


def add_days_to_df(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame:
    data = data.copy()
    for _ in range(number_of_days):
        day = data[-24:]
        day.index  = dt.timedelta(days=1)
        day *= k
        data = pd.concat((data, day))
    return data


print(add_days_to_df(data=df, number_of_days=2, k=2.0))