Home > front end >  aggregate and distribute time series data
aggregate and distribute time series data

Time:11-07

I have some time series data in a pandas data frame like this:

begin end mw_values
2021-09-14 11:16:00 2021-09-14 11:27:11 0
2021-09-14 11:27:11 2021-09-14 11:30:00 100
2021-09-14 11:30:00 2021-09-14 11:33:59 1200
2021-09-14 11:33:59 2021-09-14 11:39:42 600
2021-09-14 11:39:42 2021-09-14 11:59:59 400

I need the sum of the mw_values distributed into 15 minutes time slots like this:

time_slots_15_min sum_mw_values
2021-09-14 11:00 0
2021-09-14 11:15 100
2021-09-14 11:30 2200
2021-09-14 11:45 0
2021-09-14 12:00 0

Does someone have any idea how I can achieve this?

Note that the intervals between begin and end may overlap 2 time slots. Then the value must be involved in the sum of the time slot where it begins; e.g. the mw_value of 400 in the example from above.

CodePudding user response:

You can reindex your DataFrame by the begin column, insert two new rows to ensure that the beginning time starts at 11:00 and that the end time is 12:00), then and then use .resample("15min").sum() which will work for a DatetimeIndex (the documentation can be found here if you want to read further):

## in case your column isn't already a datetime
df["begin"] = pd.to_datetime(df["begin"])

df = df.set_index("begin")

## add beginning and ending times to df
df_start_end = pd.DataFrame({"end": ["2021-09-14 11:15:00","2021-09-14 12:15:00"], "mw_values":[0]}, index=[pd.to_datetime("2021-09-14 11:00:00"),pd.to_datetime("2021-09-14 12:00:00")])
df_final = pd.concat([df_start_end,df]).sort_index()

This is what df_final looks like:

                                     end  mw_values
2021-09-14 11:00:00  2021-09-14 11:15:00          0
2021-09-14 11:16:00  2021-09-14 11:27:11          0
2021-09-14 11:27:11  2021-09-14 11:30:00        100
2021-09-14 11:30:00  2021-09-14 11:33:59       1200
2021-09-14 11:33:59  2021-09-14 11:39:42        600
2021-09-14 11:39:42  2021-09-14 11:59:59        400
2021-09-14 12:00:00  2021-09-14 12:15:00          0

Then we resample and sum every 15 minutes over the DatetimeIndex:

## sum by every 15 minutes from the start to end time
df_final.resample("15min").sum().reset_index().rename(columns={"index":"time_slots_15_min","mw_values":"sum_mw_values"})

Output:

    time_slots_15_min  sum_mw_values
0 2021-09-14 11:00:00              0
1 2021-09-14 11:15:00            100
2 2021-09-14 11:30:00           2200
3 2021-09-14 11:45:00              0
4 2021-09-14 12:00:00              0

CodePudding user response:

You can resample your dataframe so sum the data in 15 minute bins. Then you can reindex that frame so it matches your desired start/end/frequency times.

freq = "15min"
new_index = pd.date_range(
    "2021-09-14 11:00:00", "2021-09-14 12:00:00", freq=freq
)

out = (
    df.resample(freq, on="begin")["mw_values"]
    .sum()
    .reindex(new_index, fill_value=0)
    .to_frame("sum_mw_values")
)

print(out)
                     sum_mw_values
2021-09-14 11:00:00              0
2021-09-14 11:15:00            100
2021-09-14 11:30:00           2200
2021-09-14 11:45:00              0
2021-09-14 12:00:00              0
  • Related