aggregate and distribute time series data-CodePudding

I have some time series data in a pandas data frame like this:

begin	end	mw_values
2021-09-14 11:16:00	2021-09-14 11:27:11	0
2021-09-14 11:27:11	2021-09-14 11:30:00	100
2021-09-14 11:30:00	2021-09-14 11:33:59	1200
2021-09-14 11:33:59	2021-09-14 11:39:42	600
2021-09-14 11:39:42	2021-09-14 11:59:59	400

I need the sum of the mw_values distributed into 15 minutes time slots like this:

time_slots_15_min	sum_mw_values
2021-09-14 11:00	0
2021-09-14 11:15	100
2021-09-14 11:30	2200
2021-09-14 11:45	0
2021-09-14 12:00	0

Does someone have any idea how I can achieve this?

Note that the intervals between begin and end may overlap 2 time slots. Then the value must be involved in the sum of the time slot where it begins; e.g. the mw_value of 400 in the example from above.

CodePudding user response：

You can reindex your DataFrame by the begin column, insert two new rows to ensure that the beginning time starts at 11:00 and that the end time is 12:00), then and then use .resample("15min").sum() which will work for a DatetimeIndex (the documentation can be found here if you want to read further):

## in case your column isn't already a datetime
df["begin"] = pd.to_datetime(df["begin"])

df = df.set_index("begin")

## add beginning and ending times to df
df_start_end = pd.DataFrame({"end": ["2021-09-14 11:15:00","2021-09-14 12:15:00"], "mw_values":[0]}, index=[pd.to_datetime("2021-09-14 11:00:00"),pd.to_datetime("2021-09-14 12:00:00")])
df_final = pd.concat([df_start_end,df]).sort_index()

This is what df_final looks like:

                                     end  mw_values
2021-09-14 11:00:00  2021-09-14 11:15:00          0
2021-09-14 11:16:00  2021-09-14 11:27:11          0
2021-09-14 11:27:11  2021-09-14 11:30:00        100
2021-09-14 11:30:00  2021-09-14 11:33:59       1200
2021-09-14 11:33:59  2021-09-14 11:39:42        600
2021-09-14 11:39:42  2021-09-14 11:59:59        400
2021-09-14 12:00:00  2021-09-14 12:15:00          0

Then we resample and sum every 15 minutes over the DatetimeIndex:

## sum by every 15 minutes from the start to end time
df_final.resample("15min").sum().reset_index().rename(columns={"index":"time_slots_15_min","mw_values":"sum_mw_values"})

Output:

    time_slots_15_min  sum_mw_values
0 2021-09-14 11:00:00              0
1 2021-09-14 11:15:00            100
2 2021-09-14 11:30:00           2200
3 2021-09-14 11:45:00              0
4 2021-09-14 12:00:00              0

CodePudding user response：

You can resample your dataframe so sum the data in 15 minute bins. Then you can reindex that frame so it matches your desired start/end/frequency times.

freq = "15min"
new_index = pd.date_range(
    "2021-09-14 11:00:00", "2021-09-14 12:00:00", freq=freq
)

out = (
    df.resample(freq, on="begin")["mw_values"]
    .sum()
    .reindex(new_index, fill_value=0)
    .to_frame("sum_mw_values")
)

print(out)
                     sum_mw_values
2021-09-14 11:00:00              0
2021-09-14 11:15:00            100
2021-09-14 11:30:00           2200
2021-09-14 11:45:00              0
2021-09-14 12:00:00              0