Home > other >  Pandas overlapped time intervals to time series
Pandas overlapped time intervals to time series

Time:10-19

I have a pandas dataframe that includes time intervals that overlapping at some points (figure 1). I need a data frame that has a time series that starts beginning from the first start_time to the end of the last end_time (figure 2). I have to sum up VIS values at overlapped time intervals. I couldn't figure it out. How can I do it?

enter image description here

enter image description here

CodePudding user response:

If you paste your data instead of the images, I'd be able to test this. But this is how you may want to think about it. Assume your dataframe is called df.

df['start_time'] = pd.to_datetime(df['start_time']) # in case it's not datetime already
df.set_index('start_time', inplace=True)
new_dates = pd.date_range(start=min(df.index), end=max(df.end_time), freq='15Min')
new_df = df.reindex(new_dates, fill_value=np.nan)

As long as there are no duplicates in start_time, this should work. If there is, that'd need to be handled in some other way.

Resample is another possibility, but without data, it's tough to say what would work.

CodePudding user response:

This problem is easily solved with the python package staircase, which is built on pandas and numpy for the purposes of working with (mathematical) step functions.

Assume your original dataframe is called df and the times you want in your resulting dataframe are an array (or datetime index, or series etc) called times.

import staircase as sc

stepfunction = sc.Stairs(df, start="start_time", end="end_time", value="VIS")
result = stepfunction(times, include_index=True)

That's it, result is a pandas Series indexed by times, and has the values you want. You can convert it to a dataframe in the format you want using reset_index method on the Series.

You can generate your times data like this

import pandas as pd

times = pd.date_range(df["start_time"].min(), df["end_time"].max(), freq="30min")

Why it works

Each row in your dataframe can be thought of a step function. For example the first row corresponds to a step function which starts with a value of zero, then at 2002-02-03 04:15:00 increases to a value of 10, then at 2002-02-04 04:45:00 returns to zero. When you sum all the step functions up for each row you have one step function whose value is the sum of all VIS values at any point. This is what has been assigned to the stepfunction variable above. The stepfunction variable is callable, and returns values of the step function at the points specified. This is what is happening in the last line of the example where the result variable is being assigned.

  • Related