I have a pandas dataframe that includes time intervals that overlapping at some points (figure 1). I need a data frame that has a time series that starts beginning from the first start_time to the end of the last end_time (figure 2). I have to sum up VIS values at overlapped time intervals. I couldn't figure it out. How can I do it?
CodePudding user response:
If you paste your data instead of the images, I'd be able to test this. But this is how you may want to think about it. Assume your dataframe is called df.
df['start_time'] = pd.to_datetime(df['start_time']) # in case it's not datetime already
df.set_index('start_time', inplace=True)
new_dates = pd.date_range(start=min(df.index), end=max(df.end_time), freq='15Min')
new_df = df.reindex(new_dates, fill_value=np.nan)
As long as there are no duplicates in start_time, this should work. If there is, that'd need to be handled in some other way.
Resample is another possibility, but without data, it's tough to say what would work.
CodePudding user response:
This problem is easily solved with the python package staircase, which is built on pandas and numpy for the purposes of working with (mathematical) step functions.
Assume your original dataframe is called df
and the times you want in your resulting dataframe are an array (or datetime index, or series etc) called times
.
import staircase as sc
stepfunction = sc.Stairs(df, start="start_time", end="end_time", value="VIS")
result = stepfunction(times, include_index=True)
That's it, result
is a pandas Series indexed by times, and has the values you want. You can convert it to a dataframe in the format you want using reset_index
method on the Series.
You can generate your times
data like this
import pandas as pd
times = pd.date_range(df["start_time"].min(), df["end_time"].max(), freq="30min")
Why it works
Each row in your dataframe can be thought of a step function. For example the first row corresponds to a step function which starts with a value of zero, then at 2002-02-03 04:15:00
increases to a value of 10, then at 2002-02-04 04:45:00
returns to zero. When you sum all the step functions up for each row you have one step function whose value is the sum of all VIS values at any point. This is what has been assigned to the stepfunction
variable above. The stepfunction
variable is callable, and returns values of the step function at the points specified. This is what is happening in the last line of the example where the result
variable is being assigned.