Home > database >  Create IntervalIndex from periodic events in a pandas dataframe
Create IntervalIndex from periodic events in a pandas dataframe

Time:12-14

I have a dataframe that looks like this:

duration,window_start,window_end,REPETITIONS
0 days 01:00:00,2023-12-31,2024-01-07,5
0 days 00:30:00,2021-10-28,2021-11-02,10
0 days 00:20:00,2022-12-24,2023-01-04,15
0 days 01:00:00,2023-06-15,2023-06-17,20

I would like to extract these periodic events into a dataframe containing a start time and end time based on the number of REPETITIONS and the window_start and window_end. In the above example there should be 5 10 15 20=50 discrete events. I am struggling with vectorizing this conversion, and don't see a way around looping over each row.

What I've got so far:

import pandas as pd
import numpy as np

periodic = pd.read_csv("events.csv",header=0,parse_dates=["start_date", "end_date"], index_col="id")
 
start = periodic.apply(lambda row: np.linspace(row["window_start"].value, row["window_end"].value, row["REPETITIONS"]), axis=1)
start = start.apply(lambda row: pd.to_datetime(row))
end = start   periodic["duration"]

which gives two seperate Series; start and end that contain a DateTimeIndex for each id in the Series, i.e:

start.head()

1,"DatetimeIndex([          '2021-12-31 00:00:00',
               '2022-01-01 00:01:00',
               '2021-01-01 00:02:00',
               '2021-01-01 00:03:00',

end.head()

1,"DatetimeIndex([          '2021-12-31 01:00:00',
               '2022-01-01 00:02:00',
               '2021-01-01 00:03:00',
               '2021-01-01 00:04:00',

The goal is to have a result that should look like this:

id, start, end
1,'2021-12-31 00:00:00','2021-12-31 00:01:00'
1,'2021-12-31 00:00:00','2021-12-31 00:01:00'
1,'2021-12-31 00:00:00','2021-12-31 00:01:00'
.
.
.
2,'2021-10-28 00:00:00','2021-10-28 00:30:00'
2,'2021-10-28 13:20:00','2021-10-28 13:50:00'

CodePudding user response:

Have you tried something like this ?

df['duration'] = pd.to_timedelta(df['duration'])
ef = pd.DataFrame() # new df

# loop through
for i, row in df.iterrows():
    # date range for the given window start and end dates with duration as frequency
    dates = pd.date_range(row['window_start'], row['window_end'], freq=row['duration'])
    event_df = pd.DataFrame({'start': dates, 'end': dates   row['duration'], 'id': i 1})
    # append
    ef = ef.append(event_df)

# resample dataframe by id
result = ef.set_index('start').resample('D')['id'].count()
  • Related