Calculate time between events in a day

Time:09-17

I have the following dataset:

date                  event      next_event            duration_Minutes
2021-09-09 22:30:00    1         2021-09-09 23:00:00        30
2021-09-09 23:00:00    2         2021-09-09 23:10:00        10
2021-09-09 23:10:00    1         2021-09-09 23:50:00        40
2021-09-09 23:50:00    4         2021-09-10 00:50:00        60
2021-09-10 00:50:00    4         2021-09-12 00:50:00        2880

The main problem is that I would like to split multi-day events into separate per-day segments. For example, the event starting at 2021-09-09 23:50:00 should become one segment from 2021-09-09 23:50:00 to 2021-09-10 00:00:00 and another from 2021-09-10 00:00:00 to 2021-09-10 00:50:00, and so on. This matters because I later need to group the events by day and calculate the duration of each event per day, so I need to handle the cases where the day changes between one event and the next. I would like to obtain something like this:

date                  event      next_event            duration_Minutes
2021-09-09 22:30:00    1         2021-09-09 23:00:00        30
2021-09-09 23:00:00    2         2021-09-09 23:10:00        10
2021-09-09 23:10:00    1         2021-09-09 23:50:00        40
2021-09-09 23:50:00    4         2021-09-10 00:00:00        10
2021-09-10 00:00:00    4         2021-09-10 00:50:00        50
2021-09-10 00:50:00    4         2021-09-11 00:00:00        1390
2021-09-11 00:00:00    4         2021-09-12 00:00:00        1440
2021-09-12 00:00:00    4         2021-09-12 00:50:00        50

It should also handle cases where there is no event for an entire day or more, as in the example.

My current solution for now is:

import numpy as np
import pandas as pd

# df is indexed by the event start time ('date' as a DatetimeIndex)
first_record_hour_ts = df.index.floor('H')[0]
last_record_hour_ts = df.index.floor('H')[-1]

# Create a series from the first to the last date containing NaN
df_to_join = pd.Series(np.nan, index=pd.date_range(first_record_hour_ts, last_record_hour_ts, freq='H'))
df_to_join = pd.DataFrame(df_to_join)
# Concatenate with current status dataframe
df = pd.concat([df, df_to_join[~df_to_join.index.isin(df.index)]]).sort_index()
# Forward fill the NaNs
df = df.ffill()

# Next timestamp in the index (a positional shift; DatetimeIndex.shift shifts by a frequency instead)
df['next_event'] = df.index.to_series().shift(-1)
# Calculate the delta between the 2 statuses
df['duration'] = df['next_event'] - df.index
# Convert into minutes
df['duration_Minutes'] = df['duration'].dt.total_seconds() // 60

This doesn't solve the problem exactly, but I think it may achieve my goal, which is to be able to group by event and by day at the end.
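The forward-fill idea above can be adapted to produce the per-day table directly: instead of inserting a row every hour, insert only the midnights between the first start and the final end, carry the current event forward into them, and recompute the durations. This is a minimal sketch, assuming pandas and that the events are contiguous (each next_event equals the following row's date, which holds for the sample data); `split_at_midnights` is a hypothetical helper name, not part of the original code.

```python
import pandas as pd

# Sample data from the question ('date' = event start, 'next_event' = event end)
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-09-09 22:30:00", "2021-09-09 23:00:00", "2021-09-09 23:10:00",
        "2021-09-09 23:50:00", "2021-09-10 00:50:00",
    ]),
    "event": [1, 2, 1, 4, 4],
    "next_event": pd.to_datetime([
        "2021-09-09 23:00:00", "2021-09-09 23:10:00", "2021-09-09 23:50:00",
        "2021-09-10 00:50:00", "2021-09-12 00:50:00",
    ]),
})


def split_at_midnights(df):
    """Hypothetical helper: add a row at every midnight and recompute durations.

    Assumes the events are contiguous, i.e. each next_event equals the next date.
    """
    starts = df.set_index("date")["event"]
    first, end = starts.index[0], df["next_event"].iloc[-1]
    # Every midnight strictly between the first start and the final end
    midnights = pd.date_range(first.floor("D"), end, freq="D")
    midnights = midnights[(midnights > first) & (midnights < end)]
    # Add the midnights as new rows and carry the current event forward into them
    out = starts.reindex(starts.index.union(midnights)).ffill().astype(int)
    out = out.rename_axis("date").reset_index()
    # Each segment ends where the next begins; the last one ends at the final next_event
    out["next_event"] = out["date"].shift(-1).fillna(end)
    out["duration_Minutes"] = (out["next_event"] - out["date"]).dt.total_seconds() // 60
    return out


result = split_at_midnights(df)
print(result)

# Total minutes per day and per event
daily = result.groupby([result["date"].dt.date, "event"])["duration_Minutes"].sum()
print(daily)
```

If the real data has gaps between events (a next_event that does not match the following date), this sketch would attribute the gap to the previous event, so the contiguity assumption should be checked first.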

Answer:

OK, the code below looks a bit long, and there is certainly a shorter or more efficient way of doing this, but I think it is reasonably easy to follow.

split_datetime_span_by_day below takes two dates: start_date and end_date. In your case, these would be the date and next_event values of each row in your source data.

The function then checks whether that time period (start -> end) spans midnight. If it doesn't, it returns the start date, the end date, and the length of the period in seconds. If it does, it creates a new segment (start -> midnight) and then calls itself again (i.e. recurses) with midnight as the new start; this continues until the remaining period no longer spans midnight.

Just a note: the returned segment list is made up of tuples of (start, end, nmb_seconds). I'm returning the number of seconds, not the number of minutes as in your question, because I didn't know how you wanted to round the seconds (up, down, etc.). That's left as an exercise for the reader :-)
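One way to handle the rounding left open here is to convert the seconds afterwards with an explicit rounding mode. This is a sketch only; `segments_to_minutes` and its `rounding` parameter are hypothetical names, and it assumes the (start, end, nmb_seconds) tuple layout described above.

```python
import math
from datetime import datetime


def segments_to_minutes(segments, rounding="floor"):
    """Convert (start, end, nmb_seconds) tuples into (start, end, minutes).

    `rounding` is a hypothetical option: "floor", "ceil" or "round".
    Note that Python's built-in round() rounds halves to the nearest even number.
    """
    rounders = {"floor": math.floor, "ceil": math.ceil, "round": round}
    to_minutes = rounders[rounding]
    return [(start, end, to_minutes(seconds / 60)) for start, end, seconds in segments]


# Example: a 90-second segment (1.5 minutes) followed by a full hour
segments = [
    (datetime(2021, 9, 9, 23, 58, 30), datetime(2021, 9, 10, 0, 0), 90.0),
    (datetime(2021, 9, 10, 0, 0), datetime(2021, 9, 10, 1, 0), 3600.0),
]
for mode in ("floor", "ceil", "round"):
    print(mode, [m for _, _, m in segments_to_minutes(segments, mode)])
```

Flooring matches the `// 60` used in the question's own code; ceiling never under-counts partial minutes.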

from datetime import datetime, timedelta

def split_datetime_span_by_day(start_date, end_date, split_segments=None):
  assert start_date < end_date # sanity check

  # when is the next midnight after start_date?
  # adapted from https://ispycode.com/Blog/python/2016-07/Get-Midnight-Today
  start_next_midnight = datetime.combine(start_date, datetime.min.time()) + timedelta(days=1)

  if split_segments is None:
    split_segments = []

  if end_date < start_next_midnight:
    # end date is before next midnight, no split necessary
    return split_segments + [(
      start_date,
      end_date,
      (end_date - start_date).total_seconds()
    )]

  # otherwise, split at next midnight...
  split_segments += [(
    start_date,
    start_next_midnight,
    (start_next_midnight - start_date).total_seconds()
  )]

  if (end_date - start_next_midnight).total_seconds() > 0:  
    # ...and recurse to get next segment
    return split_datetime_span_by_day(
      start_date=start_next_midnight,
      end_date=end_date,
      split_segments=split_segments
    )
  else:
    # case where start_next_midnight == end_date, i.e. end_date is exactly midnight:
    # stop here rather than creating a 0-second segment
    return split_segments

# test case: 
start_date = datetime.strptime('2021-09-12 00:00:00', '%Y-%m-%d %H:%M:%S')
end_date = datetime.strptime('2021-09-14 01:00:00', '%Y-%m-%d %H:%M:%S')

print(split_datetime_span_by_day(start_date=start_date, end_date=end_date))

# returned values:
# [
#   (datetime.datetime(2021, 9, 12, 0, 0), datetime.datetime(2021, 9, 13, 0, 0), 86400.0),
#   (datetime.datetime(2021, 9, 13, 0, 0), datetime.datetime(2021, 9, 14, 0, 0), 86400.0),
#   (datetime.datetime(2021, 9, 14, 0, 0), datetime.datetime(2021, 9, 14, 1, 0), 3600.0)
# ]
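Because the recursive function calls itself once for every midnight it crosses, a very long gap (roughly as many days as Python's default recursion limit of 1000, see sys.getrecursionlimit()) would raise RecursionError. A loop-based variant avoids that. This is a sketch under the assumption that it should return the same (start, end, seconds) tuples as the function above, including not emitting a 0-second segment when end_date falls exactly on midnight; the name `split_datetime_span_by_day_iter` is hypothetical.

```python
from datetime import datetime, timedelta


def split_datetime_span_by_day_iter(start_date, end_date):
    """Loop-based variant: same (start, end, seconds) tuples, no recursion."""
    assert start_date < end_date  # sanity check
    segments = []
    current = start_date
    while current < end_date:
        # Midnight at the start of the day after `current`
        next_midnight = datetime.combine(current.date(), datetime.min.time()) + timedelta(days=1)
        # The segment stops at that midnight or at end_date, whichever comes first
        segment_end = min(next_midnight, end_date)
        segments.append((current, segment_end, (segment_end - current).total_seconds()))
        current = segment_end
    return segments


# The question's last row: 2021-09-10 00:50 -> 2021-09-12 00:50
last_row = split_datetime_span_by_day_iter(datetime(2021, 9, 10, 0, 50), datetime(2021, 9, 12, 0, 50))
print([seconds // 60 for _, _, seconds in last_row])  # [1390.0, 1440.0, 50.0]
```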