Determine the duration of an event

Time:01-23

I have a dataframe with a list of events, a column for an indicator for a criterion, and a column for a timestamp.

For each event, if the indicator is true, I want to see if the event lasted more than one period, and for how long.

An example of the expected output is shown below. In the duration column, A is True for only one time period, so it is coded as 1. A is then False for the next period, so that row is coded as 0. A is then True for 2 consecutive periods, so the duration is 2 on the first of those rows; the second row is coded as 0 because I am only interested in the first entry of each event, and so on.

    id  target  time                  duration
0   A   True    2023-01-22 11:00:00   1
3   A   False   2023-01-22 11:05:00   0
6   A   True    2023-01-22 11:10:00   2
9   A   True    2023-01-22 11:15:00   0
12  A   False   2023-01-22 11:20:00   0

But I have no idea how to do this.

A sample dataframe is included below

import pandas as pd
time_test = pd.DataFrame({'id':[
    'A','B','C','A','B','C',
    'A','B','C','A','B','C',
    'A','B','C','A','B','C'],
    'target':[
        'True','True','True','False','True','True',
        'True','False','True','True','True','True',
        'False','True','False','True','False','True'],
    'time':[
        '11:00','11:00','11:00','11:05','11:05','11:05',
        '11:10','11:10','11:10','11:15','11:15','11:15',
        '11:20','11:20','11:20','11:25','11:25','11:25']})
time_test = time_test.sort_values(['id', 'time'])
time_test['time'] = pd.to_datetime(time_test['time'])
time_test

EDIT: I need to provide some clarification about the expected output

Let's take group B as an example. An event occurs for B at 11:00, indicated by the "True" under target. At 11:05 the event is still occurring, so duration should be 2 for the row 1 B True 2023-01-22 11:00:00. I am not interested in the following row, so that can be coded as 0. So in a sense, 0 represents both "already accounted for" and the absence of an event.

At 11:10 that event is not occurring so the summation would re-set.

At 11:15 another event is occurring, and at 11:20 that event is still going, so the value for the first row should be 2.

In the end, the values for B should be 2,0,0,2,0,0.
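The counting rule described above can be sketched in plain Python, independent of pandas. This is only an illustration of the rule; the function name run_durations is hypothetical and the input is assumed to be one group's flags, already in time order.

```python
def run_durations(flags):
    """For each position, return the length of the True run that starts there, else 0."""
    durations = [0] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            # Walk to the end of this run of consecutive True values.
            j = i
            while j < len(flags) and flags[j]:
                j += 1
            durations[i] = j - i  # only the first row of the run gets the count
            i = j
        else:
            i += 1
    return durations


# Group B from the sample data: True, True, False, True, True, False
b_flags = [True, True, False, True, True, False]
print(run_durations(b_flags))  # [2, 0, 0, 2, 0, 0]
```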

I can see why this method would be confusing, but I hope my explanation makes sense. My data is in 5-minute chunks, so I figured I could just count the number of chunks to see how long an event lasted, instead of using a start and end time to calculate the elapsed time (but maybe that would be easier?).
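For comparison, the start/end alternative mentioned above might look like the following. This is a minimal sketch, not part of the original post: it uses only group B with explicit timestamps, the column names run, start, end and elapsed are made up for illustration, and it assumes every row covers one full 5-minute chunk, so one chunk is added to end minus start.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["B"] * 6,
    "target": [True, True, False, True, True, False],
    "time": pd.to_datetime([
        "2023-01-22 11:00", "2023-01-22 11:05", "2023-01-22 11:10",
        "2023-01-22 11:15", "2023-01-22 11:20", "2023-01-22 11:25",
    ]),
})

# A new run starts wherever the flag differs from the previous row of the same id.
flag = df["target"].astype(int)
changed = flag.ne(flag.groupby(df["id"]).shift())
df["run"] = changed.astype(int).groupby(df["id"]).cumsum()

# One row per event: first and last timestamp of each run of True values.
events = (
    df[df["target"]]
    .groupby(["id", "run"])["time"]
    .agg(start="min", end="max")
    .reset_index()
)

# Assumption: each row represents a 5-minute chunk, so add one chunk length.
events["elapsed"] = events["end"] - events["start"] + pd.Timedelta(minutes=5)
print(events)
```

With evenly spaced 5-minute rows this gives the same information as counting chunks; it would only differ if rows were missing or irregularly spaced.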

CodePudding user response:

Annotated code

# Convert the target column to boolean
mask = time_test['target'].eq('True')

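# Label blocks of consecutive True's: the running count of False rows is
# constant within a block; False rows themselves are left as NaN.
# (Assumes time_test is already sorted by id and time, as in the question.)
time_test['subgrps'] = (~mask).cumsum()[mask]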

# Group the target mask by id and subgrps
g = mask.groupby([time_test['id'], time_test['subgrps']])

# Create a boolean mask to identify dupes per id and subgrps
dupes = time_test.duplicated(subset=['id', 'subgrps'])

# Sum the True value per group and mask the duplicates
time_test['duration'] = g.transform('sum').mask(dupes).fillna(0)

Result

   id target                time  subgrps  duration
0   A   True 2023-01-22 11:00:00      0.0       1.0
3   A  False 2023-01-22 11:05:00      NaN       0.0
6   A   True 2023-01-22 11:10:00      1.0       2.0
9   A   True 2023-01-22 11:15:00      1.0       0.0
12  A  False 2023-01-22 11:20:00      NaN       0.0
15  A   True 2023-01-22 11:25:00      2.0       1.0
1   B   True 2023-01-22 11:00:00      2.0       2.0
4   B   True 2023-01-22 11:05:00      2.0       0.0
7   B  False 2023-01-22 11:10:00      NaN       0.0
10  B   True 2023-01-22 11:15:00      3.0       2.0
13  B   True 2023-01-22 11:20:00      3.0       0.0
16  B  False 2023-01-22 11:25:00      NaN       0.0
2   C   True 2023-01-22 11:00:00      4.0       4.0
5   C   True 2023-01-22 11:05:00      4.0       0.0
8   C   True 2023-01-22 11:10:00      4.0       0.0
11  C   True 2023-01-22 11:15:00      4.0       0.0
14  C  False 2023-01-22 11:20:00      NaN       0.0
17  C   True 2023-01-22 11:25:00      5.0       1.0
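A self-contained variant that yields integer durations is sketched below. It is not the code from the answer above: run labels are restarted per id by comparing each flag with the previous row of the same id, and the helper names (flag, changed, run, run_len, first_in_run) are chosen here only for illustration. It assumes the rows can be sorted by id and time, as in the question.

```python
import pandas as pd

times = ["11:00", "11:05", "11:10", "11:15", "11:20", "11:25"]
time_test = pd.DataFrame({
    "id": ["A", "B", "C"] * 6,
    "target": [
        "True", "True", "True", "False", "True", "True",
        "True", "False", "True", "True", "True", "True",
        "False", "True", "False", "True", "False", "True",
    ],
    "time": [t for t in times for _ in range(3)],
})
time_test = time_test.sort_values(["id", "time"])

# The sample stores target as strings, so compare against 'True'.
mask = time_test["target"].eq("True")
flag = mask.astype(int)

# A run starts wherever the flag differs from the previous row of the same id.
changed = flag.ne(flag.groupby(time_test["id"]).shift())
run = changed.astype(int).groupby(time_test["id"]).cumsum()

# Length of each run, broadcast back to every row of that run.
run_len = mask.groupby([time_test["id"], run]).transform("size")

# Keep the length only on the first row of each True run; everything else is 0.
first_in_run = changed & mask
time_test["duration"] = run_len.where(first_in_run, 0).astype(int)
print(time_test)
```

For the sample data this should give the same duration values as the result table above, as integers rather than floats.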