Home > Software design >  How to calculate a Process Duration from a TimeSeries Dataset with Pandas
How to calculate a Process Duration from a TimeSeries Dataset with Pandas

Time:04-20

I have a huge dataset of various sensor data sorted chronologically (by timestamp) and by sensor type. I want to calculate the duration of a process in seconds by subtracting the first entry of a sensor from the last entry. This is to be done with python and pandas. Attached is an example for better understanding: enter image description here

I want to subtract the first row from the last row for each sensor type to get the process duration in seconds (i.e. row 8 minus row 1 : 2022-04-04T09:44:56.962Z - 2022-04-04T09:44:56.507Z = 0.455 seconds). The duration should then be written to a newly created column in the last row of the sensor type.

Thanks in advance!

CodePudding user response:

Assuming your 'timestamp' column is already 'to_datetime' converted, would this work ?

df['diffPerSensor_type']=df.groupby('sensor_type')['timestamp'].transform('last')-df.groupby('sensor_type')['timestamp'].transform('first')

You could then extract your seconds with this

df['diffPerSensor_type'].dt.seconds

CodePudding user response:

If someone wants to reproduce an example, here is a df:

import pandas as pd

df = pd.DataFrame({
    'sensor_type' : [0]*7   [1]*11   [13]*5   [8]*5,
    'timestamp' : pd.date_range('2022-04-04', periods=28, freq='ms'),
    'value' : [128] * 28
})
df['time_diff in milliseconds'] = (df.groupby('sensor_type')['timestamp']
                   .transform(lambda x: x.iloc[-1]-x.iloc[0])
                   .dt.components.milliseconds)

print(df.head(10))
   sensor_type               timestamp  value  time_diff in milliseconds
0            0 2022-04-04 00:00:00.000    128                          6
1            0 2022-04-04 00:00:00.001    128                          6
2            0 2022-04-04 00:00:00.002    128                          6
3            0 2022-04-04 00:00:00.003    128                          6
4            0 2022-04-04 00:00:00.004    128                          6
5            0 2022-04-04 00:00:00.005    128                          6
6            0 2022-04-04 00:00:00.006    128                          6
7            1 2022-04-04 00:00:00.007    128                         10
8            1 2022-04-04 00:00:00.008    128                         10
9            1 2022-04-04 00:00:00.009    128                         10

My solution is nearly the same as @Daniel Weigel , only that I used lambda to calc the difference.

  • Related