I have a huge dataset of various sensor data sorted chronologically (by timestamp) and by sensor type. I want to calculate the duration of a process in seconds by subtracting the first entry of a sensor from the last entry. This is to be done with python and pandas. Attached is an example for better understanding: enter image description here
I want to subtract the first row from the last row for each sensor type to get the process duration in seconds (i.e. row 8 minus row 1 : 2022-04-04T09:44:56.962Z - 2022-04-04T09:44:56.507Z = 0.455 seconds). The duration should then be written to a newly created column in the last row of the sensor type.
Thanks in advance!
CodePudding user response:
Assuming your 'timestamp' column is already 'to_datetime' converted, would this work ?
df['diffPerSensor_type']=df.groupby('sensor_type')['timestamp'].transform('last')-df.groupby('sensor_type')['timestamp'].transform('first')
You could then extract your seconds with this
df['diffPerSensor_type'].dt.seconds
CodePudding user response:
If someone wants to reproduce an example, here is a df:
import pandas as pd
df = pd.DataFrame({
'sensor_type' : [0]*7 [1]*11 [13]*5 [8]*5,
'timestamp' : pd.date_range('2022-04-04', periods=28, freq='ms'),
'value' : [128] * 28
})
df['time_diff in milliseconds'] = (df.groupby('sensor_type')['timestamp']
.transform(lambda x: x.iloc[-1]-x.iloc[0])
.dt.components.milliseconds)
print(df.head(10))
sensor_type timestamp value time_diff in milliseconds
0 0 2022-04-04 00:00:00.000 128 6
1 0 2022-04-04 00:00:00.001 128 6
2 0 2022-04-04 00:00:00.002 128 6
3 0 2022-04-04 00:00:00.003 128 6
4 0 2022-04-04 00:00:00.004 128 6
5 0 2022-04-04 00:00:00.005 128 6
6 0 2022-04-04 00:00:00.006 128 6
7 1 2022-04-04 00:00:00.007 128 10
8 1 2022-04-04 00:00:00.008 128 10
9 1 2022-04-04 00:00:00.009 128 10
My solution is nearly the same as @Daniel Weigel , only that I used lambda
to calc the difference.