Similar to this question, I would like to compute the time difference between rows of a dataframe. Unlike that question, however, the difference should be computed separately for each id (i.e. with a groupby on id).
So, for example, given this dataframe:
import pandas as pd

df = pd.DataFrame(
    {'id': [6,6,6,6,6,10,10,10,10,10],
     'timestamp': ['2016-04-01 00:04:00','2016-04-01 00:04:20','2016-04-01 00:04:30',
                   '2016-04-01 00:04:35','2016-04-01 00:04:54','2016-04-30 13:04:59',
                   '2016-04-30 13:05:00','2016-04-30 13:05:12','2016-04-30 13:05:20',
                   '2016-04-30 13:05:51']}
)
print(df)
id timestamp
0 6 2016-04-01 00:04:00
1 6 2016-04-01 00:04:20
2 6 2016-04-01 00:04:30
3 6 2016-04-01 00:04:35
4 6 2016-04-01 00:04:54
5 10 2016-04-30 13:04:59
6 10 2016-04-30 13:05:00
7 10 2016-04-30 13:05:12
8 10 2016-04-30 13:05:20
9 10 2016-04-30 13:05:51
Then I want to create a column ΔT holding, in seconds, the difference from the previous row with the same id. My attempt fails with the error below:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S')
df['ΔT'] = df.groupby('id').index.to_series().diff().astype('timedelta64[s]')
AttributeError: 'DataFrameGroupBy' object has no attribute 'index'
Intended output:
id timestamp ΔT
0 6 2016-04-01 00:04:00 0
1 6 2016-04-01 00:04:20 20
2 6 2016-04-01 00:04:30 10
3 6 2016-04-01 00:04:35 5
4 6 2016-04-01 00:04:54 19
5 10 2016-04-30 13:04:59 0
6 10 2016-04-30 13:05:00 1
7 10 2016-04-30 13:05:12 12
8 10 2016-04-30 13:05:20 8
9 10 2016-04-30 13:05:51 31
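The AttributeError comes from the fact that a DataFrameGroupBy object has no .index attribute. The grouping itself is still needed, though: a plain diff() over the whole column would measure the first row of id 10 against the last row of id 6. A minimal sketch on a four-row subset of the question's data (the subset is chosen here only for brevity) contrasts the two:

```python
import pandas as pd

# Two rows per id, taken from the question's data
df = pd.DataFrame({
    "id": [6, 6, 10, 10],
    "timestamp": pd.to_datetime([
        "2016-04-01 00:04:00", "2016-04-01 00:04:20",
        "2016-04-30 13:04:59", "2016-04-30 13:05:00",
    ]),
})

# Ungrouped: the first row of id 10 is measured against the last row of id 6
plain = df["timestamp"].diff()
# Grouped: the diff restarts at the first row of every id, which becomes NaT
grouped = df.groupby("id")["timestamp"].diff()
print(pd.DataFrame({"plain": plain, "grouped": grouped}))
```

The NaT at the start of each group is why the answers below need a fillna(0) to get the 0 shown in the intended output.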
Answer 1:
# Assumes df['timestamp'] was already converted with pd.to_datetime, as in the question
df['ΔT'] = df.groupby('id')['timestamp'].diff().dt.total_seconds().fillna(0)
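For reference, a self-contained version of this one-liner, with the question's data and the datetime conversion included, might look like the sketch below. Note that it yields floats (0.0, 20.0, ...); chaining .astype(int) would give integers.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [6, 6, 6, 6, 6, 10, 10, 10, 10, 10],
    "timestamp": [
        "2016-04-01 00:04:00", "2016-04-01 00:04:20", "2016-04-01 00:04:30",
        "2016-04-01 00:04:35", "2016-04-01 00:04:54", "2016-04-30 13:04:59",
        "2016-04-30 13:05:00", "2016-04-30 13:05:12", "2016-04-30 13:05:20",
        "2016-04-30 13:05:51",
    ],
})
# diff() produces timedeltas only on real datetimes, not on strings
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")

# diff() restarts within each id group; the first row of each group is NaT,
# which total_seconds() turns into NaN and fillna(0) turns into 0.0
df["ΔT"] = df.groupby("id")["timestamp"].diff().dt.total_seconds().fillna(0)
print(df)
```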
Answer 2:
Try:
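# Diff only the timestamp column within each id; the first row of each id is NaT
df["ΔT"] = df.groupby("id")["timestamp"].diff()
# total_seconds() keeps whole days; .dt.seconds would drop them for gaps of a day or more
df["ΔT"] = df["ΔT"].dt.total_seconds()
df["ΔT"] = df["ΔT"].fillna(0).astype(int)
print(df)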
Prints:
id timestamp ΔT
0 6 2016-04-01 00:04:00 0
1 6 2016-04-01 00:04:20 20
2 6 2016-04-01 00:04:30 10
3 6 2016-04-01 00:04:35 5
4 6 2016-04-01 00:04:54 19
5 10 2016-04-30 13:04:59 0
6 10 2016-04-30 13:05:00 1
7 10 2016-04-30 13:05:12 12
8 10 2016-04-30 13:05:20 8
9 10 2016-04-30 13:05:51 31
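A note on .dt.seconds versus .dt.total_seconds(): .dt.seconds returns only the seconds component of a timedelta (0 to 86399), so any whole days in a gap are silently dropped. The question's data has no such gaps, so both give the same result there. A small sketch with hypothetical data (a single id whose two rows are one day and five seconds apart) shows the difference:

```python
import pandas as pd

# Hypothetical data: id 1 has a gap of one day plus 5 seconds between its rows
df = pd.DataFrame({
    "id": [1, 1],
    "timestamp": pd.to_datetime(["2016-04-01 00:00:00", "2016-04-02 00:00:05"]),
})
delta = df.groupby("id")["timestamp"].diff()

# .dt.seconds is only the seconds component, so the whole day is lost
seconds_component = delta.dt.seconds.fillna(0).astype(int)
# .dt.total_seconds() counts the full duration, days included
total = delta.dt.total_seconds().fillna(0).astype(int)
print(seconds_component.tolist())  # [0, 5]
print(total.tolist())              # [0, 86405]
```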