I want to calculate the duration, in seconds, of a timestamp column grouped by IP. The full data has the columns EventTimeStamp, ip, Browser, BrowserVersion, Deviceos, GeoCarrier, GeoCity, GeoCountry, GeoRegion, PageExperience, PageName, PageParentUrl, PageType and PageUrl. Simplified, the rows for one IP look like this:
TimeStamp ID
2022-02-09 18:23:01 101
2022-02-09 18:23:02 101
2022-02-09 18:23:03 101
2022-02-09 18:23:03 101
2022-02-09 18:23:03 101
2022-02-09 18:23:07 101
2022-02-09 18:23:08 101
2022-02-09 18:23:21 101
2022-02-09 18:23:21 101
For this ID, the duration should be calculated as 20 seconds. If there is only one entry for an ID, the duration should be zero.
I tried the code below, but it gives me NaT:
p=df[df['ID']=='101']
p['TimeStamp'].tail(1)-p['TimeStamp'].head(1)
0 NaT
8 NaT
Name: TimeStamp, dtype: timedelta64[ns]
The data types of the columns are:
df.dtypes
TimeStamp datetime64[ns]
ID object
I also tried this piece of code to get the duration of each ID:
duration = []
for ip, tempdf in df.sort_values(['ID', 'TimeStamp']).groupby('ID'):
    d = tempdf['TimeStamp'].tail(1) - tempdf['TimeStamp'].head(1)
    duration.append(d)
CodePudding user response:
The code below solved my problem. It gives the duration in seconds for each ID.
duration = []
for ID, tempdf in df.groupby('ID'):
    ts = tempdf['TimeStamp'].sort_values()  # order by time, not by index label
    # total_seconds() also counts whole days; .seconds would drop them
    diff = (ts.iloc[-1] - ts.iloc[0]).total_seconds()
    duration.append(diff)
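For reference, the NaT in the question's first attempt looks like an index-alignment effect: subtracting one Series from another matches rows by index label, and tail(1) and head(1) carry different labels (8 and 0 in the output shown). The sketch below rebuilds the sample rows from the question (the DataFrame construction is an assumption about how the data was loaded) and compares that subtraction with a scalar one taken through .iloc.

```python
import pandas as pd

# Sample rows from the question; the default integer index runs 0..8.
df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2022-02-09 18:23:01", "2022-02-09 18:23:02", "2022-02-09 18:23:03",
        "2022-02-09 18:23:03", "2022-02-09 18:23:03", "2022-02-09 18:23:07",
        "2022-02-09 18:23:08", "2022-02-09 18:23:21", "2022-02-09 18:23:21",
    ]),
    "ID": ["101"] * 9,
})

p = df[df["ID"] == "101"]

# Series - Series aligns on index labels. Label 8 (tail) and label 0 (head)
# have no partner in the other Series, so both result rows are NaT.
aligned = p["TimeStamp"].tail(1) - p["TimeStamp"].head(1)

# Pulling out scalars with .iloc avoids alignment and yields a Timedelta.
delta = p["TimeStamp"].iloc[-1] - p["TimeStamp"].iloc[0]
seconds = delta.total_seconds()
```

Note that Timedelta.seconds returns only the seconds component of the difference and ignores whole days, so total_seconds() is the safer choice when an ID can span more than a day.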
CodePudding user response:
You can use groupby with np.ptp:
import numpy as np
df.groupby('ID')['TimeStamp'].apply(np.ptp).dt.total_seconds()
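If you would rather not depend on NumPy here, a similar result can be sketched with pandas alone by subtracting the per-group minimum from the per-group maximum. This is an alternative, not the method from the answer above. The sample frame is a shortened version of the question's data, and the ID "102" is a made-up single-entry group added to show the zero case.

```python
import pandas as pd

df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2022-02-09 18:23:01", "2022-02-09 18:23:08", "2022-02-09 18:23:21",
        "2022-02-09 18:30:00",
    ]),
    "ID": ["101", "101", "101", "102"],  # "102" is a hypothetical single-entry ID
})

grouped = df.groupby("ID")["TimeStamp"]

# max and min are both indexed by ID, so the subtraction lines up per group.
# A group with one row has max == min, which gives a duration of 0 seconds.
duration = (grouped.max() - grouped.min()).dt.total_seconds()
```

The result is a Series of float seconds indexed by ID, so no explicit loop or prior sorting is needed.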