get duration in seconds grouped by ID column in Python

Time:04-09

I want to calculate a duration in seconds from a timestamp column, grouped by IP. The columns are: EventTimeStamp, ip, Browser, BrowserVersion, Deviceos, GeoCarrier, GeoCity, GeoCountry, GeoRegion, PageExperience, PageName, PageParentUrl, PageType, PageUrl. The relevant columns for one IP (shown here as ID) look like this:

TimeStamp           ID
2022-02-09 18:23:01 101 
2022-02-09 18:23:02 101 
2022-02-09 18:23:03 101
2022-02-09 18:23:03 101
2022-02-09 18:23:03 101
2022-02-09 18:23:07 101 
2022-02-09 18:23:08 101 
2022-02-09 18:23:21 101 
2022-02-09 18:23:21 101

For this ID, the duration should be calculated as 20 seconds. If there is only one entry for an ID, the duration should be zero.

I tried the code below, but it gives me NaT:

p=df[df['ID']=='101']
p['TimeStamp'].tail(1)-p['TimeStamp'].head(1)

0   NaT
8   NaT
Name: TimeStamp, dtype: timedelta64[ns]

The data types of the columns are:

df.dtypes
    
    TimeStamp     datetime64[ns]
    ID                    object

I also tried this piece of code to get the duration of each ID:

duration = []
for ip,tempdf in df.sort_values(['ID','TimeStamp']).groupby('ID'):
    d = ((tempdf['TimeStamp'].tail(1)-tempdf['TimeStamp'].head(1)))
    duration.append(d)

CodePudding user response:

The code below solved my problem. It gives the duration in seconds for each ID.

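duration = []
for ID, tempdf in df.groupby('ID'):
    # Sort by time (sort_index would only order by row label)
    ts = tempdf['TimeStamp'].sort_values()
    tail = ts.iloc[-1]
    head = ts.iloc[0]
    # total_seconds() counts whole days too; .seconds would drop them
    diff = (tail - head).total_seconds()
    duration.append(diff)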
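
For context, the NaT in the question most likely comes from index alignment: subtracting one Series from another matches rows by index label, and tail(1) carries label 8 while head(1) carries label 0, so no pair lines up. Here is a minimal sketch of that behaviour, assuming a small sample frame rebuilt from the rows in the question (the frame itself is my reconstruction, not the asker's real data):

```python
import pandas as pd

# Hypothetical sample rebuilt from the rows shown in the question
df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2022-02-09 18:23:01", "2022-02-09 18:23:02", "2022-02-09 18:23:03",
        "2022-02-09 18:23:03", "2022-02-09 18:23:03", "2022-02-09 18:23:07",
        "2022-02-09 18:23:08", "2022-02-09 18:23:21", "2022-02-09 18:23:21",
    ]),
    "ID": ["101"] * 9,
})

p = df[df["ID"] == "101"]

# Series - Series aligns on index labels: label 8 (tail) never meets label 0 (head)
aligned = p["TimeStamp"].tail(1) - p["TimeStamp"].head(1)

# Values picked by position are plain Timestamps, so nothing is aligned
seconds = (p["TimeStamp"].iloc[-1] - p["TimeStamp"].iloc[0]).total_seconds()
```

This is why picking the first and last values with iloc, as in the loop above, avoids the problem.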

CodePudding user response:

You can use groupby with np.ptp:

 df.groupby('ID')['TimeStamp'].apply(np.ptp).dt.total_seconds()
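
If np.ptp does not behave as expected with your pandas or NumPy version, the same range can be computed as max minus min per group. This is an alternative sketch, not the method from the answer above; the sample frame and the extra ID "102" are made up to show the single-entry case:

```python
import pandas as pd

# Hypothetical sample: three rows for ID 101, one row for ID 102
df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2022-02-09 18:23:01",
        "2022-02-09 18:23:07",
        "2022-02-09 18:23:21",
        "2022-02-09 18:30:00",
    ]),
    "ID": ["101", "101", "101", "102"],
})

grouped = df.groupby("ID")["TimeStamp"]

# Latest minus earliest timestamp per ID, converted to seconds as floats
duration = (grouped.max() - grouped.min()).dt.total_seconds()
```

An ID with a single row has the same max and min, so its duration comes out as zero, which matches the requirement in the question.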