Home > OS >  Pandas reading time of day from columns and calculating difference between the two times
Pandas reading time of day from columns and calculating difference between the two times

Time:02-20

I have a dataset which records 'time-from' and 'time-to' for each user as time of day in a 24 hour format. Example:

User  time_from   time_to  Activity
a     6:00        8:00     2
a     8:00        8:30     1
a     8:30        10:00    1
a     10:00       13:00    0
b     6:00        8:00     2
b     8:00        8:30     0
b     8:30        10:00    2
b     10:00       13:00    1
.
.
. 

I want to get the number of hours spent on each activity for these users on an activity. I figured my intermediate step is to get:

User  time_from   time_to  Activity  time_spent
a     6:00        8:00     2         2:00
a     8:00        8:30     1         0:30
a     8:30        10:00    1         1:30
a     10:00       13:00    0         3:00
b     6:00        8:00     2         2:00
b     8:00        8:30     0         0:30
b     8:30        10:00    2         1:30
b     10:00       13:00    1         3:00
.
.
. 

Which I try to get using:

df['time_from'] = pd.to_datetime(df['time_from'], format='%H:%M')
df['time_to'] = pd.to_datetime(df['time_to'], format='%H:%M') 
df['time_spent'] = df['time_to'] - df['time_from']

However it gives me an output like:


User  time_from           time_to          Activity  time_spent
a     1900-01-01 6:00     1900-01-01 8:00  2         0 days 2:00
a     1900-01-01 8:00     1900-01-01 8:30  1         0 days 0:30
a     1900-01-01 8:30     1900-01-01 10:00 1         0 days 1:30
a     1900-01-01 10:00    1900-01-01 13:00 0         0 days 3:00
b     1900-01-01 6:00     1900-01-01 8:00  2         0 days 2:00
b     1900-01-01 8:00     1900-01-01 8:30  0         0 days 0:30
b     1900-01-01 8:30     1900-01-01 10:00 2         0 days 1:30
b     1900-01-01 10:00    1900-01-01 13:00 1         0 days 3:00
.
.
. 

Can somebody help me get the desired result? I might be able to group_by on activities and them sum time_spent to get my ultimate result, but the output I am getting right now probably won't work.

CodePudding user response:

Because you convert the time to datetime which gives you a Timestamp with structure ('1900-01-01 06:00:00'). I suggest to substract it directly.

CodePudding user response:

You can use the groupby on Users and Activity and immediately follow by agg(sum) to total the time_spent of a user on a given activity.

df['time_from'] = pd.to_datetime(df['time_from'], format='%H:%M')
df['time_to'] = pd.to_datetime(df['time_to'], format='%H:%M')
df['time_spent'] = df['time_to'] - df['time_from']

new_df = df.groupby(['User','Activity']).agg(sum)

Output from new_df

User Activity
a    0        0 days 03:00:00
     1        0 days 02:00:00
     2        0 days 02:00:00
b    0        0 days 00:30:00
     1        0 days 03:00:00
     2        0 days 03:30:00
  • Related