Users' trip time over a particular period of time

Time:07-07

The Geolife dataset contains GPS trajectories of users, logged as they move. Thanks to Sina Dabiri for providing a repository of the preprocessed data. I worked with his preprocessed data and created a dataframe of GPS logs for the 69 users available.

Below is a very small extract of the data, for 3 users, to describe my question.

import pandas as pd

data = {'user': [10,10,10,10,10,10,10,10,21,21,21,54,54,54,54,54,54,54,54,54],
 'lat': [39.921683,39.921583,39.92156,39.13622,39.136233,39.136241,39.136246,39.136251,42.171678,42.172055,
         42.172243,39.16008333,39.15823333,39.1569,39.156,39.15403333,39.15346667,39.15273333,39.14811667,39.14753333],
 'lon': [116.472342,116.472315,116.47229,117.218033,117.218046,117.218066,117.218166,117.218186,123.676778,123.677365,
         123.677657,117.1994167,117.2002333,117.2007667,117.2012167,117.202,117.20225,117.20255,117.2043167,117.2045833],
 'date': ['2009-03-21 13:30:35','2009-03-21 13:33:38','2009-03-21 13:34:40','2009-03-21 15:30:12','2009-03-21 15:32:35',
          '2009-03-21 15:38:36','2009-03-21 15:44:42','2009-03-21 15:48:43','2007-04-30 16:00:20', '2007-04-30 16:05:22',
          '2007-04-30 16:08:23','2007-04-30 11:47:38','2007-04-30 11:48:07','2007-04-30 11:48:27','2007-04-30 12:04:39',
          '2007-04-30 12:04:07','2007-04-30 12:04:32','2007-04-30 12:19:41','2007-04-30 12:20:08','2007-04-30 12:20:21']
 }

And the dataframe:

df = pd.DataFrame(data)

df
    user    lat        lon            date
0   10  39.921683   116.472342  2009-03-21 13:30:35
1   10  39.921583   116.472315  2009-03-21 13:33:38
2   10  39.921560   116.472290  2009-03-21 13:34:40
3   10  39.136220   117.218033  2009-03-21 15:30:12
4   10  39.136233   117.218046  2009-03-21 15:32:35
5   10  39.136241   117.218066  2009-03-21 15:38:36
6   10  39.136246   117.218166  2009-03-21 15:44:42
7   10  39.136251   117.218186  2009-03-21 15:48:43
8   21  42.171678   123.676778  2007-04-30 16:00:20
9   21  42.172055   123.677365  2007-04-30 16:05:22
10  21  42.172243   123.677657  2007-04-30 16:08:23
11  54  39.160083   117.199417  2007-04-30 11:47:38
12  54  39.158233   117.200233  2007-04-30 11:48:07
13  54  39.156900   117.200767  2007-04-30 11:48:27
14  54  39.156000   117.201217  2007-04-30 12:04:39
15  54  39.154033   117.202000  2007-04-30 12:04:07
16  54  39.153467   117.202250  2007-04-30 12:04:32
17  54  39.152733   117.202550  2007-04-30 12:19:41
18  54  39.148117   117.204317  2007-04-30 12:20:08
19  54  39.147533   117.204583  2007-04-30 12:20:21

My Question:

I want to calculate how long users travel in a particular period.

For example:

  • Total time users travelled in March 2009: only user 10 travelled in this month, on 2009-03-21, starting at 13:30:35. After 13:34:40 there is a long jump to 15:30:12. Since this gap is more than 30 minutes, we consider what follows another trip. So user 10 has 2 trips recorded that month: the first for about 5 minutes, the second for about 19 minutes. The total trip time for this month is therefore 5 + 19 = 24 minutes.
  • In April 2007, users 21 and 54 recorded trips on the same day. User 21 started at 16:00:20 and travelled for about 8 minutes. User 54 started at 11:47:38 and, after about 1 minute, we see a jump to 12:04:39. That gap is less than 30 minutes, so we consider it a single trip, which lasted about 33 minutes. The total trip time for that month is therefore 8 + 33 = 41 minutes.
  • Sometimes, I would also want to determine the trip time over a longer range, say from February 2008 to March 2009.

How do I perform this sort of analysis?

Any pointers, using the small sample provided above, would be appreciated.

CodePudding user response:

This code isn't the most efficient, but you can test whether it does what you need:

df['date'] = pd.to_datetime(df['date'])

duration = (df.groupby(['user', df['date'].dt.month]).
            apply(lambda x: (x['date']-x['date'].shift()).dt.seconds).
            rename('duration').
            to_frame())

res = (duration.mask(duration>1800,0).  # gaps over 1800 s (30 min) separate trips, so they count as 0
       groupby(level=[0,1]).
       sum().
       truediv(60).  # converting seconds to minutes
       rename_axis(index={'date':'month'}))

print(res)
'''
            duration
user month          
10   3         22.60
21   4          8.05
54   4         33.25
'''
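The code above groups by month number only and does not cover the "February 2008 to March 2009" case from the question. A possible variant is sketched below. It is not the answerer's method, and it rests on a few assumptions: rows are sorted by time within each user, a gap of more than 30 minutes starts a new trip (the rule from the question), and a trip is credited to the period in which it starts. The names `label_trips`, `trip_durations` and `minutes_between` are made up for this illustration.

```python
import pandas as pd

MAX_GAP = pd.Timedelta(minutes=30)  # assumed threshold, taken from the question

# Same timestamps as the sample in the question (lat/lon omitted for brevity).
logs = pd.DataFrame({
    "user": [10] * 8 + [21] * 3 + [54] * 9,
    "date": ["2009-03-21 13:30:35", "2009-03-21 13:33:38", "2009-03-21 13:34:40",
             "2009-03-21 15:30:12", "2009-03-21 15:32:35", "2009-03-21 15:38:36",
             "2009-03-21 15:44:42", "2009-03-21 15:48:43", "2007-04-30 16:00:20",
             "2007-04-30 16:05:22", "2007-04-30 16:08:23", "2007-04-30 11:47:38",
             "2007-04-30 11:48:07", "2007-04-30 11:48:27", "2007-04-30 12:04:39",
             "2007-04-30 12:04:07", "2007-04-30 12:04:32", "2007-04-30 12:19:41",
             "2007-04-30 12:20:08", "2007-04-30 12:20:21"],
})


def label_trips(df):
    """Return a time-sorted copy of df with a per-user 'trip' counter."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values(["user", "date"]).reset_index(drop=True)
    gap = out.groupby("user")["date"].diff()
    # A trip starts at a user's first fix (gap is NaT) or after a long gap.
    new_trip = gap.isna() | (gap > MAX_GAP)
    out["trip"] = new_trip.astype(int).groupby(out["user"]).cumsum()
    return out


def trip_durations(df):
    """One row per (user, trip) with start, end and duration in minutes."""
    trips = (label_trips(df)
             .groupby(["user", "trip"])["date"]
             .agg(start="min", end="max")
             .reset_index())
    trips["minutes"] = (trips["end"] - trips["start"]).dt.total_seconds() / 60
    return trips


def minutes_between(trips, start, end):
    """Total minutes of trips whose start falls in the half-open range [start, end)."""
    in_range = (trips["start"] >= pd.Timestamp(start)) & (trips["start"] < pd.Timestamp(end))
    return trips.loc[in_range, "minutes"].sum()


trips = trip_durations(logs)

# Totals per calendar month (year and month together, unlike dt.month alone).
monthly = trips.groupby(trips["start"].dt.to_period("M"))["minutes"].sum()
print(monthly)

# February 2008 through March 2009, written as a half-open range.
print(minutes_between(trips, "2008-02-01", "2009-04-01"))
```

On the sample, this gives 22.6 minutes for March 2009, the same as above. User 54 comes out at about 32.7 minutes instead of 33.25, because two of that user's rows are out of time order in the sample and are sorted here before the gaps are taken.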