I have the following data frame structure:
id_trip | dtm_start_trip | dtm_end_trip | start_station | end_station |
---|---|---|---|---|
1 | 2018-10-01 10:15:00 | 2018-10-01 10:17:00 | 100 | 200 |
2 | 2018-10-01 10:17:00 | 2018-10-01 10:18:00 | 200 | 100 |
3 | 2018-10-01 10:19:00 | 2018-10-01 10:34:00 | 100 | 300 |
4 | 2018-10-01 10:20:00 | 2018-10-01 10:22:00 | 300 | 100 |
5 | 2018-10-01 10:20:00 | 2018-10-01 10:29:00 | 400 | 400 |
And I would like to check, using Python, how often a trip starts and ends at a given station. The idea is to compute these average intervals per day, per hour, and then in intervals of a few minutes.
What would be the best approach to doing this?
My desired output would be something like: for station 100 on 2018-10-01, a trip starts, on average, every 4 minutes.
CodePudding user response:
To do that, you could group your DataFrame by travel. First, I would add a new column with a travel id (one per start/end station pair), so that trips starting and ending at the same stations can be grouped together. Then you can group the rows by travel id and get all the information you need.
Please note that your data sample does not include any two trips with the same start and end stations. Also, consider providing a code sample for your data; it would make it easier for us to work with and run tests.
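For what it's worth, here is a minimal sketch of the travel id idea, with the sample rows from the question rebuilt as a DataFrame. The column name `route` and the `"start->end"` label format are my own assumptions for illustration, not something from the question.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "id_trip": [1, 2, 3, 4, 5],
    "dtm_start_trip": pd.to_datetime([
        "2018-10-01 10:15:00", "2018-10-01 10:17:00", "2018-10-01 10:19:00",
        "2018-10-01 10:20:00", "2018-10-01 10:20:00",
    ]),
    "dtm_end_trip": pd.to_datetime([
        "2018-10-01 10:17:00", "2018-10-01 10:18:00", "2018-10-01 10:34:00",
        "2018-10-01 10:22:00", "2018-10-01 10:29:00",
    ]),
    "start_station": [100, 200, 100, 300, 400],
    "end_station": [200, 100, 300, 100, 400],
})

# Hypothetical travel id: one label per (start station, end station) pair.
df["route"] = df["start_station"].astype(str) + "->" + df["end_station"].astype(str)

# Trip count and first/last start time for each route.
route_summary = df.groupby("route")["dtm_start_trip"].agg(["count", "min", "max"])
print(route_summary)
```

With this sample every route appears only once, so each count is 1; with more data the same grouping would collect the repeated trips.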
CodePudding user response:
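First, convert the date column to pandas datetime, then take the mean gap between consecutive trip starts for each station:

    import pandas as pd

    df.dtm_start_trip = pd.to_datetime(df.dtm_start_trip)

    def deltas_mean(x):
        # Sort chronologically, then average the gaps between consecutive starts.
        d = x.sort_values().diff()
        return d.mean()

    df.groupby('start_station').agg({'dtm_start_trip': deltas_mean})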
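To get the per-day breakdown the question asks for, one possible variation is to bucket the timestamps first and compute the gaps inside each (station, bucket) group. This is only a sketch on the question's sample data; the helper columns `day` and `gap_min` are names I made up, and I convert the gaps to minutes as plain floats before averaging.

```python
import pandas as pd

# Start times and stations copied from the table in the question.
df = pd.DataFrame({
    "dtm_start_trip": pd.to_datetime([
        "2018-10-01 10:15:00", "2018-10-01 10:17:00", "2018-10-01 10:19:00",
        "2018-10-01 10:20:00", "2018-10-01 10:20:00",
    ]),
    "start_station": [100, 200, 100, 300, 400],
})

# Gaps only make sense in chronological order.
df = df.sort_values("dtm_start_trip")

# Hypothetical bucket column: the calendar day of each trip start.
df["day"] = df["dtm_start_trip"].dt.floor("D")

# Minutes since the previous start at the same station on the same day
# (NaN for the first trip in each group).
df["gap_min"] = (
    df.groupby(["start_station", "day"])["dtm_start_trip"].diff().dt.total_seconds() / 60
)

# Average gap per station and day; NaN where a station has a single trip.
mean_gap = df.groupby(["start_station", "day"])["gap_min"].mean()
print(mean_gap)
```

On the sample, station 100 on 2018-10-01 comes out as 4.0 minutes, and the other stations are NaN because they have only one trip each. Flooring to a finer frequency instead, for example `dt.floor("15min")`, should give the intervals of a few minutes in the same way.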