Home > OS >  Calculating a duration from two dates in different time zones
Calculating a duration from two dates in different time zones

Time:10-12

I have a CSV file with trip data:

Trip ID,Depart Time,Arrive Time,Depart Timezone,Arrive Timezone
1,08/29/21 09:00 PM,08/29/21 09:45 PM,GMT-04:00,GMT-04:00
2,08/29/21 10:00 PM,08/30/21 01:28 AM,GMT-04:00,GMT-04:00
3,08/30/21 01:29 AM,08/30/21 01:30 AM,GMT-04:00,GMT-04:00
4,08/30/21 01:45 AM,08/30/21 03:06 AM,GMT-04:00,GMT-04:00
5,08/30/21 03:08 AM,08/30/21 03:58 AM,GMT-04:00,GMT-04:00
6,08/30/21 03:59 AM,08/30/21 04:15 AM,GMT-04:00,GMT-04:00

I can read this file into a dataframe:

trips = pd.read_csv("trips.csv", sep=',')

What I would like to accomplish is to add a column 'duration' which gives me the trip duration in minutes. The trip duration has to be calculated as the difference between the trip arrival time and the trip departure time. In the above table, the 'depart time' is relative to the 'Depart Timezone'. Similarly, the 'Arrive Time' is relative to the 'Arrive Timezone'. Note that in the above example, the arrival and departure dates, as well as the arrival and departure time zones happen to be the same, but this does not hold in general for my data.

CodePudding user response:

What you have are UTC offsets (GMT-04:00 is four hours behind UTC); you can join the date/time column and respective offset column by ' ' and parse to_datetime. You can then calculate duration (timedelta) from the resulting tz-aware datetime columns. Ex:

# make datetime columns:
df['dt_depart'] = pd.to_datetime(df['Depart Time']   ' '   df['Depart Timezone'],
                                 utc=True)
df['dt_arrive'] = pd.to_datetime(df['Arrive Time']   ' '   df['Arrive Timezone'],
                                 utc=True)

Note: I'm using UTC=True here in case there are mixed UTC offsets in the input. That gives e.g.

df['dt_depart']
Out[6]: 
0   2021-08-29 17:00:00 00:00
1   2021-08-29 18:00:00 00:00
2   2021-08-29 21:29:00 00:00
3   2021-08-29 21:45:00 00:00
4   2021-08-29 23:08:00 00:00
5   2021-08-29 23:59:00 00:00
Name: dt_depart, dtype: datetime64[ns, UTC]

then

# calculate the travel duration (timedelta column):
df['traveltime'] = df['dt_arrive'] - df['dt_depart']

gives e.g.

df['traveltime']
Out[7]: 
0   0 days 00:45:00
1   0 days 03:28:00
2   0 days 00:01:00
3   0 days 01:21:00
4   0 days 00:50:00
5   0 days 00:16:00
Name: traveltime, dtype: timedelta64[ns]
  • Related