I want to calculate the distance between two gps coordinates (the first and the last one for each tripId), to get the distance for each trip my dataframe looks like that
tripId latitude longitude timestamp
0 1817603 53.155273 8.207176 2021-05-24 00:29:22
1 1817603 53.155271 8.206898 2021-05-24 00:29:38
2 1817603 53.155213 8.206314 2021-05-24 00:29:44
3 1817603 53.155135 8.206429 2021-05-24 00:29:50
4 1817603 53.154950 8.206565 2021-05-24 00:29:56
... ... ... ... ...
195 1817888 53.092805 8.212095 2021-05-24 08:27:54
196 1817888 53.093024 8.211756 2021-05-24 08:27:59
197 1817888 53.093305 8.211383 2021-05-24 08:28:05
198 1817888 53.093594 8.211026 2021-05-24 08:28:10
199 1817888 53.093853 8.210708 2021-05-24 08:28:15
i did that for each step using s = pd.Series(haversine_vector(df, df.shift(),Unit.KILOMETERS), index=df.index, name='distance_K')
but i need to know the distance for the whole trip for each id
I have used this as a test and it works, but i need to know the exact duration for each trip (the final duration)
for i in range(1,df.shape[0]-1):
if df['tripId'][i]==df['tripId'][i 1]:
df['distance'][i]=df['distance'][i-1] df['distance_K'][i]
else:
df['distance'][i]=df['distance_K'][i]
CodePudding user response:
Use groupby_apply
to compute haversine distance for each trip:
# Inspired by https://stackoverflow.com/a/4913653/15239951
def haversine_series(sr):
lon1 = sr['longitude']
lat1 = sr['latitude']
lon2 = sr['longitude'].shift(fill_value=sr['longitude'].iloc[0])
lat2 = sr['latitude'].shift(fill_value=sr['latitude'].iloc[0])
lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
dlon = lon2 - lon1
dlat = lat2 - lat1
a = np.sin(dlat / 2.0)**2 np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
c = 2 * np.arcsin(np.sqrt(a))
km = 6371 * c
return km
df['distance_K'] = df.groupby('tripId').apply(haversine_series).droplevel(0)
Note: I suppose your dataframe is already sorted by timestamp
column.
At this point, your dataframe looks like:
>>> df
tripId latitude longitude timestamp distance_K
0 1817603 53.155273 8.207176 2021-05-24 00:29:22 0.000000
1 1817603 53.155271 8.206898 2021-05-24 00:29:38 0.018538
2 1817603 53.155213 8.206314 2021-05-24 00:29:44 0.039470
3 1817603 53.155135 8.206429 2021-05-24 00:29:50 0.011577
4 1817603 53.154950 8.206565 2021-05-24 00:29:56 0.022481
195 1817888 53.092805 8.212095 2021-05-24 08:27:54 0.000000
196 1817888 53.093024 8.211756 2021-05-24 08:27:59 0.033248
197 1817888 53.093305 8.211383 2021-05-24 08:28:05 0.039958
198 1817888 53.093594 8.211026 2021-05-24 08:28:10 0.040012
199 1817888 53.093853 8.210708 2021-05-24 08:28:15 0.035781
Now the total distance and time for each trip is easy to get with groupby_agg
:
>>> df.groupby('tripId') \
.agg(total_distance=('distance_K', 'sum'),
total_time=('timestamp', lambda x: x.max()-x.min())) \
.reset_index()
tripId total_distance total_time
0 1817603 0.092066 0 days 00:00:34
1 1817888 0.148999 0 days 00:00:21
CodePudding user response:
You can use
from haversine import haversine_vector
df = df.groupby('tripId').apply(
lambda g: g.assign(distance=lambda g: [0, *haversine_vector(
g.iloc[:-1][['latitude', 'longitude']].values,
g.iloc[1:][['latitude', 'longitude']].values,
)])
).droplevel(0)
df
# tripId latitude longitude timestamp distance
# 0 1817603 53.155273 8.207176 2021-05-24 00:29:22 0.000000
# 1 1817603 53.155271 8.206898 2021-05-24 00:29:38 0.018538
# 2 1817603 53.155213 8.206314 2021-05-24 00:29:44 0.039470
# 3 1817603 53.155135 8.206429 2021-05-24 00:29:50 0.011577
# 4 1817603 53.154950 8.206565 2021-05-24 00:29:56 0.022481
# 5 1817888 53.092805 8.212095 2021-05-24 08:27:54 0.000000
# 6 1817888 53.093024 8.211756 2021-05-24 08:27:59 0.033248
# 7 1817888 53.093305 8.211383 2021-05-24 08:28:05 0.039958
# 8 1817888 53.093594 8.211026 2021-05-24 08:28:10 0.040012
# 9 1817888 53.093853 8.210708 2021-05-24 08:28:15 0.035781
and get the total time and distance
df.groupby('tripId').agg(
{
'timestamp': lambda g: g.iloc[-1] - g.iloc[0],
'distance':'sum'
}
)
# timestamp distance
# tripId
# 1817603 0 days 00:00:34 0.092066
# 1817888 0 days 00:00:21 0.148999