I have data that looks like:
TimeUTC | TimeUTC | ID | Latitude | Longitude |
---|---|---|---|---|
2021-06-01 08:58:47+00:00 | 2021-06-01 08:58:47+00:00 | 9c3807ce-bf21-4cd8-b4ac-f2da440340dc | 42.9186 | -70.8866 |
2021-06-01 09:00:11+00:00 | 2021-06-01 09:00:11+00:00 | 16e9ea6c-2722-4881-bd35-5867e83be19b | 42.9186 | -70.8866 |
2021-06-01 09:00:21+00:00 | 2021-06-01 09:00:21+00:00 | 16e9ea6c-2722-4881-bd35-5867e83be19b | 42.9186 | -70.8866 |
2021-06-01 09:00:24+00:00 | 2021-06-01 09:00:24+00:00 | 16e9ea6c-2722-4881-bd35-5867e83be19b | 42.9186 | -70.8866 |
2021-06-01 09:00:25+00:00 | 2021-06-01 09:00:25+00:00 | 16e9ea6c-2722-4881-bd35-5867e83be19b | 42.9186 | -70.8866 |
2021-06-01 09:00:37+00:00 | 2021-06-01 09:00:37+00:00 | 16e9ea6c-2722-4881-bd35-5867e83be19b | 42.9186 | -70.8866 |
2021-06-01 09:00:41+00:00 | 2021-06-01 09:00:41+00:00 | 16e9ea6c-2722-4881-bd35-5867e83be19b | 42.9186 | -70.8866 |
For every row, I want to calculate the object's speed from the time elapsed and the change in latitude/longitude since the previous row with the same ID. Each row's speed should be added to the dataframe (or a copy of it) as a new column.
My speed function is:
```python
def ucap_speed_desired(df):
    if len(df) < 2:
        return 0
    else:
        return gps_speed(df['Latitude'], df['Longitude'], df['TimeUTC'])
```
where `gps_speed` is a vectorized haversine calculation.
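`gps_speed` itself isn't shown. For anyone who wants something runnable, here is a minimal sketch of what such a vectorized haversine helper might look like. It is hypothetical: the km/h unit, the 6371.0088 km mean Earth radius, and returning NaN for the first fix (which matches the `[nan, ...]` arrays in the output further down) are assumptions, not the author's code.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def gps_speed(lat, lon, time):
    """Hypothetical vectorized haversine speed (km/h) between consecutive fixes.

    Element 0 is NaN because the first fix has no predecessor.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))

    # Hours elapsed between consecutive timestamps (parsed as UTC)
    t = pd.to_datetime(pd.Series(time), utc=True)
    dt_h = t.diff().dt.total_seconds().to_numpy()[1:] / 3600.0

    # Haversine distance between each fix and the one before it
    dlat = np.diff(lat)
    dlon = np.diff(lon)
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon / 2) ** 2
    # Clip guards against a drifting slightly above 1 from rounding
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    # A zero time step yields inf/nan instead of emitting a warning
    with np.errstate(divide="ignore", invalid="ignore"):
        speed = dist_km / dt_h
    return np.concatenate(([np.nan], speed))
```

Working on NumPy arrays keeps the whole group in a single pass, so the per-group cost is dominated by pandas' grouping overhead rather than by the math.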
I tried:
```python
t.groupby(['ID']).apply(lambda x: tracklib.ucap_speed_desired(x))
```
Which returns:
ID | |
---|---|
001867c0-58cf-41a0-b44a-b0cbd8829b18 | 0 |
001f60b9-db9a-4fe0-9395-29ecd7f2bbf6 | 0 |
0044a1cf-2dfb-4e37-889d-a4ab9533b5ab | [nan, 3.6224195651392077, 0.189001988151531, 0... |
007bb38f-f178-4290-8162-63340b4b81b4 | [nan, 66.89360534185278, 49.87836843507186, 75... |
That has all of the information I want: for each ID, an array with one speed per row.
What I cannot figure out is how to add the speeds back to the original row.
`transform` didn't seem to work, and neither did `apply`.
I tried modifying the dataframe in the speed function but since groupby passes a copy of the dataframe and not the original, that did not work.
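One way to attach the per-ID arrays from the `apply` call above back onto the rows relies on two pandas defaults: `groupby` sorts the IDs, and rows keep their original order within each group. A stable sort of the frame by `ID` therefore lines up with the concatenated arrays. This is a sketch under those assumptions; `speeds_back_to_rows` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def speeds_back_to_rows(df, per_id):
    """Hypothetical helper: scatter per-ID speed arrays back onto df's rows.

    per_id is the Series returned by df.groupby('ID').apply(...): indexed by
    sorted ID, holding one array per ID (or the scalar 0 for single-row IDs).
    """
    # Flatten in group order; np.atleast_1d turns the scalar 0 into [0.0]
    flat = np.concatenate([np.atleast_1d(v).astype(float) for v in per_id])
    if len(flat) != len(df):
        raise ValueError("per-ID arrays do not cover every row")

    # Positions of df's rows in sorted-ID order; "stable" keeps within-ID order
    order = np.argsort(df["ID"].to_numpy(), kind="stable")
    speed = np.empty(len(df))
    speed[order] = flat

    out = df.copy()
    out["speed"] = speed
    return out
```

Because the scatter is positional, this works even when the frame's index has duplicate labels.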
Ultimately, I walked away from the lambda solution and did this:
```python
import numpy as np
import pandas as pd

def ucap_speed(df):
    grouped = df.groupby(['ID'], as_index=False)
    newdf = pd.DataFrame()
    for name, group in grouped:
        if len(group) < 2:
            group['speed'] = np.nan
        else:
            group['speed'] = gps_speed(group['Latitude'], group['Longitude'], group['TimeUTC'])
        # DataFrame.append copies the whole frame on every call (and was removed in pandas 2.0)
        newdf = newdf.append(group)
    newdf['speed'] = newdf['speed'].fillna(0)
    return newdf
```
So I grouped the dataframe, applied the haversine function to each group, and built up a new dataframe by appending each updated group to it. This is terribly slow. It feels like I'm one step away from the right approach, but I cannot find that step.
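A common way to avoid both `apply` and the Python loop is to fetch each row's previous position and timestamp with `groupby(...).shift()` and `groupby(...).diff()`, then run the haversine formula on whole columns at once. This swaps an inline haversine in for `gps_speed`; the km/h unit, the Earth radius, and the hypothetical name `add_speed` are assumptions, and rows are assumed to already be in time order within each ID, as in the sample data.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def add_speed(df):
    """Hypothetical helper: return a copy of df with a 'speed' column in km/h.

    Speed is measured from the previous row with the same ID. The first row
    of each ID (and any single-row ID) gets 0, mirroring fillna(0) above.
    """
    out = df.copy()
    by_id = out.groupby("ID")

    # Previous fix for the same ID (NaN on each ID's first row)
    prev = by_id[["Latitude", "Longitude"]].shift()
    lat1, lon1 = np.radians(prev["Latitude"]), np.radians(prev["Longitude"])
    lat2, lon2 = np.radians(out["Latitude"]), np.radians(out["Longitude"])

    # Haversine distance computed on whole columns at once
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    # Hours since the previous fix for the same ID
    t = pd.to_datetime(out["TimeUTC"], utc=True)
    dt_h = t.groupby(out["ID"]).diff().dt.total_seconds() / 3600.0

    out["speed"] = (dist_km / dt_h).fillna(0)
    return out
```

Every step here is a built-in grouped or column-wise operation, so no Python code runs per group and the speeds land directly on the original rows.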
Answer:
I think you need to assign the column inside the function and then return the group (here called `df`):
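```python
def ucap_speed_desired(df):
    if len(df) < 2:
        df['speed'] = 0
    else:
        df['speed'] = gps_speed(df['Latitude'], df['Longitude'], df['TimeUTC'])
    return df

# group_keys=False keeps the original row index instead of prepending ID as an extra level
t.groupby(['ID'], group_keys=False).apply(tracklib.ucap_speed_desired).fillna({'speed': 0})
```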
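A small self-contained check of this approach, using a hypothetical stand-in for `gps_speed` (it returns degrees moved per hour, not a real speed), a locally defined `ucap_speed_desired` instead of `tracklib`, and a default unique `RangeIndex`. Passing `group_keys=False` keeps the original row labels on the result, so the new column can be assigned straight back onto `t`.

```python
import numpy as np
import pandas as pd


def gps_speed(lat, lon, time):
    # Hypothetical stand-in for the real haversine helper: NaN for the first
    # fix, then degrees moved per hour (illustration only, not a real speed)
    t = pd.to_datetime(time, utc=True)
    dt_h = t.diff().dt.total_seconds() / 3600.0
    step = np.hypot(lat.diff(), lon.diff())
    return (step / dt_h).to_numpy()


def ucap_speed_desired(df):
    df = df.copy()  # explicit copy so the group passed in is never modified
    if len(df) < 2:
        df["speed"] = 0
    else:
        df["speed"] = gps_speed(df["Latitude"], df["Longitude"], df["TimeUTC"])
    return df


t = pd.DataFrame({
    "TimeUTC": ["2021-06-01 09:00:00+00:00", "2021-06-01 09:00:00+00:00",
                "2021-06-01 10:00:00+00:00", "2021-06-01 09:30:00+00:00"],
    "ID": ["a", "b", "a", "c"],
    "Latitude": [0.0, 1.0, 3.0, 2.0],
    "Longitude": [0.0, 1.0, 4.0, 2.0],
})

# group_keys=False: rows keep their original labels (no extra ID index level)
result = t.groupby("ID", group_keys=False).apply(ucap_speed_desired).fillna({"speed": 0})
t["speed"] = result["speed"]  # aligns on the (unique) row index
print(t["speed"].tolist())  # [0.0, 0.0, 5.0, 0.0]
```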