Update each row of a dataframe using groupby lambda custom function. apply and transform do not ap

I have data that looks like:

TimeUTC	TimeUTC	ID	Latitude	Longitude
2021-06-01 08:58:47+00:00	2021-06-01 08:58:47+00:00	9c3807ce-bf21-4cd8-b4ac-f2da440340dc	42.9186	-70.8866
2021-06-01 09:00:11+00:00	2021-06-01 09:00:11+00:00	16e9ea6c-2722-4881-bd35-5867e83be19b	42.9186	-70.8866
2021-06-01 09:00:21+00:00	2021-06-01 09:00:21+00:00	16e9ea6c-2722-4881-bd35-5867e83be19b	42.9186	-70.8866
2021-06-01 09:00:24+00:00	2021-06-01 09:00:24+00:00	16e9ea6c-2722-4881-bd35-5867e83be19b	42.9186	-70.8866
2021-06-01 09:00:25+00:00	2021-06-01 09:00:25+00:00	16e9ea6c-2722-4881-bd35-5867e83be19b	42.9186	-70.8866
2021-06-01 09:00:37+00:00	2021-06-01 09:00:37+00:00	16e9ea6c-2722-4881-bd35-5867e83be19b	42.9186	-70.8866
2021-06-01 09:00:41+00:00	2021-06-01 09:00:41+00:00	16e9ea6c-2722-4881-bd35-5867e83be19b	42.9186	-70.8866

For every row, I want to calculate the speed of the object from the time interval and the change in lat/long relative to the previous row with the same ID. Each row's speed should be added to the dataframe, or to a copy of it, as a new column.

My speed function is:

def ucap_speed_desired(df):
    # A single-row group has no previous fix to measure against
    if len(df) < 2:
        return 0
    return gps_speed(df['Latitude'], df['Longitude'], df['TimeUTC'])

Where gps_speed is a vectorized haversine calculation.
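
The question does not show gps_speed itself. For readers who want something concrete to run, here is a minimal sketch of what such a vectorized haversine speed function might look like. The signature matches the call above, but the body, the EARTH_RADIUS_M constant, and the choice of metres per second are assumptions, not the asker's actual code. The first element is NaN because the first fix has no predecessor, which is consistent with the output shown further down.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def gps_speed(lat, lon, time):
    """Hypothetical stand-in: speed in m/s between consecutive fixes."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    # Seconds elapsed since the previous fix; NaN for the first fix
    dt = pd.Series(pd.to_datetime(time)).diff().dt.total_seconds().to_numpy()

    # Differences to the previous fix; NaN in the first slot
    dlat = np.diff(lat, prepend=np.nan)
    dlon = np.diff(lon, prepend=np.nan)
    prev_lat = np.roll(lat, 1)  # slot 0 wraps around, but dlat[0] is NaN anyway

    # Haversine great-circle distance in metres
    a = np.sin(dlat / 2) ** 2 + np.cos(prev_lat) * np.cos(lat) * np.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    return dist / dt
```

Two fixes with identical timestamps would divide by zero here; a real implementation would need to decide how to treat that case.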

I tried:

t.groupby(['ID']).apply(lambda x: tracklib.ucap_speed_desired(x))

Which returns:

ID
001867c0-58cf-41a0-b44a-b0cbd8829b18	0
001f60b9-db9a-4fe0-9395-29ecd7f2bbf6	0
0044a1cf-2dfb-4e37-889d-a4ab9533b5ab	[nan, 3.6224195651392077, 0.189001988151531, 0...
007bb38f-f178-4290-8162-63340b4b81b4	[nan, 66.89360534185278, 49.87836843507186, 75...

That has all of the information I want - a list of speeds for each row of the dataframe grouped by ID.

What I cannot figure out is how to add the speeds back to the original row.

transform didn't seem to work. apply didn't work.

I tried modifying the dataframe in the speed function but since groupby passes a copy of the dataframe and not the original, that did not work.

Ultimately, I walked away from the lambda solution and did this:

def ucap_speed(df):
    grouped = df.groupby(['ID'], as_index=False)
    newdf = pd.DataFrame()
    for name, group in grouped:
        if len(group) < 2:
            group['speed'] = np.nan
        else:
            group['speed'] = gps_speed(group['Latitude'], group['Longitude'], group['TimeUTC'])

        newdf = newdf.append(group)

    newdf['speed'] = newdf['speed'].fillna(0)

    return newdf

So I grouped the dataframe, applied the haversine function to each group, and built up a new dataframe by appending each updated group to it. This is terribly slow. It feels like I'm one appropriate step away, but I cannot find that step.
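
A likely reason for the slowness is that DataFrame.append copies the whole accumulated frame on every iteration, so the cost grows roughly quadratically with the number of groups (the method was also removed in pandas 2.0). A minimal sketch of the same loop that collects the groups in a list and concatenates once is shown here. The names ucap_speed_concat and toy_speed are hypothetical, and toy_speed is only a placeholder so the example runs without the real gps_speed.

```python
import numpy as np
import pandas as pd


def toy_speed(lat, lon, time):
    # Hypothetical placeholder for gps_speed: NaN for the first fix, 1.0 afterwards
    return np.r_[np.nan, np.ones(len(lat) - 1)]


def ucap_speed_concat(df, speed_func=toy_speed):
    pieces = []
    for _, group in df.groupby("ID", sort=False):
        group = group.copy()  # work on a copy so the caller's frame is untouched
        if len(group) < 2:
            group["speed"] = 0.0
        else:
            group["speed"] = speed_func(
                group["Latitude"], group["Longitude"], group["TimeUTC"]
            )
        pieces.append(group)
    out = pd.concat(pieces)  # a single concatenation instead of one per group
    out["speed"] = out["speed"].fillna(0)
    return out
```

As in the original loop, the rows come back in group order rather than the original row order, but each row keeps its original index label.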

CodePudding user response：

I think you need to assign the column inside the function and then return the group, here called df:

def ucap_speed_desired(df):

    if len(df) < 2:
        df['speed'] = 0
    else:
        df['speed'] = gps_speed(df['Latitude'], df['Longitude'], df['TimeUTC'])
    return df

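# group_keys=False keeps the original row index (pandas 2.0+ would otherwise prepend ID to it)
t.groupby(['ID'], group_keys=False).apply(tracklib.ucap_speed_desired).fillna({'speed': 0})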
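
As an alternative to apply, and not something the answer itself proposes, the per-group work can be avoided altogether by shifting each ID's previous fix onto the current row with groupby(...).shift() and then running the haversine formula over whole columns. The sketch below assumes the rows are already sorted by time within each ID. The function name add_speed_vectorized, the radius_m parameter, and metres per second as the unit are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_speed_vectorized(df, radius_m=6_371_000.0):
    """Return a copy of df with a 'speed' column (m/s, an assumed unit)."""
    out = df.copy()
    out["TimeUTC"] = pd.to_datetime(out["TimeUTC"])
    # Previous fix of the same ID, aligned to each row (NaN/NaT on a group's first row)
    prev = out.groupby("ID")[["Latitude", "Longitude", "TimeUTC"]].shift()

    lat1 = np.radians(prev["Latitude"])
    lat2 = np.radians(out["Latitude"])
    dlon = np.radians(out["Longitude"] - prev["Longitude"])
    # Haversine great-circle distance in metres
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    dist = 2 * radius_m * np.arcsin(np.sqrt(a))

    dt = (out["TimeUTC"] - prev["TimeUTC"]).dt.total_seconds()
    # First row of each ID (and single-row IDs) has no predecessor, so it gets 0
    out["speed"] = (dist / dt).fillna(0)
    return out
```

Because nothing is regrouped or concatenated, the result keeps the original row order and index, and single-row groups fall out of the same fillna(0) step instead of needing a special case.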