Below is the data set
I want to merge the data set based on " UUID" to get one row with start and end time as first of start time and end of end time All other numeric row to be summed up
CodePudding user response:
Dataframe:
import pandas as pd
df=pd.DataFrame({
'trip_creation_time':['37:08.9', '37:08.9', '37:08.9'],
'UUID':['ABC-1', 'ABC-1', 'ABC-1'],
'RouteType':['C', 'C', 'C'],
'tripid':['trip-15', 'trip-15', 'trip-15'],
'sourcecenter':['AAC', 'AAC', 'AAC'],
'source_name': ['H_TS_1', 'H_TS_2', 'H_TS_3'],
'destination_center': ['AAC', 'AAC', 'AAA'],
'destination_name': ['M-TS_1', 'M-TS_2', 'M-TS_3'],
'starttime':['05:52.6', '05:52.6', '05:34.8'],
'endtime':['05:34.8', '05:34.8', '35:54.0'],
'endtoend_time': [59, 59, 210],
'actualdistance': [9.684842238, 14.85920056, 9.499885115],
'actualtime':[14, 30, 165],
'otime':[8, 12, 14],
'distance':[10.3638, 16.0049, 14.4597]
})
Group By UUID and aggregate the columns by passing the function you want to run on each column
first of starttime and last of endtime and summing up column distance and otime
df.groupby('UUID').agg(starttime_first=('starttime','first'),
endtime_last=('starttime','last'),
distance_sum = ('distance', 'sum'),
otime_sum = ('otime', 'sum'))