Home > OS >  Need to aggregate data in pandas data frame
Need to aggregate data in pandas data frame

Time:08-20

Below is the data set

DataSet

I want to merge the data set based on " UUID" to get one row with start and end time as first of start time and end of end time All other numeric row to be summed up

CodePudding user response:

Dataframe:

import pandas as pd

df=pd.DataFrame({
        'trip_creation_time':['37:08.9', '37:08.9', '37:08.9'],
        'UUID':['ABC-1', 'ABC-1', 'ABC-1'],
        'RouteType':['C', 'C', 'C'],
        'tripid':['trip-15', 'trip-15', 'trip-15'],
        'sourcecenter':['AAC', 'AAC', 'AAC'],
        'source_name': ['H_TS_1', 'H_TS_2', 'H_TS_3'],
        'destination_center': ['AAC', 'AAC', 'AAA'],
        'destination_name': ['M-TS_1', 'M-TS_2', 'M-TS_3'],
        'starttime':['05:52.6', '05:52.6', '05:34.8'],
        'endtime':['05:34.8', '05:34.8', '35:54.0'],
        'endtoend_time': [59, 59, 210],
        'actualdistance': [9.684842238, 14.85920056, 9.499885115],
        'actualtime':[14, 30, 165],
        'otime':[8, 12, 14],
        'distance':[10.3638, 16.0049, 14.4597]
        
    })

Group By UUID and aggregate the columns by passing the function you want to run on each column

first of starttime and last of endtime and summing up column distance and otime

df.groupby('UUID').agg(starttime_first=('starttime','first'),
                      endtime_last=('starttime','last'),
                      distance_sum = ('distance', 'sum'),
                      otime_sum = ('otime', 'sum'))

enter image description here

  • Related