Home > Blockchain >  Group by pandas to get path from source to end destination
Group by pandas to get path from source to end destination

Time:02-18

I have the following table

Path ID Lane ID Customer Source Destination Mode
1 1 Mumbai Chicago Berlin Ship
1 2 Mumbai Berlin Mumbai Air
2 1 Mumbai Chicago Berlin Air
2 2 Mumbai Berlin Dubai Air
2 3 Mumbai Dubai Mumbai Ship

I want the following table

Path ID Source Site2 Site3 Destination Lane1 Mode Lane2 Mode Lane3 Mode
1 Chicago Berlin Mumbai Ship Air
2 Chicago Berlin Dubai Mumbai Air Air Ship

How do I go about getting this table? I feel like groupby is obviously required but what after that? Not sure how to proceed from there. The dataset is really big so it also needs to be efficient. Any pointers would help :)

CodePudding user response:

First drop irrelevant columns. Then as @LiamRoberts notes, pivot will get us in the right shape. Then fix the "Destination" column with stack groupby last. Finally, fix the column names.

s = df.drop(columns='Customer').pivot(index=['Path ID'], columns=['Lane ID'])
s['Dest'] = s['Destination'].stack().groupby(level=0).last()
s = s.drop(columns='Destination').reset_index()
s.columns = [f'{c1}{c2}' for c1, c2 in s.columns]
s = (s.rename(columns={'Source1':'Source', 'Source2':'Site2', 
                       'Source3':'Site3', 'Dest':'Destination',
                      'Mode1':'Lane1 Mode', 'Mode2':'Lane2 Mode', 
                       'Mode3':'Lane3 Mode'})[['Path ID','Source','Site2','Site3','Destination',
                                               'Lane1 Mode','Lane2 Mode','Lane3 Mode']])

Output:

   Path ID   Source   Site2  Site3 Destination Lane1 Mode Lane2 Mode    Lane3 Mode  
0        1  Chicago  Berlin    NaN      Mumbai       Ship        Air           NaN  
1        2  Chicago  Berlin  Dubai      Mumbai        Air        Air          Ship  

CodePudding user response:

It looks like you're actually pivoting the Lane Id and indexing with the Path Id. This should get you most of the way.

df.pivot(index="Path ID", columns=["Lane ID"])
  • Related