I have the following table
Path ID | Lane ID | Customer | Source | Destination | Mode |
---|---|---|---|---|---|
1 | 1 | Mumbai | Chicago | Berlin | Ship |
1 | 2 | Mumbai | Berlin | Mumbai | Air |
2 | 1 | Mumbai | Chicago | Berlin | Air |
2 | 2 | Mumbai | Berlin | Dubai | Air |
2 | 3 | Mumbai | Dubai | Mumbai | Ship |
I want the following table
Path ID | Source | Site2 | Site3 | Destination | Lane1 Mode | Lane2 Mode | Lane3 Mode |
---|---|---|---|---|---|---|---|
1 | Chicago | Berlin | Mumbai | Ship | Air | ||
2 | Chicago | Berlin | Dubai | Mumbai | Air | Air | Ship |
How do I go about getting this table? I feel like groupby
is obviously required but what after that? Not sure how to proceed from there. The dataset is really big so it also needs to be efficient. Any pointers would help :)
CodePudding user response:
First drop
irrelevant columns. Then as @LiamRoberts notes, pivot
will get us in the right shape. Then fix the "Destination" column with stack
groupby
last
. Finally, fix the column names.
s = df.drop(columns='Customer').pivot(index=['Path ID'], columns=['Lane ID'])
s['Dest'] = s['Destination'].stack().groupby(level=0).last()
s = s.drop(columns='Destination').reset_index()
s.columns = [f'{c1}{c2}' for c1, c2 in s.columns]
s = (s.rename(columns={'Source1':'Source', 'Source2':'Site2',
'Source3':'Site3', 'Dest':'Destination',
'Mode1':'Lane1 Mode', 'Mode2':'Lane2 Mode',
'Mode3':'Lane3 Mode'})[['Path ID','Source','Site2','Site3','Destination',
'Lane1 Mode','Lane2 Mode','Lane3 Mode']])
Output:
Path ID Source Site2 Site3 Destination Lane1 Mode Lane2 Mode Lane3 Mode
0 1 Chicago Berlin NaN Mumbai Ship Air NaN
1 2 Chicago Berlin Dubai Mumbai Air Air Ship
CodePudding user response:
It looks like you're actually pivoting the Lane Id and indexing with the Path Id. This should get you most of the way.
df.pivot(index="Path ID", columns=["Lane ID"])