Home > Back-end >  How to merge 2 rows into 1 row with 2 different columns?
How to merge 2 rows into 1 row with 2 different columns?

Time:06-08

I would like to merge 2 or 3 row into 1 row like below.

Original chart:

CaseNr AccType Vehicle Model VehicleMass
A 1 PassengerCar A1 1217
A 1 Train Train 99999
B 2 PassengerCar B7 1400
B 2 Train Train 99999
C 3 PassengerCar C2 1295

Modified chart that I want:

CaseNr AccType Vehicle_1 Vehicle_2 Model_1 Model_2 VehicleMass_1 Vehicle Mass_2 MainFac
A 1 PassengerCar Train A1 Train 1217 99999 5
B 2 PassengerCar Train B7 Train 1400 99999 6
C 3 PassengerCar NaN C2 NaN 1295 NaN 2

As you could notice the chart, the data field include vehicle crash information. Therefore, 1, 2, or 3 vehicles could be involved in one accident case.

I searched all merge options but I could not find how to do this.

CodePudding user response:

Use pivot to reshape your data. This is called reshaping to wide and not merging.

df1 = df.assign(name = df.groupby('CaseNr').cumcount()   1).pivot(['CaseNr', 'AccType'], 'name')
df1.columns = df1.columns.map(lambda x:f'{x[0]}_{x[1]}')
df1.reset_index()

  CaseNr  AccType     Vehicle_1  ... Model_2 VehicleMass_1 VehicleMass_2
0      A        1  PassengerCar  ...   Train        1217.0       99999.0
1      B        2  PassengerCar  ...   Train        1400.0       99999.0
2      C        3  PassengerCar  ...     NaN        1295.0           NaN

If you have janitor installed you can do:

import janitor
df.assign(name = df.groupby('CaseNr').cumcount()   1).pivot_wider(['CaseNr', 'AccType'], 'name')

  CaseNr  AccType     Vehicle_1  ... Model_2 VehicleMass_1 VehicleMass_2
0      A        1  PassengerCar  ...   Train        1217.0       99999.0
1      B        2  PassengerCar  ...   Train        1400.0       99999.0
2      C        3  PassengerCar  ...     NaN        1295.0           NaN

CodePudding user response:

using cumcount and .unstack

s = df.groupby(["CaseNr", "AccType"]).cumcount()

df.set_index([s   1, "CaseNr", "AccType"]).unstack(0)

# if you must flatten your multi index - better to have it imo.

df1.columns = [f"{x}_{y}" for x, y in df1.columns.to_flat_index()]


print(df1)

                   Vehicle_1 Vehicle_2 Model_1 Model_2  VehicleMass_1  VehicleMass_2
CaseNr AccType
A      1        PassengerCar     Train      A1   Train         1217.0        99999.0
B      2        PassengerCar     Train      B7   Train         1400.0        99999.0
C      3        PassengerCar       NaN      C2     NaN         1295.0            NaN
  • Related