Home > Mobile >  Pandas left join, with duplicated columns
Pandas left join, with duplicated columns

Time:11-05

I have this problem:

A

ID Data
1 Something
2 Anything

B

ID A_ID Data
1 1 data1
2 1 data2
3 2 data3

A B

ID Data B_ID_1 B_Data B_ID_2 B_Data_2
1 Something 1 data1 2 data2
2 Anything 3 data3 NaN NaN

So basically, I want to join two tables, but without duplicating rows and creating new columns.

CodePudding user response:

Use GroupBy.cumcount for groups, pass to DataFrame.pivot and sorting MultiIndex in columns, flatten it and last join to A by DataFrame.join:

B['g'] = B.groupby('A_ID').cumcount().add(1)

B = B.pivot('A_ID','g').sort_index(axis=1, level=1, sort_remaining=False)
B.columns = B.columns.map(lambda x: f'{x[0]}_{x[1]}')
print (B)
      ID_1 Data_1  ID_2 Data_2
A_ID                          
1      1.0  data1   2.0  data2
2      3.0  data3   NaN    NaN

df = A.join(B, on='ID')
print (df)
   ID       Data  ID_1 Data_1  ID_2 Data_2
0   1  Something   1.0  data1   2.0  data2
1   2   Anything   3.0  data3   NaN    NaN
  • Related