I have this problem:
A
ID | Data |
---|---|
1 | Something |
2 | Anything |
B
ID | A_ID | Data |
---|---|---|
1 | 1 | data1 |
2 | 1 | data2 |
3 | 2 | data3 |
A B
ID | Data | B_ID_1 | B_Data | B_ID_2 | B_Data_2 |
---|---|---|---|---|---|
1 | Something | 1 | data1 | 2 | data2 |
2 | Anything | 3 | data3 | NaN | NaN |
So basically, I want to join two tables, but without duplicating rows and creating new columns.
CodePudding user response:
Use GroupBy.cumcount
for groups, pass to DataFrame.pivot
and sorting MultiIndex in columns
, flatten it and last join to A
by DataFrame.join
:
B['g'] = B.groupby('A_ID').cumcount().add(1)
B = B.pivot('A_ID','g').sort_index(axis=1, level=1, sort_remaining=False)
B.columns = B.columns.map(lambda x: f'{x[0]}_{x[1]}')
print (B)
ID_1 Data_1 ID_2 Data_2
A_ID
1 1.0 data1 2.0 data2
2 3.0 data3 NaN NaN
df = A.join(B, on='ID')
print (df)
ID Data ID_1 Data_1 ID_2 Data_2
0 1 Something 1.0 data1 2.0 data2
1 2 Anything 3.0 data3 NaN NaN