I am trying merge data from multiple rows into a single row with multiple columns. I can only manage one column to work when I want multiple.
Given below is how the data looks like :
column1 column2 column3
1 a Y
1 b Z
2 d Z
2 c Y
Expected output:
column1 column2 column3
1 a, b Y, Z
2 d, c Z, Y
Using df.groupby('column1')['column2'].apply(', '.join).reset_index()
I've got the output:
column1 column2
1 a, b
2 d, c
But unsure the syntax for adding column3.
df.groupby('column1')['column2', 'column3'].apply(', '.join).reset_index()
doesn't work.
Neither does df.groupby('column1')[['column2', 'column3']].apply(', '.join).reset_index()
.
CodePudding user response:
You can try agg
for multiple concatenations:
df.groupby('column1').agg(lambda x: ', '.join(x)).reset_index()