Home > Net >  Concatenating values into column from multiple rows
Concatenating values into column from multiple rows

Time:08-14

I have a dataframe containing only duplicate "MainID" rows.
One MainID may have multiple secondary IDs (SecID).
I want to concatenate the values of SecID if there is a common MainID, joined by ':' in SecID col.
What is the best way of achieving this?
Yes, I know this is not best practice, however it's the structure the software wants. Need to keep the df structure and values in rest of the df. They will always match the other duplicated row. Only SecID will be different.

Current:

data={'MainID':['NHFPL0580','NHFPL0580','NHFPL0582','NHFPL0582'],'SecID':['G12345','G67890','G11223','G34455'], 'Other':['A','A','B','B']}
df=pd.DataFrame(data)
print(df)
      MainID   SecID  Other
0  NHFPL0580  G12345  A
1  NHFPL0580  G67890  A
2  NHFPL0582  G11223  B
3  NHFPL0582  G34455  B

Intended Structure

MainID     SecID          Other 
NHFPL0580  G12345:G67890  A
NHFPL0582  G11223:G34455  B

CodePudding user response:

Try:

df.groupby('MainID').apply(lambda x: ':'.join(x.SecID))

the above code returns a pd.Series, and you can convert it to a dataframe as @Guy suggested:

You need .reset_index(name='SecID') if you want it back as DataFrame

The solution to the edited question:

df = df.groupby(['MainID', 'Other']).apply(lambda x: ':'.join(x.SecID)).reset_index(name='SecID')

You can then change the column order

cols = df.columns.tolist()
df = df[[cols[i] for i in [0, 2, 1]]]
  • Related