I have the dataframe below:
ID COL1 COL2
0 id001 val1 xxxxx
1 id001 val1 yyyyy
2 id002 val2 yyyyy
3 id003 val3 zzzzz
4 id003 val4 zzzzz
And this is the expected output:
ID COL1 COL2
0 id001 val1 xxxxx|yyyyy
1 id002 val2 yyyyy
2 id003 val3|val4 zzzzz
I made the code below but unfortunately val1
(first row) and zzzzz
(last row) are repeated.
df_gr = df[['COL1', 'COL2']].astype(str).groupby(df['ID']).agg('|'.join).reset_index()
ID COL1 COL2
0 id001 val1|val1 xxxxx|yyyyy
1 id002 val2 yyyyy
2 id003 val3|val4 zzzzz|zzzzz
Do you know how to fix this, please ?
CodePudding user response:
Note that I needed to convert an array to a list with string elements before applying join
:
df.groupby(['ID']).agg({'COL1': 'unique', 'COL2':'unique'}).applymap(lambda x: '|'.join(map(str, x))).reset_index()
CodePudding user response:
Try this
df[['COL1', 'COL2']].astype(str).groupby(['ID','COL1']).agg('|'.join).reset_index()
This will limit you groupby and formulate a single series, after that you just need to append the series into your df