How to aggregate only non-duplicate values using Pandas

Time:08-30

I have the dataframe below:

    ID  COL1    COL2
0   id001   val1    xxxxx
1   id001   val1    yyyyy
2   id002   val2    yyyyy
3   id003   val3    zzzzz
4   id003   val4    zzzzz

And this is the expected output:

    ID  COL1    COL2
0   id001   val1    xxxxx|yyyyy
1   id002   val2    yyyyy
2   id003   val3|val4   zzzzz

I wrote the code below, but unfortunately val1 (first row) and zzzzz (last row) are repeated.

df_gr = df[['COL1', 'COL2']].astype(str).groupby(df['ID']).agg('|'.join).reset_index()

    ID  COL1    COL2
0   id001   val1|val1   xxxxx|yyyyy
1   id002   val2    yyyyy
2   id003   val3|val4   zzzzz|zzzzz

Do you know how to fix this, please?

CodePudding user response:

Note that 'unique' returns an array for each group, so I needed to convert its elements to strings before applying join:

df.groupby(['ID']).agg({'COL1': 'unique', 'COL2':'unique'}).applymap(lambda x: '|'.join(map(str, x))).reset_index()
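A self-contained sketch of the same idea, rebuilding the question's dataframe so it can be run directly. It joins the unique values inside the aggregation instead of calling applymap afterwards (newer pandas versions deprecate applymap in favor of DataFrame.map). The helper name join_unique is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   ["id001", "id001", "id002", "id003", "id003"],
    "COL1": ["val1", "val1", "val2", "val3", "val4"],
    "COL2": ["xxxxx", "yyyyy", "yyyyy", "zzzzz", "zzzzz"],
})

def join_unique(values):
    # Hypothetical helper: Series.unique() drops repeats and keeps
    # first-occurrence order; str() guards against non-string values.
    return "|".join(map(str, values.unique()))

# One row per ID, each column reduced to its distinct values.
df_gr = (
    df.groupby("ID")
      .agg({"COL1": join_unique, "COL2": join_unique})
      .reset_index()
)
print(df_gr)
```

Doing the join inside agg keeps everything in a single pass over the groups, and the same helper can be reused for any other column that needs the same treatment.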

CodePudding user response:

Try this

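df.astype(str).groupby(['ID', 'COL1'])['COL2'].agg('|'.join).reset_index()

ID has to remain in the frame for the groupby to find it, so select COL2 after grouping rather than before. Grouping on both ID and COL1 means COL1 is never joined with itself, and the result is a single joined COL2 series. It returns one row per (ID, COL1) pair, though, so id003 still spans two rows; after that you need to combine the series back into your df to get the expected output.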
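For reference, a runnable sketch of grouping on both ID and COL1 with the question's data, so the shape of the result can be checked. The variable name by_pair is illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   ["id001", "id001", "id002", "id003", "id003"],
    "COL1": ["val1", "val1", "val2", "val3", "val4"],
    "COL2": ["xxxxx", "yyyyy", "yyyyy", "zzzzz", "zzzzz"],
})

# Group on the (ID, COL1) pair and join only COL2 within each pair.
by_pair = (
    df.astype(str)
      .groupby(["ID", "COL1"])["COL2"]
      .agg("|".join)
      .reset_index()
)
print(by_pair)
```

With this data the result has four rows, because id003 appears with two different COL1 values; collapsing it to one row per ID takes a further aggregation on ID alone.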
