Grouping a dataframe and conserving the same number of rows-CodePudding

I'm trying to make the kind of transformation shown in the image below :

I made the code below but unfortunately I'm not getting the result I'm looking for:

import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple']})

out = (df['Values']
      .astype(str)
      .groupby(df['Id'])
      .agg('|'.join)
      .reset_index())

Do you have any suggestions/propositions, please ?

CodePudding user response：

You're close, you just need to use out to assign the result back to the df (it's better if you don't reset_index() in this case):

import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple']})

out = (df['Values']
      .astype(str)
      .groupby(df['Id'])
      .agg('|'.join))

counts = df['Id'].value_counts()
df['Id_occurrences'] = [counts.loc[id] for id in df['Id']]
df['Values_grouped'] = [out.loc[id] for id in df['Id']]

CodePudding user response：

Try transform

df['Values_grouped'] = df.groupby('Id')['Values'].transform('|'.join)
df['Id_occurrences'] = df.groupby('Id')['Values'].transform('count')