I'm trying to make the kind of transformation shown in the image below :
I made the code below but unfortunately I'm not getting the result I'm looking for:
import pandas as pd
df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple']})
out = (df['Values']
.astype(str)
.groupby(df['Id'])
.agg('|'.join)
.reset_index())
Do you have any suggestions/propositions, please ?
CodePudding user response:
You're close, you just need to use out
to assign the result back to the df (it's better if you don't reset_index() in this case):
import pandas as pd
df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple']})
out = (df['Values']
.astype(str)
.groupby(df['Id'])
.agg('|'.join))
counts = df['Id'].value_counts()
df['Id_occurrences'] = [counts.loc[id] for id in df['Id']]
df['Values_grouped'] = [out.loc[id] for id in df['Id']]
CodePudding user response:
Try transform
df['Values_grouped'] = df.groupby('Id')['Values'].transform('|'.join)
df['Id_occurrences'] = df.groupby('Id')['Values'].transform('count')