Please find below my input/output :
INPUT :
Id Values Status
0 Id001 red online
1 Id002 brown running
2 Id002 white off
3 Id003 blue online
4 Id003 green valid
5 Id003 yellow running
6 Id004 rose off
7 Id004 purple off
OUTPUT :
Id Values Status Id_occ Val_occ Sta_occ
0 Id001 red online 1 1 1
1 Id002 brown|white running|off 2 2 2
2 Id003 blue|green|yellow online|valid|running 3 3 3
3 Id004 rose|purple off 2 2 1
I was able to re-calculate the columns Values
and Status
but I don't know how to create the three columns of occurences.
import pandas as pd
df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple'],
'Status': ['online', 'running', 'off', 'online', 'valid', 'running', 'off', 'off']})
out = (df.groupby(['Id'])
.agg({'Values': 'unique', 'Status': 'unique'})
.applymap(lambda x: '|'.join([str(val) for val in list(x)]))
.reset_index()
)
Do you have any suggestions of how to create the three columns of occurences ? Also, is there a better way to re-calculate the columns Values
and Status
?
CodePudding user response:
You can use named aggregation and a custom function:
ujoin = lambda s: '|'.join(dict.fromkeys(s))
out = (df
.assign(N_off=df['Status'].eq('off'))
.groupby(['Id'], as_index=False)
.agg(**{'Values': ('Values', ujoin),
'Status': ('Status', ujoin),
'Id_occ': ('Values', 'size'),
'Val_occ': ('Values', 'nunique'),
'Stat_occ': ('Status', 'nunique'),
'N_off': ('N_off', 'sum')
})
)
Output:
Id Values Status Id_occ Val_occ Stat_occ N_off
0 Id001 red online 1 1 1 0
1 Id002 brown|white running|off 2 2 2 1
2 Id003 blue|green|yellow online|valid|running 3 3 3 0
3 Id004 rose|purple off 2 2 1 2
CodePudding user response:
Use:
df.groupby('Id')['Values'].nunique()
for two columns:
df.groupby('Id')[['Values', 'Status']].nunique()
Output:
Values Status
Id
Id001 1 1
Id002 2 2
Id003 3 3
Id004 2 1