I am trying to group and summarise a pandas DataFrame into a single column:
ID | LayerName | Name | Count |
---|---|---|---|
A | SC | B | 2 |
A | SC | R | 8 |
A | BLD | S | 7 |
A | BLD | K | 6 |
I would like the resulting table to be summarised by LayerName, Name and Count into a single output field, like this:
ID | Output |
---|---|
A | 10 - SC : (B,R) ; 13 - BLD : (S,K) |
CodePudding user response:
You need a double groupby.agg:
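(df.groupby(['ID', 'LayerName'],
            as_index=False, sort=False)
   # first pass: one row per (ID, LayerName) with joined names and summed counts
   .agg({'Name': ','.join, 'Count': 'sum'})
   # build the "<sum> - <layer> : (<names>)" fragment for each row
   .assign(Output=lambda d: d['Count'].astype(str)
                            + ' - ' + d['LayerName']
                            + ' : (' + d['Name'] + ')')
   # second pass: join the fragments of each ID into one string
   .groupby('ID', as_index=False, sort=False)
   .agg({'Output': ' ; '.join})
)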
Output:
   ID                              Output
0   A  10 - SC : (B,R) ; 13 - BLD : (S,K)
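For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same double groupby.agg. The DataFrame is rebuilt from the table in the question; the variable names `per_layer` and `result` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A"],
    "LayerName": ["SC", "SC", "BLD", "BLD"],
    "Name": ["B", "R", "S", "K"],
    "Count": [2, 8, 7, 6],
})

# First pass: one row per (ID, LayerName), names joined and counts summed.
per_layer = (
    df.groupby(["ID", "LayerName"], as_index=False, sort=False)
      .agg({"Name": ",".join, "Count": "sum"})
)

# Build the text fragment for each layer.
per_layer["Output"] = (
    per_layer["Count"].astype(str)
    + " - " + per_layer["LayerName"]
    + " : (" + per_layer["Name"] + ")"
)

# Second pass: join the fragments of each ID into a single field.
result = (
    per_layer.groupby("ID", as_index=False, sort=False)
             .agg({"Output": " ; ".join})
)
print(result)
```

Passing sort=False keeps the layers in their order of first appearance (SC before BLD), which is what the desired output in the question shows.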
CodePudding user response:
# Note: str.cat joins every group into one string for the whole frame,
# not one row per ID.
(df.groupby(["ID", "LayerName"], sort=False)
   .apply(lambda x: f"{x.Count.sum()} - {x.LayerName.iloc[0]}: ({','.join(x.Name.to_list())})")
   .str.cat(sep="; ")
)
# '10 - SC: (B,R); 13 - BLD: (S,K)'
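If the frame holds more than one ID, the one-liner above would merge all of them into a single string. A possible per-ID variant is sketched below; it is not part of the original answer. The helper `summarise` and the extra row with ID "B" are hypothetical, added only to show the grouping. It iterates over the groups instead of reading `LayerName` inside `apply`, which may behave differently in recent pandas releases.

```python
import pandas as pd

# Data from the question, plus one hypothetical row (ID "B") for illustration.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B"],
    "LayerName": ["SC", "SC", "BLD", "BLD", "SC"],
    "Name": ["B", "R", "S", "K", "T"],
    "Count": [2, 8, 7, 6, 3],
})


def summarise(group: pd.DataFrame) -> str:
    """Build '<sum> - <layer>: (<names>)' fragments for one ID and join them."""
    parts = []
    for layer, sub in group.groupby("LayerName", sort=False):
        names = ",".join(sub["Name"])
        parts.append(f"{sub['Count'].sum()} - {layer}: ({names})")
    return "; ".join(parts)


# One summary string per ID, in order of first appearance.
summaries = {id_: summarise(g) for id_, g in df.groupby("ID", sort=False)}
print(summaries)
```

The dictionary can be turned back into a two-column frame with `pd.Series(summaries).rename_axis("ID").reset_index(name="Output")` if a table is preferred.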