I have a dataframe with three columns:
- Colour
- Person
- Number of times worn
There are three colours, multiple names and the number column indicates how many times a specific name had a particular colour. The problem is, the same colour occurs multiple times for the same person I am trying to do a groupby, which sums up the total number, per colour per name. Any idea how I can perform a groupby which aggregates in this manner? Sorry if this it too vague!
I attach an image of the sample data for clarity.
Any help on how to neatly aggregate by colour would be great!
Colour Person Number of times worn
0 Red Tom 1
1 Red Tom 2
2 Red Tom 5
3 Blue Tom 7
4 Blue Tom 8
5 Green Tom 9
6 Red John 9
7 Red John 6
8 Green John 0
9 Green John 0
10 Orange John 5
11 Red John 4
12 Red Stanley 2
13 Orange Stanley 4
14 Green Stanley 5
15 Green Stanley 0
16 Green Stanley 6
17 Green Stanley 7
Thanks
CodePudding user response:
You can also write in this way
df.groupby(["Person", "Color"])["n"].sum().reset_index(drop=True)
Or this works like a charm as well
df.groupby(["Person", "Color"]).agg({"n": "sum"}).reset_index(drop=True)
Only use reset_index(drop=True)
if you plan to modify the original dataframe, otherwise don't pass drop=True
and just store it a variable.
CodePudding user response:
You can groupby multiple columns at the same time like this.
df = pd.DataFrame({
'Colour' : ['red', 'red', 'red', 'red', 'blue','blue',],
'Person' : ['Tom', 'Tom', 'Tom', 'John', 'John', 'John'],
'n' : [1,2,4,5,6,7]
})
df.groupby(['Person','Colour']).sum().reset_index()
Output:
Person Colour n
0 John blue 13
1 John red 5
2 Tom red 7