Home > Enterprise >  I need help creating a groupby in Pandas which aggregates on one column
I need help creating a groupby in Pandas which aggregates on one column

Time:11-06

I have a dataframe with three columns:

  • Colour
  • Person
  • Number of times worn

There are three colours, multiple names and the number column indicates how many times a specific name had a particular colour. The problem is, the same colour occurs multiple times for the same person I am trying to do a groupby, which sums up the total number, per colour per name. Any idea how I can perform a groupby which aggregates in this manner? Sorry if this it too vague!

I attach an image of the sample data for clarity.

Any help on how to neatly aggregate by colour would be great!

    Colour  Person  Number of times worn
0   Red Tom 1
1   Red Tom 2
2   Red Tom 5
3   Blue    Tom 7
4   Blue    Tom 8
5   Green   Tom 9
6   Red John    9
7   Red John    6
8   Green   John    0
9   Green   John    0
10  Orange  John    5
11  Red John    4
12  Red Stanley 2
13  Orange  Stanley 4
14  Green   Stanley 5
15  Green   Stanley 0
16  Green   Stanley 6
17  Green   Stanley 7

Thanks

CodePudding user response:

You can also write in this way

df.groupby(["Person", "Color"])["n"].sum().reset_index(drop=True)

Or this works like a charm as well

df.groupby(["Person", "Color"]).agg({"n": "sum"}).reset_index(drop=True)

Only use reset_index(drop=True) if you plan to modify the original dataframe, otherwise don't pass drop=True and just store it a variable.

CodePudding user response:

You can groupby multiple columns at the same time like this.

df = pd.DataFrame({
    'Colour' : ['red', 'red', 'red', 'red', 'blue','blue',],
    'Person' : ['Tom', 'Tom', 'Tom', 'John', 'John', 'John'],
    'n' : [1,2,4,5,6,7]
})

df.groupby(['Person','Colour']).sum().reset_index()

Output:

    Person  Colour  n
0   John    blue    13
1   John    red     5
2   Tom     red     7

  • Related