Home > Blockchain >  Python nested groupby
Python nested groupby

Time:06-12

I have a dataframe with columns Value, Type and Subtype. I would like to first group it by Type and then group it by Subtype but the second groupby should be done only for a specific group (i.e. for example only for a group Type=="Type 2). How can I do that in the most elegant way? This is my code - I would like to do the second grouping (by SubType) only for Type 2, not both of them. For Type 1 the aggregated value should be equal to the sum from all SubTypes.

import pandas as pd
df = pd.DataFrame()

df["Type"] = ["1", "2", "1", "2", "1", "2"]
df["SubType"] = ["A", "A","B", "B","C", "C"]
df["Value"] = [1,2,3,4,5,6]

gb = df.groupby(["Type", "SubType"]).agg({"Value": sum})
gb

        Value
Type    SubType 
1   A   1
B   3
C   5
2   A   2
B   4
C   6

CodePudding user response:

IIUC, you can transform "A, B, C" SubType of Type 2 in a new SubType like "X":

out = (df.assign(SubType=df['SubType'].mask(df['Type'] == '2', 'X'))
         .groupby(['Type', 'SubType'], as_index=False)['Value'].sum())
print(out)

# Output
  Type SubType  Value
0    1       A      1
1    1       B      3
2    1       C      5
3    2       X     12
  • Related