I have a dataframe with columns Value
, Type
and Subtype
. I would like to first group it by Type
and then group it by Subtype
but the second groupby should be done only for a specific group (i.e. for example only for a group Type=="Type 2
). How can I do that in the most elegant way?
This is my code - I would like to do the second grouping (by SubType
) only for Type 2
, not both of them. For Type 1
the aggregated value should be equal to the sum from all SubTypes
.
import pandas as pd
df = pd.DataFrame()
df["Type"] = ["1", "2", "1", "2", "1", "2"]
df["SubType"] = ["A", "A","B", "B","C", "C"]
df["Value"] = [1,2,3,4,5,6]
gb = df.groupby(["Type", "SubType"]).agg({"Value": sum})
gb
Value
Type SubType
1 A 1
B 3
C 5
2 A 2
B 4
C 6
CodePudding user response:
IIUC, you can transform "A, B, C" SubType
of Type
2 in a new SubType
like "X":
out = (df.assign(SubType=df['SubType'].mask(df['Type'] == '2', 'X'))
.groupby(['Type', 'SubType'], as_index=False)['Value'].sum())
print(out)
# Output
Type SubType Value
0 1 A 1
1 1 B 3
2 1 C 5
3 2 X 12