Home > Net >  Aggregate count based on column identifier
Aggregate count based on column identifier

Time:10-04

Given a dataframe with this structure and with a variable with identifiers such as "q1_att_brand":

id q1_1_1  q1_1_2  q1_2_1  q1_2_2
1  1       1       1       1  
2          1       1       1
3  1                       1
4  1               1       1
5  1               1       1

How can I get the combination of values in another df such as:

      brand1  brand2
att1  4       2
att2  4       5

Basically getting the combinations of atts and brands.

Thx.

CodePudding user response:

You can turn your columns into a multi-index, then use stack and sum:

# Do this step if "id" is not already the index
# df = df.set_index('id')

df.columns = pd.MultiIndex.from_tuples(
    (f'att{a}', f'brand{b}') for _, a, b in df.columns.str.split('_'))

df.stack(level=0).sum(level=1)
       brand1  brand2
attr     4.0     2.0
attr     4.0     5.0

Note from comments:

Seems sum(level=1) is getting FutureWarning and the warning suggests to use groupby(level=1).sum() instead.

  • Related