Given a dataframe with this structure and with a variable with identifiers such as "q1_att_brand":
id q1_1_1 q1_1_2 q1_2_1 q1_2_2
1 1 1 1 1
2 1 1 1
3 1 1
4 1 1 1
5 1 1 1
How can I get the combination of values in another df such as:
brand1 brand2
att1 4 2
att2 4 5
Basically getting the combinations of atts and brands.
Thx.
CodePudding user response:
You can turn your columns into a multi-index, then use stack and sum:
# Do this step if "id" is not already the index
# df = df.set_index('id')
df.columns = pd.MultiIndex.from_tuples(
(f'att{a}', f'brand{b}') for _, a, b in df.columns.str.split('_'))
df.stack(level=0).sum(level=1)
brand1 brand2
attr 4.0 2.0
attr 4.0 5.0
Note from comments:
Seems
sum(level=1)
is getting FutureWarning and the warning suggests to usegroupby(level=1).sum()
instead.