Home > Software design >  How to calculate totals of all possible combinations of columns
How to calculate totals of all possible combinations of columns

Time:09-28

I have the following df:

df = pd.DataFrame({'a': [1,2,3,4,2], 'b': [3,4,1,0,4], 'c':[1,2,3,1,0], 'd':[3,2,4,1,4]})

enter image description here

I want to generate a combination of totals from these 4 columns, which equals 4 x 3 x 2 = 24 total combinations minus duplicates. I want the results in the same df.

I want something that looks like this (partial results shown):

enter image description here

A combo of a_b is the same as b_a and therefore I wouldn't want such a calculation since its a duplicate.

Is there a way to calculate all combinations and exclude duplicate totals?

CodePudding user response:

import itertools as it

orig_cols = df.columns
for r in range(2, df.shape[1]   1):
    for cols in it.combinations(orig_cols, r):
        df["_".join(cols)] = df.loc[:, cols].sum(axis=1)

Needs some looping, but not on the dataframe itself, but rather the combinations. We get 2, 3, ..., N-1'th combinations of the column names where N is number of columns. Then form the new _-joined column as the sum.

In [11]: df
Out[11]:
   a  b  c  d  a_b  a_c  a_d  b_c  b_d  c_d  a_b_c  a_b_d  a_c_d  b_c_d  a_b_c_d
0  1  3  1  3    4    2    4    4    6    4      5      7      5      7        8
1  2  4  2  2    6    4    4    6    6    4      8      8      6      8       10
2  3  1  3  4    4    6    7    4    5    7      7      8     10      8       11
3  4  0  1  1    4    5    5    1    1    2      5      5      6      2        6
4  2  4  0  4    6    2    6    4    8    4      6     10      6      8       10
  • Related