Get all combinations of several columns in a pandas dataframe and calculate sum for each combination

Time:12-02

I have a dataframe as below:

import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'colA': [1, 2, 3, 4],
                   'colB': [5, 6, 7, 8],
                   'colC': [9, 10, 11, 12],
                   'colD': [13, 14, 15, 16]})

I want to get all combinations of two or more of 'colA', 'colB', 'colC' and 'colD' and calculate the row-wise sum for each combination. I can get all the combinations using itertools:

from itertools import combinations

cols = ['colA', 'colB', 'colC', 'colD']
all_combinations = [c for i in range(2, len(cols) + 1) for c in combinations(cols, i)]

But how can I get the sum for each combination and create a new column in the dataframe? Expected output:

id  colA  colB  colC  colD  colA colB  colB colC ... colA colB colC colD
a   1     5     9     13    6          14        ... 28
b   2     6     10    14    8          16        ... 32
c   3     7     11    15    10         18        ... 36
d   4     8     12    16    12         20        ... 40

CodePudding user response:

First, select all columns whose names start with col. Then build a dictionary with combinations: each key is the name of a new column (the combined column names joined by a space) and each value is the row-wise sum of those columns in the original dataframe. Finally, unpack the dictionary with ** as keyword arguments to the assign method, which adds the new columns to the frame.

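from itertools import combinations

cols = [c for c in df.columns if c.startswith('col')]
# list(c) turns each combination tuple into a plain list of column labels
df = df.assign(**{' '.join(c): df.loc[:, list(c)].sum(axis=1) for i in range(2, len(cols) + 1) for c in combinations(cols, i)})
print(df)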
  id  colA  colB  colC  colD  colA colB  colA colC  colA colD  colB colC  colB colD  colC colD  colA colB colC  colA colB colD  colA colC colD  colB colC colD  colA colB colC colD
0  a     1     5     9    13          6         10         14         14         18         22              15              19              23              27                   28
1  b     2     6    10    14          8         12         16         16         20         24              18              22              26              30                   32
2  c     3     7    11    15         10         14         18         18         22         26              21              25              29              33                   36
3  d     4     8    12    16         12         16         20         20         24         28              24              28              32              36                   40
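
As a variation on the same idea, here is a minimal sketch that wraps the logic in a reusable function and attaches the new columns with a single pd.concat instead of assign. The function name add_combination_sums and its min_size and sep parameters are hypothetical and were chosen only for illustration; they are not part of pandas. It assumes the listed columns are numeric.

```python
from itertools import combinations

import pandas as pd


def add_combination_sums(frame, cols, min_size=2, sep=" "):
    """Return a new frame with one row-wise sum column per combination of `cols`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    new_cols = {
        sep.join(combo): frame[list(combo)].sum(axis=1)
        for size in range(min_size, len(cols) + 1)
        for combo in combinations(cols, size)
    }
    # Build every new column first, then attach them all in one concat.
    return pd.concat([frame, pd.DataFrame(new_cols, index=frame.index)], axis=1)


df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'colA': [1, 2, 3, 4],
                   'colB': [5, 6, 7, 8],
                   'colC': [9, 10, 11, 12],
                   'colD': [13, 14, 15, 16]})

result = add_combination_sums(df, ['colA', 'colB', 'colC', 'colD'])
print(result.shape)                            # (4, 16)
print(result['colA colB colC colD'].tolist())  # [28, 32, 36, 40]
```

With four columns there are 6 pairs, 4 triples and 1 quadruple, so 11 columns are added to the original 5. The original df is left unchanged, and keyword unpacking is avoided, which may be convenient when the joined column names are not plain strings.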