I have a dataframe as below:
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'colA': [1, 2, 3, 4],
                   'colB': [5, 6, 7, 8],
                   'colC': [9, 10, 11, 12],
                   'colD': [13, 14, 15, 16]})
I want to get all combinations of 'colA', 'colB', 'colC' and 'colD' and calculate the sum for each combination. I can get all combinations of two or more columns using itertools:
from itertools import combinations

cols = ['colA', 'colB', 'colC', 'colD']
all_combinations = [c for i in range(2, len(cols) + 1) for c in combinations(cols, i)]
But how can I get the sum for each combination and create a new column in the dataframe? Expected output:
id  colA  colB  colC  colD  colA+colB  colB+colC  ...  colA+colB+colC+colD
 a     1     5     9    13          6         14  ...                   28
 b     2     6    10    14          8         16  ...                   32
 c     3     7    11    15         10         18  ...                   36
 d     4     8    12    16         12         20  ...                   40
CodePudding user response:
First, select from the frame a list of all columns whose names start with col. Then build a dictionary using combinations, where the keys are the names of the new sum columns and the values are the row-wise sums of the corresponding columns of the original dataframe. Finally, unpack the dictionary with ** as keyword arguments to the assign method, which adds the new columns to the frame:
from itertools import combinations

cols = [c for c in df.columns if c.startswith('col')]
# Key: column names joined with '+'; value: row-wise sum of those columns
df = df.assign(**{'+'.join(c): df.loc[:, list(c)].sum(axis=1)
                  for i in range(2, len(cols) + 1)
                  for c in combinations(cols, i)})
print(df)
   id  colA  colB  colC  colD  colA+colB  colA+colC  colA+colD  colB+colC  colB+colD  colC+colD  colA+colB+colC  colA+colB+colD  colA+colC+colD  colB+colC+colD  colA+colB+colC+colD
0   a     1     5     9    13          6         10         14         14         18         22              15              19              23              27                   28
1   b     2     6    10    14          8         12         16         16         20         24              18              22              26              30                   32
2   c     3     7    11    15         10         14         18         18         22         26              21              25              29              33                   36
3   d     4     8    12    16         12         16         20         20         24         28              24              28              32              36                   40
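If the dictionary comprehension is hard to read, the same idea can be written as an explicit loop. This is only a sketch of the approach described above: the names new_columns, size, combo and result are my own, and joining the column names with '+' is an assumption based on the expected output in the question.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'colA': [1, 2, 3, 4],
                   'colB': [5, 6, 7, 8],
                   'colC': [9, 10, 11, 12],
                   'colD': [13, 14, 15, 16]})

# Columns to combine: everything whose name starts with 'col'
cols = [c for c in df.columns if c.startswith('col')]

# Map each new column name to the row-wise sum of its source columns
new_columns = {}
for size in range(2, len(cols) + 1):
    for combo in combinations(cols, size):
        # combo is a tuple of names, so convert it to a list for selection
        new_columns['+'.join(combo)] = df[list(combo)].sum(axis=1)

# Unpack the dictionary as keyword arguments; assign returns a new frame
result = df.assign(**new_columns)
print(result.columns.tolist())
```

With four source columns this adds 6 pairs, 4 triples and 1 four-way sum, so 11 new columns in total. Because assign returns a new frame, the original df is left unchanged here.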