I have the following df:
df = pd.DataFrame({'a': [1,2,3,4,2], 'b': [3,4,1,0,4], 'c':[1,2,3,1,0], 'd':[3,2,4,1,4]})
I want to generate a combination of totals from these 4 columns, which equals 4 x 3 x 2 = 24 total combinations minus duplicates. I want the results in the same df.
I want something that looks like this (partial results shown):
A combo of a_b is the same as b_a and therefore I wouldn't want such a calculation since its a duplicate.
Is there a way to calculate all combinations and exclude duplicate totals?
CodePudding user response:
import itertools as it
orig_cols = df.columns
for r in range(2, df.shape[1] 1):
for cols in it.combinations(orig_cols, r):
df["_".join(cols)] = df.loc[:, cols].sum(axis=1)
Needs some looping, but not on the dataframe itself, but rather the combinations. We get 2, 3, ..., N-1'th combinations of the column names where N is number of columns. Then form the new _-joined column as the sum.
In [11]: df
Out[11]:
a b c d a_b a_c a_d b_c b_d c_d a_b_c a_b_d a_c_d b_c_d a_b_c_d
0 1 3 1 3 4 2 4 4 6 4 5 7 5 7 8
1 2 4 2 2 6 4 4 6 6 4 8 8 6 8 10
2 3 1 3 4 4 6 7 4 5 7 7 8 10 8 11
3 4 0 1 1 4 5 5 1 1 2 5 5 6 2 6
4 2 4 0 4 6 2 6 4 8 4 6 10 6 8 10