Home > Enterprise >  Sum of a list of columns for which the associated dummy equals 1
Sum of a list of columns for which the associated dummy equals 1

Time:10-05

I have a long list of columns with numerical values. Each column has an associated dummy variable. I want to create a new column with row-wise sums of all columns in my list for that the associated dummy equals 1.
Because of the length of the data, specifying conditions for each column individually is not feasible.

Example data:

df = pd.DataFrame(np.random.randint(0,100,size=(4, 3)), columns=('A', 'B', 'C'))
dummy_array = np.array([[1, 0, 0],[1, 1, 1], [0, 0, 0], [0, 1, 1]])
df[['A_dummy', 'B_dummy', 'C_dummy']] = pd.DataFrame(index=df.index,data = dummy_array)
print(df) 

   A   B   C   A_dummy  B_dummy  C_dummy
0  33  56  75        1        0        0
1  12  99  49        1        1        1
2   7  92  84        0        0        0
3  95  19  79        0        1        1 

desired solution:

   A   B   C   A_dummy  B_dummy  C_dummy  ABC_sum
0  33  56  75        1        0        0       33
1  12  99  49        1        1        1      160
2   7  92  84        0        0        0        0 (or NaN)
3  95  19  79        0        1        1       98

I fail to come of up for a solution that would go through a list = ['A', 'B', 'C'] and do this. Thanks for the input.

The (embarassingly easy) solution for this, provided by @mozway:

abclist = ['A', 'B', 'C']       # I used a regex-filter to create this list
abc_dummylist = [item   '_dummy' for item in abclist]
df['ABC_sum'] = (df[abclist ]*df[abc_dummylist ].values).sum(axis=1)
print(df)

    A   B   C  A_dummy  B_dummy  C_dummy  ABC_sum
0  70  24  10        1        0        0       70
1   7  15   3        1        1        1       25
2  25  74  23        0        0        0        0
3  22  72  77        0        1        1      149

CodePudding user response:

You could do:

df['ABC_sum'] = (df[['A', 'B', 'C']]*df[['A_dummy', 'B_dummy', 'C_dummy']].values).sum(axis=1)

NB. if you have many columns, you could select them with filter

example:

    A   B   C  A_dummy  B_dummy  C_dummy  ABC_sum
0  92  12  74        1        0        0       92
1  40  59  36        1        1        1      135
2  88  70  40        0        0        0        0
3  78  49  53        0        1        1      102
  • Related