Home > OS >  SUMPRODUCT Specific columns in Pandas DataFrame
SUMPRODUCT Specific columns in Pandas DataFrame

Time:05-31

From the below mentioned data frame, I am trying to calculate excel type SUMPRODUCT of columns V1, V2 and V3 against columns S1, S2 and S3.

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Qty': [100, 150, 200],
                   'Remarks': ['Bad', 'Avg', 'Good'],
                   'V1': [0,1,1],
                   'V2': [1,1,0],
                   'V3': [0,0,1],
                   'S1': [1,0,1],
                   'S2': [0,1,0],
                   'S3': [1,0,1]
            })

I am looking a way to do this without having to use each column's name like:

df['SP'] = df[['V1', 'S1']].prod(axis=1)   df[['V2', 'S2']].prod(axis=1)   df[['V3', 'S3']].prod(axis=1)

In my real data frame, I have more than 50 columns in both 'V' and 'S' categories so the above approach is not possible.

Any suggestions?

Thanks!

CodePudding user response:

Filter the S and V like columns then multiply the S columns with the corresponding V columns and sum the result along columns axis

s = df.filter(regex='S\d ')
p = df.filter(regex='V\d ')

df['SP'] = s.mul(p.values).sum(1)

  Name  Qty Remarks  V1  V2  V3  S1  S2  S3  SP
0    A  100     Bad   0   1   0   1   0   1   0
1    B  150     Avg   1   1   0   0   1   0   1
2    C  200    Good   1   0   1   1   0   1   2

PS: This solution assumes that the order of appearance of S and V columns in the original dataframe matches.

CodePudding user response:

You could try something like this:

# need to edit these two lines to work with your larger DataFrame
v_cols = df.columns[3:6]  # ['V1', 'V2', 'V3']
s_cols = df.columns[6:]  # ['S1', 'S2', 'S3']

df['SP'] = (df[v_cols].to_numpy() * df[s_cols].to_numpy()).sum(axis=1)

Edited with an alternative after seeing comment from @ALollz about MultiIndex making alignment simpler:

df.set_index(['Name', 'Qty', 'Remarks'], inplace=True)
n_cols = df.shape[1] // 2
v_cols = df.columns[:n_cols]
s_cols = df.columns[n_cols:]
df['SP'] = (df[v_cols].to_numpy() * df[s_cols].to_numpy()).sum(axis=1)

You can then reset index if you prefer:

df.reset_index(inplace=True)

Results:

  Name  Qty Remarks  V1  V2  V3  S1  S2  S3  SP
0    A  100     Bad   0   1   0   1   0   1   0
1    B  150     Avg   1   1   0   0   1   0   1
2    C  200    Good   1   0   1   1   0   1   2

CodePudding user response:

If your Vn and Sn in columns are in order

v_cols = df.filter(like='V').columns
s_cols = df.filter(like='S').columns

df['SP2'] = sum([df[[v, s]].prod(axis=1) for v, s in zip(v_cols, s_cols)])
print(df)

  Name  Qty Remarks  V1  V2  V3  S1  S2  S3  SP  SP2
0    A  100     Bad   0   1   0   1   0   1   0    0
1    B  150     Avg   1   1   0   0   1   0   1    1
2    C  200    Good   1   0   1   1   0   1   2    2
  • Related