From the below mentioned data frame, I am trying to calculate excel type SUMPRODUCT of columns V1, V2 and V3 against columns S1, S2 and S3.
df = pd.DataFrame({'Name': ['A', 'B', 'C'],
'Qty': [100, 150, 200],
'Remarks': ['Bad', 'Avg', 'Good'],
'V1': [0,1,1],
'V2': [1,1,0],
'V3': [0,0,1],
'S1': [1,0,1],
'S2': [0,1,0],
'S3': [1,0,1]
})
I am looking a way to do this without having to use each column's name like:
df['SP'] = df[['V1', 'S1']].prod(axis=1) df[['V2', 'S2']].prod(axis=1) df[['V3', 'S3']].prod(axis=1)
In my real data frame, I have more than 50 columns in both 'V' and 'S' categories so the above approach is not possible.
Any suggestions?
Thanks!
CodePudding user response:
Filter the S and V like columns then multiply the S columns with the corresponding V columns and sum the result along columns axis
s = df.filter(regex='S\d ')
p = df.filter(regex='V\d ')
df['SP'] = s.mul(p.values).sum(1)
Name Qty Remarks V1 V2 V3 S1 S2 S3 SP
0 A 100 Bad 0 1 0 1 0 1 0
1 B 150 Avg 1 1 0 0 1 0 1
2 C 200 Good 1 0 1 1 0 1 2
PS: This solution assumes that the order of appearance of S and V columns in the original dataframe matches.
CodePudding user response:
You could try something like this:
# need to edit these two lines to work with your larger DataFrame
v_cols = df.columns[3:6] # ['V1', 'V2', 'V3']
s_cols = df.columns[6:] # ['S1', 'S2', 'S3']
df['SP'] = (df[v_cols].to_numpy() * df[s_cols].to_numpy()).sum(axis=1)
Edited with an alternative after seeing comment from @ALollz about MultiIndex making alignment simpler:
df.set_index(['Name', 'Qty', 'Remarks'], inplace=True)
n_cols = df.shape[1] // 2
v_cols = df.columns[:n_cols]
s_cols = df.columns[n_cols:]
df['SP'] = (df[v_cols].to_numpy() * df[s_cols].to_numpy()).sum(axis=1)
You can then reset index if you prefer:
df.reset_index(inplace=True)
Results:
Name Qty Remarks V1 V2 V3 S1 S2 S3 SP
0 A 100 Bad 0 1 0 1 0 1 0
1 B 150 Avg 1 1 0 0 1 0 1
2 C 200 Good 1 0 1 1 0 1 2
CodePudding user response:
If your Vn
and Sn
in columns are in order
v_cols = df.filter(like='V').columns
s_cols = df.filter(like='S').columns
df['SP2'] = sum([df[[v, s]].prod(axis=1) for v, s in zip(v_cols, s_cols)])
print(df)
Name Qty Remarks V1 V2 V3 S1 S2 S3 SP SP2
0 A 100 Bad 0 1 0 1 0 1 0 0
1 B 150 Avg 1 1 0 0 1 0 1 1
2 C 200 Good 1 0 1 1 0 1 2 2