Have 2 dataframes df['x'] and df['y']. How do i multiply the 2 tables and sum the rows?
x0 x1 x2 y0 y1 y2
2 5 9 2 3 4
Nan 4 7 2 2 3
3 Nan 3 1 2 3
Desired output:
(2x2) (5x3) (9x4) = 55
etc..
Nan should be treated as 0.
I keep hitting results with nan values with the below code.
Any other more efficient ways to solve this?
df['z'] = (df['x'].to_numpy() * df['y'].to_numpy()).sum(axis=1)
CodePudding user response:
Pandas supports arithmetic operations over dataframe, and also provides flexible wrappers for them.
df.method() operator
------------------------
add
sub -
mul *
div /
mod %
pow **
Simply multiply the original dataframe with nan values filled.
#METHOD 1
(df_x.fillna(0)*df_y.values).sum(1)
Or you can use pd.DataFrame.mul()
which is equivalent to *
but provides the functionality of adding fill_value
#METHOD 2
df_x.mul(df_y.values, fill_value=0).sum(1)
0 55.0
1 29.0
2 12.0
dtype: float64
Note: If
df_y
as nan values as well, then replacedf_y.values
withdf_y.fillna(0).values
.
CodePudding user response:
Try using filter
with the regex
argument:
df.filter(regex='x\d').mul(df.filter(regex='y\d').to_numpy()).sum(axis=1)
Or only with filter(like=...)
:
df.filter(like='x').mul(df.filter(like='y').to_numpy()).sum(axis=1)
0 55.0
1 29.0
2 12.0
dtype: float64
CodePudding user response:
Try this:
dfX = pd.DataFrame({
'x0': [2,np.nan,3],
'x1': [5, 4,np.nan],
'x2': [9,7,3]
})
dfY = pd.DataFrame({
'y0': [2,2,1],
'y1': [3,2,2],
'y2': [4,3,3]
})
dfX = dfX.fillna(0)
dfY = dfY.fillna(0)
for i in range(len(dfX)):
print(np.sum(dfX.iloc[i].to_numpy()*dfY.iloc[i].to_numpy()))
Output:
55.0
29.0
12.0