Home > Net >  sumproduct 2 dataframes with nan values
sumproduct 2 dataframes with nan values

Time:09-16

Have 2 dataframes df['x'] and df['y']. How do i multiply the 2 tables and sum the rows?

x0  x1  x2      y0  y1  y2
2   5   9       2   3   4
Nan 4   7       2   2   3
3   Nan 3       1   2   3

Desired output:
(2x2) (5x3) (9x4) = 55
etc..

Nan should be treated as 0.
I keep hitting results with nan values with the below code.
Any other more efficient ways to solve this?

df['z'] = (df['x'].to_numpy() * df['y'].to_numpy()).sum(axis=1)

CodePudding user response:

Pandas supports arithmetic operations over dataframe, and also provides flexible wrappers for them.

 df.method()   operator
------------------------
     add           
     sub          -
     mul          *
     div          /
     mod          %
     pow          **

Simply multiply the original dataframe with nan values filled.

#METHOD 1
(df_x.fillna(0)*df_y.values).sum(1)

Or you can use pd.DataFrame.mul() which is equivalent to * but provides the functionality of adding fill_value

#METHOD 2
df_x.mul(df_y.values, fill_value=0).sum(1)
0    55.0
1    29.0
2    12.0
dtype: float64

Note: If df_y as nan values as well, then replace df_y.values with df_y.fillna(0).values.

CodePudding user response:

Try using filter with the regex argument:

df.filter(regex='x\d').mul(df.filter(regex='y\d').to_numpy()).sum(axis=1)

Or only with filter(like=...):

df.filter(like='x').mul(df.filter(like='y').to_numpy()).sum(axis=1)

0    55.0
1    29.0
2    12.0
dtype: float64

CodePudding user response:

Try this:

dfX = pd.DataFrame({
    'x0': [2,np.nan,3],
    'x1': [5, 4,np.nan],
    'x2': [9,7,3]
})

dfY = pd.DataFrame({
    'y0': [2,2,1],
    'y1': [3,2,2],
    'y2': [4,3,3]
})

dfX = dfX.fillna(0)
dfY = dfY.fillna(0)
for i in range(len(dfX)):
    print(np.sum(dfX.iloc[i].to_numpy()*dfY.iloc[i].to_numpy()))

Output:

55.0
29.0
12.0
  • Related