I have the following dataframe:
```python
import numpy as np
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'], columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)      # row totals
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)  # column totals, including the grand total
df
```
| | Bulgary | Robbery | Car Theft | r_Total |
|---|---|---|---|---|
| A | 30 | 19 | 6 | 55 |
| B | 12 | 23 | 14 | 49 |
| C | 8 | 18 | 20 | 46 |
| c_Total | 50 | 60 | 40 | 150 |
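Now I want to obtain the expectation matrix $E$, where

$$E_{ij} = \frac{R_i}{T}\cdot\frac{C_j}{T}\cdot T = \frac{R_i\,C_j}{T},$$

with $R_i$ the i-th `r_Total`, $C_j$ the j-th `c_Total` and $T$ the grand total (150). For example, $E_{00} = \frac{55}{150}\cdot\frac{50}{150}\cdot 150$. How should I accomplish this?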
Of course, I could always brute-force it with two nested for-loops over i and j, but is there a smarter way?
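For reference, a minimal sketch of that brute-force approach, applying the formula cell by cell. It rebuilds the dataframe from the question so it runs on its own; the names `total`, `r_tot`, `c_tot` and `E` are illustrative, not from the original.

```python
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'], columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

total = df.iat[-1, -1]               # grand total (150)
r_tot = df['r_Total'].iloc[:-1]      # row totals for A, B, C
c_tot = df.loc['c_Total'].iloc[:-1]  # column totals per crime type

# Empty float frame with the same row/column labels, filled cell by cell
E = pd.DataFrame(index=r_tot.index, columns=c_tot.index, dtype=float)
for i in r_tot.index:
    for j in c_tot.index:
        E.loc[i, j] = (r_tot[i] / total) * (c_tot[j] / total) * total
```

This is handy as a baseline to check any vectorized version against.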
Mathematically, `r_Total * c_Total / 150`, i.e. a column of row totals times a row of column totals, gives the answer I want, so it boils down to how to do this matrix multiplication on a dataframe. I tried:
```python
df['r_Total'].dot(df.loc['c_Total'])
```
But got the following error:
```
ValueError: matrices are not aligned
```
I also tried:
```python
df['r_Total'].mul(df.loc['c_Total'])
```
But got:
```
A                NaN
B                NaN
Bulgary          NaN
C                NaN
Car Theft        NaN
Robbery          NaN
Total        22500.0
Name: Total, dtype: float64
```
But this is not my expected output. Since it is a column (3x1) times a row (1x3), I expect a 3x3 matrix.
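Both attempts presumably fail for the same reason: pandas aligns two Series on their index labels before combining them. The row totals are labelled `A, B, C, c_Total` and the column totals `Bulgary, Robbery, Car Theft, r_Total`, so `.mul` produces the union of labels with NaN wherever a label exists on only one side (when both totals share a label such as `Total`, that single entry gets a number, like the lone 22500.0 above), and `Series.dot` raises because the indexes differ. A minimal sketch, rebuilding the dataframe so it runs on its own, that reproduces both behaviours and then drops to NumPy, where `np.outer` computes every $R_i C_j$ with no label alignment:

```python
import numpy as np
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'], columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

r_tot = df['r_Total']      # index: A, B, C, c_Total
c_tot = df.loc['c_Total']  # index: Bulgary, Robbery, Car Theft, r_Total

# Element-wise .mul aligns on the union of labels; unmatched labels become NaN
aligned = r_tot.mul(c_tot)

# Series.dot also aligns on labels and raises when the indexes differ
try:
    r_tot.dot(c_tot)
    dot_error = None
except ValueError as exc:
    dot_error = str(exc)

# np.outer ignores labels: cell (i, j) is r_i * c_j, then divide by the grand total
E = pd.DataFrame(
    np.outer(r_tot.iloc[:-1], c_tot.iloc[:-1]) / df.iat[-1, -1],
    index=df.index[:-1],
    columns=df.columns[:-1],
)
```

Re-attaching `index` and `columns` afterwards keeps the crime-type labels on the result.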
Answer:
To perform a matrix multiplication, the inner dimensions have to match. Your expected output has shape (3, 3), but the two operands are a column and a row, so you have to turn them into matrices of shape (3, 1) and (1, 3); multiplying those gives the desired (3, 3). You can do that by selecting with double brackets (`[[...]]`), which returns a DataFrame instead of a Series.
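A quick way to see the difference is to compare shapes. This sketch rebuilds the question's dataframe; `col`, `row` and `single` are illustrative names.

```python
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'], columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

col = df.loc[df.index[:-1], ['r_Total']]   # double brackets: 2-D DataFrame of row totals
row = df.loc[['c_Total']].iloc[:, :-1]     # double brackets: one-row DataFrame, r_Total dropped
single = df.loc[df.index[:-1], 'r_Total']  # single brackets: 1-D Series

print(col.shape, row.shape, single.shape)  # (3, 1) (1, 3) (3,)
```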
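To make the multiplication work, you also have to convert at least one of the operands into a NumPy array (that's what `.values` does). `DataFrame.dot` aligns the left operand's column labels with the right operand's index labels, and here they differ (`r_Total` vs `c_Total`); a plain array carries no labels, so only the shapes are checked.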
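A minimal sketch of that alignment rule, using the same rebuilt dataframe. It shows the DataFrame-with-DataFrame product failing, the `.values` version succeeding, and, as an alternative not in the original answer, renaming the row label so the labels line up, which keeps the crime-type column names.

```python
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'], columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

col = df.loc[df.index[:-1], ['r_Total']]  # (3, 1), column label 'r_Total'
row = df.loc[['c_Total']].iloc[:, :-1]    # (1, 3), index label 'c_Total'
total = df.iat[-1, -1]

# Two DataFrames: col.columns must match row.index, and 'r_Total' != 'c_Total'
try:
    col.dot(row)
    frame_error = None
except ValueError as exc:
    frame_error = str(exc)

# A bare array has no labels; result columns become 0, 1, 2
E = col.dot(row.values) / total

# Alternative: make the labels agree instead of dropping them
E2 = col.dot(row.rename(index={'c_Total': 'r_Total'})) / total
```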
Also, to exclude the totals themselves (the `c_Total` row and the `r_Total` column), only select up to the last element.
Then divide by the last element of the dataframe, which is the grand total (150 in your example).
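```python
# (3, 1) row totals from the r_Total column, times (1, 3) column totals from the c_Total row as an array
res = df.loc[df.index[:-1], ['r_Total']].dot(df.loc[['c_Total']].values[:, :-1]).div(df.iat[-1, -1])
```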
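Because the right-hand operand is a bare array, the result's columns come out labelled 0, 1, 2. A minimal sketch that takes the row totals from the `r_Total` column and the column totals from the `c_Total` row, restores the crime-type labels, and sanity-checks the result: an expectation matrix built this way should reproduce the observed margins, since each row sums to $R_i \sum_j C_j / T = R_i$ and each column to $C_j$. The names `rows_ok` and `cols_ok` are illustrative.

```python
import numpy as np
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'], columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

res = df.loc[df.index[:-1], ['r_Total']].dot(df.loc[['c_Total']].values[:, :-1]).div(df.iat[-1, -1])
res.columns = df.columns[:-1]  # restore Bulgary / Robbery / Car Theft labels

# Expected counts keep the observed row and column totals
rows_ok = np.allclose(res.sum(axis=1), df['r_Total'].iloc[:-1])
cols_ok = np.allclose(res.sum(axis=0), df.loc['c_Total'].iloc[:-1])
```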