The goal is to calculate the expected values of multiple contingency matrices: for each matrix, multiply the row sums by the column sums and divide by the grand total (the expected values Eij).
The input is a set of matrices like this:
import numpy as np

mat = np.array(
    [[[11., 13.],
      [12., 14.]],
     [[ 8., 10.],
      [15., 17.]],
     [[11., 10.],
      [12., 17.]]])
The code I have works (below), but I'd like to expand it to cover matrices that are larger than 2 x 2. The output should have the same dimensions as the input.
I'd like to avoid loops because the full calculation involves a very large array, so ideally NumPy would do all the work.
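cols = np.sum(mat, axis=1)  # column sums of each matrix, shape (n, 2)
rows = np.sum(mat, axis=2)  # row sums of each matrix, shape (n, 2)
tots = np.sum(cols, 1)      # grand total of each matrix, shape (n,)
# expected value for row i, column j: rows[:,i]*cols[:,j]/tots
exp_00 = cols[:,0]*rows[:,0]/tots
exp_01 = cols[:,1]*rows[:,0]/tots
exp_10 = cols[:,0]*rows[:,1]/tots
exp_11 = cols[:,1]*rows[:,1]/tots
mat_exp = np.array([exp_00, exp_01, exp_10, exp_11]).T.reshape(len(mat),2,2)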
print(mat_exp)
Output:
[[[11.04 12.96]
[11.96 14.04]]
[[ 8.28 9.72]
[14.72 17.28]]
[[ 9.66 11.34]
[13.34 15.66]]]
Answer:
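You can multiply the column sums and row sums using broadcasting (an element-wise product that gives one outer product per matrix, rather than a true matrix multiplication), and then divide by the total: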
mat.sum(1, keepdims=True) * mat.sum(2, keepdims=True) / mat.sum((1, 2), keepdims=True)
array([[[11.04, 12.96],
[11.96, 14.04]],
[[ 8.28, 9.72],
[14.72, 17.28]],
[[ 9.66, 11.34],
[13.34, 15.66]]])
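Since the question asks about matrices larger than 2 x 2, here is a minimal sketch of the same broadcasting idea wrapped in a function and checked on made-up 3 x 4 tables. The helper name `expected_counts`, the random test data, and the loop-based reference are assumptions for illustration only; they are not part of the original post or of NumPy.

```python
import numpy as np

def expected_counts(mat):
    """Expected counts for a stack of contingency tables.

    `mat` has shape (n_tables, n_rows, n_cols); the result has the same shape.
    `expected_counts` is a hypothetical helper name, not a NumPy function.
    """
    mat = np.asarray(mat, dtype=float)
    col_sums = mat.sum(axis=1, keepdims=True)     # shape (n, 1, c)
    row_sums = mat.sum(axis=2, keepdims=True)     # shape (n, r, 1)
    totals = mat.sum(axis=(1, 2), keepdims=True)  # shape (n, 1, 1)
    # Broadcasting (n, r, 1) * (n, 1, c) yields one outer product per table.
    return row_sums * col_sums / totals

# Made-up data: two 3 x 4 tables with strictly positive counts.
rng = np.random.default_rng(0)
tables = rng.integers(1, 20, size=(2, 3, 4)).astype(float)
result = expected_counts(tables)

# Reference computed with an explicit Python loop, used only as a check.
reference = np.array(
    [np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum() for t in tables]
)

# The same outer product written with einsum, as an alternative spelling.
einsum_result = (
    np.einsum("ni,nj->nij", tables.sum(axis=2), tables.sum(axis=1))
    / tables.sum(axis=(1, 2))[:, None, None]
)

print(result.shape)                         # (2, 3, 4)
print(np.allclose(result, reference))       # True
print(np.allclose(result, einsum_result))   # True
```

Nothing in the function depends on the table size, because `keepdims=True` keeps the summed axes as length-1 dimensions that broadcast against each other. The expected table also preserves the row and column totals of the input, which is a quick sanity check on real data.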