Calculating expected values for multiple contingency matrices using NumPy

The goal is to calculate the expected values of multiple contingency matrices: for each cell, multiply the sum of its row by the sum of its column and divide by the total of the matrix (the expected values Eij).
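For a single table, the formula Eij = (sum of row i) * (sum of column j) / (grand total) can be sketched with an outer product. This is only a minimal illustration; the table `obs` and the variable names are assumptions chosen for the example.

```python
import numpy as np

# Illustrative 2 x 2 contingency table of observed counts.
obs = np.array([[11., 13.],
                [12., 14.]])

row_sums = obs.sum(axis=1)   # total of each row: [24., 26.]
col_sums = obs.sum(axis=0)   # total of each column: [23., 27.]
total = obs.sum()            # grand total: 50.0

# E[i, j] = row_sums[i] * col_sums[j] / total
expected = np.outer(row_sums, col_sums) / total
print(expected)
# [[11.04 12.96]
#  [11.96 14.04]]
```

The expected values keep the same row sums, column sums, and grand total as the observed table, which makes a handy sanity check.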

The input is a set of matrices like this:

import numpy as np

mat = np.array(
    [[[11., 13.],
      [12., 14.]],

     [[ 8., 10.],
      [15., 17.]],

     [[11., 10.],
      [12., 17.]]])

The code I have works (below), but I'd like to expand it to cover matrices that are larger than 2 x 2. The output should have the same dimensions as the input.

I'd like to avoid loops because the full calculation involves a massive array, so it would be good to have NumPy do all the work.

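cols = np.sum(mat, axis=1)   # column sums of each matrix, shape (n, 2)
rows = np.sum(mat, axis=2)   # row sums of each matrix, shape (n, 2)
tots = np.sum(cols, axis=1)  # grand total of each matrix, shape (n,)
# Expected value of cell (i, j): row i sum * column j sum / total
exp_00 = cols[:,0]*rows[:,0]/tots
exp_01 = cols[:,1]*rows[:,0]/tots
exp_10 = cols[:,0]*rows[:,1]/tots
exp_11 = cols[:,1]*rows[:,1]/tots
# Stack the four cells and reshape to (n, 2, 2)
mat_exp = np.array([exp_00, exp_01, exp_10, exp_11]).T.reshape(len(mat),2,2)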

print(mat_exp)

Output:

[[[11.04 12.96]
  [11.96 14.04]]

 [[ 8.28  9.72]
  [14.72 17.28]]

 [[ 9.66 11.34]
  [13.34 15.66]]]

CodePudding user response:

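You can multiply the column sums by the row sums using broadcasting (keepdims=True keeps each summed axis with length 1, so the element-wise product has the same shape as the input), and then divide by the total of each matrix: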

mat.sum(1, keepdims=True) * mat.sum(2, keepdims=True) / mat.sum((1, 2), keepdims=True)

array([[[11.04, 12.96],
        [11.96, 14.04]],

       [[ 8.28,  9.72],
        [14.72, 17.28]],

       [[ 9.66, 11.34],
        [13.34, 15.66]]])
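Since nothing in that expression depends on the matrices being 2 x 2, it should carry over to larger and non-square tables. The sketch below wraps it in a helper and cross-checks it against an explicit per-table loop. The function name `expected_counts` and the random test data are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def expected_counts(mat):
    """Expected values for a stack of contingency tables of shape (n, r, c).

    Hypothetical helper name; it wraps the broadcasting expression above.
    """
    col_sums = mat.sum(axis=1, keepdims=True)      # shape (n, 1, c)
    row_sums = mat.sum(axis=2, keepdims=True)      # shape (n, r, 1)
    totals = mat.sum(axis=(1, 2), keepdims=True)   # shape (n, 1, 1)
    # Broadcasting (n, r, 1) * (n, 1, c) yields (n, r, c).
    return row_sums * col_sums / totals

# Made-up data: five 3 x 4 tables of positive counts.
rng = np.random.default_rng(0)
mat = rng.integers(1, 20, size=(5, 3, 4)).astype(float)

result = expected_counts(mat)

# Cross-check against a per-table loop using np.outer.
reference = np.stack(
    [np.outer(m.sum(axis=1), m.sum(axis=0)) / m.sum() for m in mat]
)

print(result.shape)                    # (5, 3, 4)
print(np.allclose(result, reference))  # True
```

The loop is there only as a check; the helper itself stays fully vectorized. Note that a table whose counts sum to zero would produce a division by zero, so such tables may need to be filtered out or handled separately.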