Finding combinatorial occurrences along combinations of rows between 2 numpy arrays

I am trying to find a fast, at least partially vectorized, solution for counting combinatorial occurrences between two 2D NumPy arrays, in order to identify single-nucleotide polymorphism (SNP) linkage. Each array has shape (factors, samples). An example for matrix 1 is:

array([[0., 1., 1.],
       [1., 0., 1.]])

and matrix 2

array([[1., 1., 0.],
       [0., 0., 0.]])

For every pair of factors (one row from matrix 1, one row from matrix 2), I need the total number of samples at which the two values at the same position form a given ordered pair. Order matters, because the (1,0) count is different from the (0,1) count. The combinations are therefore [(0, 0), (0, 1), (1, 0), (1, 1)], and for each combination the output is a (factor, factor) matrix of counts.

For the combination (0,0), for instance, the expected matrix is

array([[0, 1],
       [0, 1]])

This is because the pair (0,0) occurs 0 times along row 0 of matrix 1 and row 0 of matrix 2, once along row 0 of matrix 1 and row 1 of matrix 2, 0 times along row 1 of matrix 1 and row 0 of matrix 2, and once along row 1 of matrix 1 and row 1 of matrix 2.
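To pin down the definition, here is a minimal brute-force sketch that computes one count matrix with explicit loops over factor pairs. It is slow and only meant as a reference for checking vectorized versions; the function name `count_pairs_loop` is hypothetical.

```python
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])


def count_pairs_loop(a, b, u, v):
    """Reference: out[i, j] = number of samples k with a[i, k] == u and b[j, k] == v."""
    out = np.zeros((a.shape[0], b.shape[0]), dtype=int)
    for i in range(a.shape[0]):          # factor (row) of the first matrix
        for j in range(b.shape[0]):      # factor (row) of the second matrix
            # count the sample positions where both conditions hold
            out[i, j] = np.sum((a[i] == u) & (b[j] == v))
    return out


print(count_pairs_loop(array1, array2, 0, 0))
# [[0 1]
#  [0 1]]
```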

Answer:

With the example data

import numpy as np

array1 = np.array([
        [0., 1., 1.],
        [1., 0., 1.]])
array2 = np.array([
        [1., 1., 0.],
        [0., 0., 0.]])

we can count all four combinations at once with np.einsum and reshape the result into a (combination, factor, factor) array:

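# Stack 0/1 indicators: c[0] marks value 0, c[1] marks value 1 (assumes the data contain only 0s and 1s)
c1 = np.array([1-array1, array1]).astype('int')
c2 = np.array([1-array2, array2]).astype('int')
# Sum over the sample axis k; the result axes are (value1, value2, factor1, factor2)
np.einsum('ijk,lmk->iljm', c1, c2).reshape(-1, len(array1), len(array2))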

Output

array([[[0, 1],    # counts for (0,0)
        [0, 1]],

       [[1, 0],    # counts for (0,1)
        [1, 0]],

       [[1, 2],    # counts for (1,0)
        [1, 2]],

       [[1, 0],    # counts for (1,1)
        [1, 0]]])
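To make the einsum subscripts easier to follow, the sketch below inspects the intermediate shapes before the reshape. It reuses the answer's construction (and its assumption that the data are 0/1); the names `raw` and `counts` are illustrative only.

```python
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])

# c[u] is the 0/1 indicator of "value == u": shape (2 values, factors, samples)
c1 = np.array([1 - array1, array1]).astype(int)
c2 = np.array([1 - array2, array2]).astype(int)

# i, l: value in array1 / array2; j, m: factor rows; k: samples.
# k appears in both inputs but not in the output, so einsum sums over it.
raw = np.einsum('ijk,lmk->iljm', c1, c2)
print(c1.shape, raw.shape)  # (2, 2, 3) (2, 2, 2, 2)

# raw[u, v] is the (factor, factor) count matrix for the combination (u, v)
print(raw[1, 0])
# [[1 2]
#  [1 2]]

# reshape(-1, ...) flattens (u, v) in C order: (0,0), (0,1), (1,0), (1,1)
counts = raw.reshape(-1, len(array1), len(array2))
```

Indexing `raw[u, v]` directly can be more readable than remembering which position of the flattened array corresponds to which combination.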

We can check that these results equal the pairwise matrix products of the indicator arrays:

import itertools as it

# it.product(c1, c2) yields the indicator pairs in the order (0,0), (0,1), (1,0), (1,1)
np.array([x @ y.T for x, y in it.product(c1, c2)])

Output

array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]],

       [[1, 2],
        [1, 2]],

       [[1, 0],
        [1, 0]]])
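Since each combination is just a matrix product, all four can also be obtained from a single larger product by folding the value axis into the factor axis. This is a sketch of an alternative layout, not the answer's method; it keeps the same 0/1 assumption, and the names `big` and `counts` are illustrative.

```python
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])
f1, n = array1.shape
f2 = array2.shape[0]

c1 = np.array([1 - array1, array1]).astype(int)  # (2, f1, n)
c2 = np.array([1 - array2, array2]).astype(int)  # (2, f2, n)

# Fold (value, factor) into one row axis and do one (2*f1, n) @ (n, 2*f2) product
big = c1.reshape(2 * f1, n) @ c2.reshape(2 * f2, n).T  # shape (2*f1, 2*f2)

# Split rows into (u, j) and columns into (v, m), then order the axes as (u, v, j, m)
counts = big.reshape(2, f1, 2, f2).transpose(0, 2, 1, 3).reshape(4, f1, f2)

einsum_counts = np.einsum('ijk,lmk->iljm', c1, c2).reshape(-1, f1, f2)
print(np.array_equal(counts, einsum_counts))  # True
```

A single large product lets NumPy hand the whole computation to one BLAS call, which may help when there are many factors; whether it is faster than the einsum in practice should be measured on real data.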

Answer:

I realized the solution while working out a manual example for the question: the counts can be computed as dot products of 0/1 indicator matrices.

# 0/1 indicator matrices over all factors, shape (factors, samples)
matrix1_0 = (array1 == 0).astype('int')
matrix1_1 = (array1 == 1).astype('int')
matrix2_0 = (array2 == 0).astype('int')
matrix2_1 = (array2 == 1).astype('int')

# (factors of array1, samples) x (samples, factors of array2) -> (factor, factor) counts
count_00 = np.dot(matrix1_0, matrix2_0.T)
count_01 = np.dot(matrix1_0, matrix2_1.T)
count_10 = np.dot(matrix1_1, matrix2_0.T)
count_11 = np.dot(matrix1_1, matrix2_1.T)

Each result is a (factor, factor) matrix holding, for every pair of factors, the number of occurrences of that combination summed along the sample axis (axis 1 here).
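If the data are strictly binary (an assumption; other codings would break it), the four counts for each factor pair must add up to the number of samples. That means one matrix product plus row sums is enough, with the other three counts following by inclusion-exclusion. This is a sketch of that variation, not the method from either answer.

```python
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])
n_samples = array1.shape[1]

a = (array1 == 1).astype(int)
b = (array2 == 1).astype(int)

count_11 = a @ b.T                      # both values are 1 (the only matrix product)
ones1 = a.sum(axis=1)[:, None]          # number of 1s per factor of array1, as a column
ones2 = b.sum(axis=1)[None, :]          # number of 1s per factor of array2, as a row
count_10 = ones1 - count_11             # 1 in array1, 0 in array2
count_01 = ones2 - count_11             # 0 in array1, 1 in array2
count_00 = n_samples - count_11 - count_10 - count_01  # the remaining samples

print(count_00)
# [[0 1]
#  [0 1]]
```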