I am trying to find a fast, at least partially vectorized, way to count combinatorial co-occurrences between two 2D NumPy arrays in order to identify single-nucleotide polymorphism (SNP) linkage. Each array has shape (factors, samples). An example for matrix 1:
array([[0., 1., 1.],
       [1., 0., 1.]])
and for matrix 2:
array([[1., 1., 0.],
       [0., 0., 0.]])
For every pair of factors (row i of matrix 1, row j of matrix 2), I need the total number of occurrences of each value pair along the samples axis, comparing the two rows at the same sample position. Order matters, because the (1, 0) count differs from the (0, 1) count. The value combinations are therefore [(0, 0), (0, 1), (1, 0), (1, 1)], and for each combination the output is a (factors, factors) matrix of counts.
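Written as a formula (the notation is mine: X is matrix 1, Y is matrix 2, both containing only 0s and 1s, K is the number of samples, and \mathbf{1}[\cdot] is the indicator function), the count matrix for the value pair (a, b) is shown below. Because \mathbf{1}[X_{ik}=1] = X_{ik} and \mathbf{1}[X_{ik}=0] = 1 - X_{ik} for binary entries, each of the four matrices reduces to a single matrix product, where \mathbf{J} denotes an all-ones matrix of the appropriate shape:

```latex
C_{ab}[i,j] = \sum_{k=1}^{K} \mathbf{1}[X_{ik}=a]\,\mathbf{1}[Y_{jk}=b], \qquad a, b \in \{0, 1\}

C_{11} = X Y^{\top}, \quad
C_{10} = X (\mathbf{J} - Y)^{\top}, \quad
C_{01} = (\mathbf{J} - X) Y^{\top}, \quad
C_{00} = (\mathbf{J} - X)(\mathbf{J} - Y)^{\top}
```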
For instance, for the combination (0, 0) we get the matrix
array([[0, 1],
       [0, 1]])
because (0, 0) occurs 0 times between row 0 of matrix 1 and row 0 of matrix 2, once between row 0 of matrix 1 and row 1 of matrix 2, 0 times between row 1 of matrix 1 and row 0 of matrix 2, and once between row 1 of matrix 1 and row 1 of matrix 2.
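To pin down the definition, here is a slow, loop-based reference implementation that can be used to check faster versions. The function name `count_pairs_loop` is hypothetical, not from any library; it reproduces the (0, 0) matrix from the example.

```python
import numpy as np

def count_pairs_loop(m1, m2, a, b):
    """Reference count of the value pair (a, b) for every (row of m1, row of m2).

    Entry [i, j] is the number of sample positions k where
    m1[i, k] == a and m2[j, k] == b. Plain Python loops: slow but easy to verify.
    """
    out = np.zeros((m1.shape[0], m2.shape[0]), dtype=int)
    for i in range(m1.shape[0]):
        for j in range(m2.shape[0]):
            # Compare the two rows position by position and count matches of (a, b)
            out[i, j] = np.sum((m1[i] == a) & (m2[j] == b))
    return out

matrix1 = np.array([[0., 1., 1.],
                    [1., 0., 1.]])
matrix2 = np.array([[1., 1., 0.],
                    [0., 0., 0.]])

print(count_pairs_loop(matrix1, matrix2, 0, 0))
# [[0 1]
#  [0 1]]
```

The double loop is O(factors1 × factors2 × samples) in interpreted Python, so it is only meant as a ground truth for small inputs.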
Answer:
With the example data
import numpy as np
array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])
we can count the desired combinations with np.einsum and reshape the result into a suitable array:
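# One-hot encode the values: c1[0] marks zeros, c1[1] marks ones; shape (2, factors, samples)
c1 = np.array([1-array1, array1]).astype('int')
c2 = np.array([1-array2, array2]).astype('int')
# Sum over samples k; axes (value1, value2, factor1, factor2), reshaped to (4, factors1, factors2)
np.einsum('ijk,lmk->iljm', c1, c2).reshape(-1, len(array1), len(array2))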
Output:
array([[[0, 1],   # counts for (0,0)
        [0, 1]],
       [[1, 0],   # counts for (0,1)
        [1, 0]],
       [[1, 2],   # counts for (1,0)
        [1, 2]],
       [[1, 0],   # counts for (1,1)
        [1, 0]]])
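A variant that skips the reshape: keeping the 4D einsum result lets you index the count matrix for a value pair (a, b) directly as `counts[a, b]`, so you do not have to remember which of the four reshaped slices belongs to which pair. This is a minimal sketch under the same assumption that the arrays contain only 0s and 1s; the variable name `counts` is just illustrative.

```python
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])

# One-hot encode: index 0 marks entries equal to 0, index 1 marks entries equal to 1
c1 = np.array([1 - array1, array1]).astype(int)   # shape (2, factors1, samples)
c2 = np.array([1 - array2, array2]).astype(int)   # shape (2, factors2, samples)

# Keep the four axes (value1, value2, factor1, factor2) instead of reshaping
counts = np.einsum('ijk,lmk->iljm', c1, c2)
print(counts.shape)   # (2, 2, 2, 2)

# counts[a, b] is the (factors1, factors2) matrix for the value pair (a, b)
print(counts[1, 0])
# [[1 2]
#  [1 2]]
```

The reshaped slice at position 2 and `counts[1, 0]` hold the same numbers, because reshape flattens the (value1, value2) axes in row-major order: (0,0), (0,1), (1,0), (1,1).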
Checking that the einsum results equal the dot products of the one-hot slices:
import itertools as it
np.array([x @ y.T for x, y in it.product(c1, c2)])
Output:
array([[[0, 1],
        [0, 1]],
       [[1, 0],
        [1, 0]],
       [[1, 2],
        [1, 2]],
       [[1, 0],
        [1, 0]]])
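One more sanity check, assuming the data contain only 0s and 1s: for any pair of rows, every sample falls into exactly one of the four value pairs, so the four count matrices must add up to the number of samples in every entry. The sketch below checks that, and compares the dot-product stack with the einsum version using `np.array_equal`.

```python
import itertools as it
import numpy as np

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])
c1 = np.array([1 - array1, array1]).astype(int)
c2 = np.array([1 - array2, array2]).astype(int)

# Dot-product version: one (factors1, factors2) matrix per value pair
stacked = np.array([x @ y.T for x, y in it.product(c1, c2)])   # shape (4, 2, 2)

# The four pair counts partition the samples, so they must sum to n_samples
n_samples = array1.shape[1]
print(np.all(stacked.sum(axis=0) == n_samples))   # True

# Agreement with the einsum version
einsum_counts = np.einsum('ijk,lmk->iljm', c1, c2).reshape(-1, len(array1), len(array2))
print(np.array_equal(stacked, einsum_counts))   # True
```

If the first check fails on real data, that usually means some entries are not exactly 0 or 1 (for example missing-value codes), and the one-hot encoding `1 - array` no longer describes them.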
Answer:
I realized the solution while working through a manual example for the question: the counts can be computed as dot products of 0/1 indicator matrices.
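import numpy as np
# 0/1 indicator matrices, shape (factors, samples): use the whole arrays, not single rows
matrix1_0 = (array1 == 0).astype('int')
matrix1_1 = (array1 == 1).astype('int')
matrix2_0 = (array2 == 0).astype('int')
matrix2_1 = (array2 == 1).astype('int')
# Each product sums over the samples axis and returns a (factors1, factors2) matrix
count_00 = np.dot(matrix1_0, matrix2_0.T)
count_01 = np.dot(matrix1_0, matrix2_1.T)
count_10 = np.dot(matrix1_1, matrix2_0.T)
count_11 = np.dot(matrix1_1, matrix2_1.T)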
Entry [i, j] of each result is the number of samples (axis 1) at which row i of matrix 1 and row j of matrix 2 take the given pair of values.
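Because the data are binary, the four products are not independent: once count_11 is known, the other three follow from row sums by inclusion-exclusion. Below is a sketch of a variant that needs only one matrix product; it is my own derivation rather than something from the answers above, it assumes every entry is exactly 0 or 1, and the function name `pair_counts_one_matmul` is hypothetical.

```python
import numpy as np

def pair_counts_one_matmul(m1, m2):
    """Return (count_00, count_01, count_10, count_11) for two 0/1 matrices.

    Assumes every entry is exactly 0 or 1. Only count_11 needs a matrix
    product; the other three are derived from per-row counts of ones.
    """
    a = m1.astype(int)
    b = m2.astype(int)
    n_samples = a.shape[1]
    count_11 = a @ b.T                  # m1 is 1 and m2 is 1
    ones_a = a.sum(axis=1)[:, None]     # ones in each row of m1, as a column
    ones_b = b.sum(axis=1)[None, :]     # ones in each row of m2, as a row
    count_10 = ones_a - count_11        # m1 is 1, m2 is 0
    count_01 = ones_b - count_11        # m1 is 0, m2 is 1
    count_00 = n_samples - count_11 - count_10 - count_01
    return count_00, count_01, count_10, count_11

array1 = np.array([[0., 1., 1.],
                   [1., 0., 1.]])
array2 = np.array([[1., 1., 0.],
                   [0., 0., 0.]])
count_00, count_01, count_10, count_11 = pair_counts_one_matmul(array1, array2)
print(count_10)
# [[1 2]
#  [1 2]]
```

This trades three of the four products for cheap row sums and broadcasting, which may help when there are many factors; for very small inputs the difference is negligible.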