Compute row distance matrix using only for loops

I am stuck trying to calculate a distance matrix from a set of binary arrays, and I can only use for loops to solve this...

The problem is the following. Imagine I have a binary matrix made up of several rows, here with dimensions n=3, m=3:

np.matrix([[0,0,0],
           [1,0,1],
           [1,1,1]])

I would like to obtain the following symmetric matrix, where entry (i, j) is the number of positions at which rows i and j differ:

np.matrix([[0,2,3],
           [2,0,1],
           [3,1,0]])

I have been trying to do it with two for loops, incrementing a counter whenever two positions differ (!=), but I cannot figure out how to iterate over those vectors properly...

Any help?

CodePudding user response:

If I understood correctly, you could do:

import numpy as np

mat = np.matrix([[0,0,0],
                 [1,0,1],
                 [1,1,1]])

nrows, ncols = np.shape(mat)

# The distance matrix is nrows x nrows, whatever the number of columns.
result = np.zeros((nrows, nrows), dtype=int)

for r in range(nrows):
    # We only need to compute the upper triangular part of the matrix.
    for i in range(r + 1, nrows):
        for j in range(ncols):
            # The comparison yields True/False, which adds as 1/0.
            result[r, i] += mat[r, j] != mat[i, j]

# Here we copy the upper triangular part to the lower triangle to make it symmetric.
result = result + result.T

print(result)

array([[0, 2, 3],
       [2, 0, 1],
       [3, 1, 0]])
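
The same triple loop can be wrapped in a reusable function. This is only a minimal sketch, assuming the input is a 2D array-like of 0/1 values; the name `hamming_rows` is hypothetical and not part of the original answer. It writes both triangles directly instead of adding the transpose at the end, and it sizes the result by the number of rows, so non-square input should work too.

```python
import numpy as np

def hamming_rows(mat):
    """Count differing positions between every pair of rows (hypothetical helper)."""
    arr = np.asarray(mat)
    nrows, ncols = arr.shape
    result = np.zeros((nrows, nrows), dtype=int)
    for r in range(nrows):
        # Only visit pairs (r, i) with i > r; the diagonal stays 0.
        for i in range(r + 1, nrows):
            count = 0
            for j in range(ncols):
                if arr[r, j] != arr[i, j]:
                    count += 1
            # The distance is symmetric, so fill both entries at once.
            result[r, i] = count
            result[i, r] = count
    return result

dist = hamming_rows([[0, 0, 0],
                     [1, 0, 1],
                     [1, 1, 1]])
print(dist)
```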

If you can at least use some numpy functions:

# You can also iterate matrices row by row.
for i, row in enumerate(mat):
    # Sum differences. mat != row already calculates the differences with the whole matrix.
    result[i, :] = np.sum(mat != row, axis=1).transpose()
print(result)

array([[0, 2, 3],
       [2, 0, 1],
       [3, 1, 0]])
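
For reference, the single-loop version can also be written against a plain `np.ndarray` rather than `np.matrix`, which the NumPy documentation no longer recommends. This is a sketch under that assumption; the variable names `arr` and `dist` are illustrative. With a plain array, each `row` is one-dimensional, so no transpose is needed.

```python
import numpy as np

arr = np.array([[0, 0, 0],
                [1, 0, 1],
                [1, 1, 1]])

nrows = arr.shape[0]
dist = np.zeros((nrows, nrows), dtype=int)

for i, row in enumerate(arr):
    # arr != row compares `row` against every row of arr at once;
    # summing over axis=1 counts the mismatches per row.
    dist[i] = (arr != row).sum(axis=1)

print(dist)
```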

In case you want to see a neat trick, here is how you could do it without any for loop. The following code uses "broadcasting": we add a dimension to the array so that every row is automatically compared against every other row:

# For this trick we need to convert the matrix to an array.
mat_arr = np.asarray(mat)
result_broadcasting = np.sum(mat_arr != mat_arr[:, None], axis=2)
print(result_broadcasting)

array([[0, 2, 3],
       [2, 0, 1],
       [3, 1, 0]])
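
To see why the broadcasting version works, it can help to look at the intermediate shapes. The sketch below breaks the one-liner into steps; the names `expanded` and `diff` are introduced here purely for illustration.

```python
import numpy as np

arr = np.array([[0, 0, 0],
                [1, 0, 1],
                [1, 1, 1]])

# Insert a new axis: shape (3, 3) becomes (3, 1, 3).
expanded = arr[:, None]

# Broadcasting (3, 3) against (3, 1, 3) gives (3, 3, 3);
# diff[i, k, j] is True where arr[k, j] != arr[i, j].
diff = arr != expanded

# Summing over the last axis counts mismatching columns per row pair.
dist = diff.sum(axis=2)

print(expanded.shape, diff.shape, dist.shape)
print(dist)
```

The intermediate array has nrows x nrows x ncols elements, so for large matrices the loop-based versions may use considerably less memory.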