I am stuck trying to calculate a distance matrix from different binary arrays, and I can only use for loops to solve this...
The problem is the following: imagine I have a binary matrix built from different rows, with dimensions n=3, m=3 in this case:
np.matrix([[0,0,0],
           [1,0,1],
           [1,1,1]])
And I would like to obtain the following symmetric matrix, by counting the number of positions in which each pair of rows differs:
np.matrix([[0,2,3],
           [2,0,1],
           [3,1,0]])
I have been trying to do it with 2 for loops, adding 1 whenever 2 positions are !=,
but I cannot figure out how to iterate over those vectors properly...
Any help?
CodePudding user response:
If I understood correctly, you could do:
import numpy as np
mat = np.matrix([[0,0,0],
                 [1,0,1],
                 [1,1,1]])
nrows, ncols = np.shape(mat)
# One distance per pair of rows, so the result is nrows x nrows.
result = np.zeros((nrows, nrows), dtype=int)
for r in range(nrows):
    # We only need to compare the upper triangular part of the matrix.
    for i in range(r + 1, nrows):
        for j in range(ncols):
            # Add 1 for every position where rows r and i differ.
            result[r, i] += mat[r, j] != mat[i, j]
# Here we copy the upper triangular part to lower triangular to make it symmetric.
result = result + result.T
print(result)
array([[0, 2, 3],
       [2, 0, 1],
       [3, 1, 0]])
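The same loops can be wrapped in a small function so they work on any binary matrix, including non-square ones. This is only a sketch: the name pairwise_differences is made up for illustration, and it assumes the input is a 2-D array-like of 0s and 1s.

```python
import numpy as np

def pairwise_differences(rows):
    """Count, for every pair of rows, the positions where they differ.

    Hypothetical helper for illustration; it uses only for loops.
    """
    arr = np.asarray(rows)
    nrows, ncols = arr.shape
    # One entry per pair of rows, so the result is nrows x nrows.
    result = np.zeros((nrows, nrows), dtype=int)
    for r in range(nrows):
        # Only visit pairs above the diagonal; the diagonal stays 0.
        for i in range(r + 1, nrows):
            count = 0
            for j in range(ncols):
                if arr[r, j] != arr[i, j]:
                    count += 1
            # Fill both halves at once to keep the matrix symmetric.
            result[r, i] = count
            result[i, r] = count
    return result

print(pairwise_differences([[0, 0, 0], [1, 0, 1], [1, 1, 1]]))
```

Writing both result[r, i] and result[i, r] inside the loop avoids the extra transpose step at the end.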
If you are allowed to use at least some NumPy functions:
# You can also iterate over matrices row by row.
for i, row in enumerate(mat):
    # Sum differences. mat != row already compares the row against the whole matrix.
    result[i, :] = np.sum(mat != row, axis=1).transpose()
print(result)
array([[0, 2, 3],
       [2, 0, 1],
       [3, 1, 0]])
In case you want to see a neat trick, here is how you could do it without a for loop. The following code uses "broadcasting": we add a dimension to the array so that every row is automatically compared against every other row:
# For this trick we need to convert the matrix to an array.
mat_arr = np.asarray(mat)
result_broadcasting = np.sum(mat_arr != mat_arr[:, None], axis=2)
print(result_broadcasting)
array([[0, 2, 3],
       [2, 0, 1],
       [3, 1, 0]])
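To see what the broadcasting trick does, it may help to print the intermediate shapes. This is a sketch for illustration only; the names expanded, diff and distances are not part of the code above.

```python
import numpy as np

mat_arr = np.array([[0, 0, 0],
                    [1, 0, 1],
                    [1, 1, 1]])

# Indexing with None inserts a new axis: shape (3, 3) becomes (3, 1, 3).
expanded = mat_arr[:, None]
print(expanded.shape)

# Comparing (3, 3) against (3, 1, 3) broadcasts both to (3, 3, 3):
# diff[a, b, k] is True when rows a and b differ at position k.
diff = mat_arr != expanded
print(diff.shape)

# Summing over the last axis counts the differing positions per pair of rows.
distances = diff.sum(axis=2)
print(distances)
```

The intermediate boolean array has n * n * m elements, so for large matrices this approach uses noticeably more memory than the loops.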