Home > front end >  numpy count number of clusters in matrix
numpy count number of clusters in matrix

Time:12-07

In a symmetric numpy matrix with only 0's and 1's, is there a method to count the number of "connecting clusters of 1's"?

For example the following numpy matrix:

np.array([[0, 0, 0, 0, 1, 1, 0],
          [0, 0, 0, 0, 1, 1, 0],
          [0, 0, 0, 0, 0, 0, 1],
          [0, 0, 0, 0, 0, 0, 0],
          [1, 1, 0, 0, 0, 0, 0],
          [1, 1, 0, 0, 0, 0, 0],
          [0, 0, 1, 0, 0, 0, 0]]
))

has two clusters of connecting 1's:

enter image description here

CodePudding user response:

You can use scipy.ndimage.label.

If you want to identify 4 clusters, use:

from scipy.ndimage import label

_, n = label(a)

print(n)

Output: 4

The default structuring element (kernel) is excluding the diagonals:

[[0,1,0],
 [1,1,1],
 [0,1,0]]

Thus, if you consider that you have two clusters (connected by the diagonal), change the default kernel:

from scipy.ndimage import label

kernel = np.ones((3, 3))

_, n = label(a, structure=kernel)
print(n)

Output: 2

Used input:

a = np.array([[0, 0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 0]])

CodePudding user response:

I don't think there is a straightforward numpy-only way, or a provided function. You would have to use an algorithm for that.

The typical algorithm for this problem is called connected-component labelling, and it is used in image processing. The algorithm is described on Wikipedia: https://en.wikipedia.org/wiki/Connected-component_labeling

CodePudding user response:

import numpy as np

n = 50
tmp = np.random.rand(n, n)
a = (tmp   tmp.T > 1).astype(int)  # symmetric matrix
visited = np.full((n, n), fill_value=False)
cnt = 0


def paint(i, j, n):
    if not ((0 <= i < n) and (0 <= j < n) and (i <= j)):
        return
    if visited[i, j] or a[i, j] == 0:
        return
    visited[i, j] = True
    paint(i   1, j, n)  # dfs
    paint(i - 1, j, n)
    paint(i, j   1, n)
    paint(i, j - 1, n)


for i in range(n):
    for j in range(i, n):
        if (not visited[i, j]) and a[i, j] == 1:
            paint(i, j, n)
            cnt  = 1

print(cnt)

My solution may raise RecursionError when there are lots of ones in the matrix and n is large.

  • Related