Extracting value regions/clusters from numpy array

I have a numpy array which contains data similar to the following:

01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000

So there are clusters/areas of non-zero values. I need to build a list of these clusters (a list of lists, where each item is a tuple of array coordinates). A cluster/area consists of identical digits only (e.g. only "2"s or only "3"s).

For example, the bottom cluster/area would be represented by these (x, y), i.e. (column, row), coordinates:

[(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)]

I wrote a recursive method for this, but it is slow and I run into stack overflow errors.

Is there a simple, less error-prone way to implement this more efficiently in Python, ideally with some library or matrix operations?

(Actually, the array is an image and clusters/areas are masks for a computer vision task.)
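As a point of comparison for the recursive approach: a recursive flood fill breaks on large regions because CPython limits recursion depth (1000 frames by default, see sys.getrecursionlimit()). A minimal sketch that avoids this, assuming 4-connectivity (no diagonal neighbours) and using a hypothetical helper name find_clusters, replaces the recursion with an explicit queue (breadth-first search). It is still pure Python, so on large images it will be slower than a library call:

```python
from collections import deque

import numpy as np


def find_clusters(arr):
    """Group equal, non-zero, 4-connected cells into lists of (x, y) tuples.

    An explicit queue replaces recursion, so large regions cannot hit
    Python's recursion limit.
    """
    rows, cols = arr.shape
    seen = np.zeros(arr.shape, dtype=bool)
    clusters = []
    for r in range(rows):
        for c in range(cols):
            value = arr[r, c]
            if value == 0 or seen[r, c]:
                continue  # background or already part of a cluster
            # Breadth-first flood fill starting from this unvisited cell
            seen[r, c] = True
            queue = deque([(r, c)])
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((x, y))  # (column, row), as in the question
                # Visit only orthogonal neighbours holding the same value
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny, nx] and arr[ny, nx] == value):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # Sort by row, then column, to get a stable, readable order
            clusters.append(sorted(cluster, key=lambda p: (p[1], p[0])))
    return clusters


grid = np.array([[int(ch) for ch in line] for line in ["0110", "0020", "2200"]])
print(find_clusters(grid))  # [[(1, 0), (2, 0)], [(2, 1)], [(0, 2), (1, 2)]]
```

The two groups of 2s stay separate because they only touch diagonally.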

Answer:

You can use skimage.measure, which has functions to label these "clusters" (more formally, connected components), and then obtain each cluster's coordinates from the coords property of the regions returned by skimage.measure.regionprops:

from skimage.measure import label, regionprops

# Label 4-connected regions of equal value; 0 is treated as background
labels = label(a, connectivity=1)
clusters = [region.coords for region in regionprops(labels)]

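Cells with different values always end up in different regions. However, to count only orthogonal (up, down, left, right) neighbours as connected, we must set connectivity=1 in skimage.measure.label. With the default full connectivity, diagonal neighbours also count, so the last two clusters, which both consist of 2s and touch only diagonally, would be merged into one.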
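To see the effect of this setting, a small sketch can count the regions both ways. It rebuilds the question's array from a list of strings instead of using genfromtxt, and uses label's return_num=True flag, which returns the number of regions alongside the label image:

```python
import numpy as np
from skimage.measure import label

rows = [
    "01110000000000000000000000",
    "00111110000222222220000000",
    "01110000000222222200000000",
    "00000000000222000000000000",
    "00000000000000000000000000",
    "00000000000000000000000000",
    "00003333300000000000000000",
    "00003333322222000000000000",
    "00000000022222000000000000",
    "00000000000000222000000000",
    "00000000000000222000000000",
]
rows += ["0" * 26] * 5  # the trailing all-zero rows
a = np.array([[int(ch) for ch in row] for row in rows])

# connectivity=1: only orthogonal neighbours join a region
_, n_orthogonal = label(a, connectivity=1, return_num=True)
# connectivity=2 (the 2-D default): diagonal neighbours join as well
_, n_diagonal = label(a, connectivity=2, return_num=True)
print(n_orthogonal, n_diagonal)  # 5 4
```

The difference of one region is exactly the two bottom groups of 2s being merged.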

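For instance, the last component, which is the one shown in the question, is listed below. Note that coords holds (row, column) pairs, the reverse of the question's (x, y) order: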

clusters[-1]
array([[ 9, 14],
       [ 9, 15],
       [ 9, 16],
       [10, 14],
       [10, 15],
       [10, 16]], dtype=int64)
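To get the exact format asked for in the question, a list of (x, y) tuples per cluster, the (row, column) pairs can be swapped. The sketch below also records which digit each cluster consists of; reading it from the cluster's first cell is an assumption that holds here because label only joins cells with equal values:

```python
import numpy as np
from skimage.measure import label, regionprops

rows = [
    "01110000000000000000000000",
    "00111110000222222220000000",
    "01110000000222222200000000",
    "00000000000222000000000000",
    "00000000000000000000000000",
    "00000000000000000000000000",
    "00003333300000000000000000",
    "00003333322222000000000000",
    "00000000022222000000000000",
    "00000000000000222000000000",
    "00000000000000222000000000",
]
rows += ["0" * 26] * 5  # the trailing all-zero rows
a = np.array([[int(ch) for ch in row] for row in rows])

labels = label(a, connectivity=1)
clusters = []
for region in regionprops(labels):
    # coords is an (N, 2) array of (row, col); swap to (col, row) = (x, y)
    points = [(int(c), int(r)) for r, c in region.coords]
    # All cells in a region share one value, so read it from the first cell
    value = int(a[tuple(region.coords[0])])
    clusters.append((value, points))

print(clusters[-1])
# (2, [(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)])
```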

Construction of the NumPy array a used above:

from io import StringIO
import numpy as np

s = StringIO("""
01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000""")
# delimiter=1 reads one digit per column; dtype=int avoids the float default
a = np.genfromtxt(s, delimiter=1, dtype=int)
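If scikit-image is not available, a similar result can be sketched with scipy.ndimage.label. Unlike skimage.measure.label, it treats every non-zero cell as the same foreground, so this sketch (the helper name clusters_by_value is hypothetical) labels each digit separately. Its default structuring element in 2-D is the cross-shaped, 4-connected one, which matches connectivity=1 above:

```python
import numpy as np
from scipy import ndimage


def clusters_by_value(arr):
    """Return (value, [(x, y), ...]) for each 4-connected cluster of equal values."""
    clusters = []
    for value in np.unique(arr):
        if value == 0:
            continue  # 0 is background
        # Default 2-D structure is the 4-connected cross (no diagonals)
        labeled, n = ndimage.label(arr == value)
        for k in range(1, n + 1):
            coords = np.argwhere(labeled == k)  # (row, col) pairs, row-major
            clusters.append((int(value), [(int(c), int(r)) for r, c in coords]))
    return clusters


rows = [
    "01110000000000000000000000",
    "00111110000222222220000000",
    "01110000000222222200000000",
    "00000000000222000000000000",
    "00000000000000000000000000",
    "00000000000000000000000000",
    "00003333300000000000000000",
    "00003333322222000000000000",
    "00000000022222000000000000",
    "00000000000000222000000000",
    "00000000000000222000000000",
]
rows += ["0" * 26] * 5  # the trailing all-zero rows
a = np.array([[int(ch) for ch in row] for row in rows])

for value, points in clusters_by_value(a):
    print(value, len(points))
```

Here the clusters come out grouped by value rather than in skimage's overall scan order. Calling argwhere once per label rescans the whole array each time, which is fine for small images; with many clusters, ndimage.find_objects could be used to restrict each search to a bounding box.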