Extracting value regions/clusters from numpy array

Time:07-30

I have a numpy array which contains data similar to the following:

01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000

So there are clusters/areas of non-zero values. I need to build a list of these clusters: a list of lists, where each inner list holds the coordinate tuples of one cluster. Each cluster/area consists of a single digit only (e.g. only "2"s or only "3"s).

For example, the bottom cluster/area would be represented as:

[(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)]

I wrote a recursive method for this, but it is slow and sometimes fails with stack overflow errors.

Is there a simple, less error-prone way to implement this more efficiently in Python, ideally with some library or matrix operations?

(Actually, the array is an image and clusters/areas are masks for a computer vision task.)
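For comparison, a recursive flood fill can be made non-recursive by keeping an explicit queue of pixels to visit, which sidesteps Python's recursion limit entirely. The following is a minimal pure-Python sketch, not the asker's code: find_clusters is a hypothetical helper name, and it assumes 4-connectivity (no diagonal neighbours) and that 0 is the background. It still loops over every pixel in Python, so for large images the library-based approach in the answer is likely to be much faster.

```python
from collections import deque

import numpy as np


def find_clusters(a):
    """Hypothetical helper: one list of (x, y) tuples per 4-connected
    region of equal, non-zero values in the 2-D array a.

    Iterative breadth-first flood fill: an explicit queue replaces
    recursion, so large regions cannot exceed Python's recursion limit.
    """
    rows, cols = a.shape
    seen = np.zeros(a.shape, dtype=bool)  # pixels already assigned to a cluster
    clusters = []
    for r in range(rows):
        for c in range(cols):
            value = a[r, c]
            if value == 0 or seen[r, c]:
                continue  # background pixel, or already part of a cluster
            seen[r, c] = True
            queue = deque([(r, c)])
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((x, y))  # (x, y) = (column, row), as in the question
                # Visit the four orthogonal neighbours only (no diagonals).
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny, nx] and a[ny, nx] == value):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            clusters.append(cluster)
    return clusters
```

For example, find_clusters(np.array([[1, 1, 0], [0, 0, 2]])) returns [[(0, 0), (1, 0)], [(2, 1)]].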

Answer:

You can use skimage.measure, which has functions to label these "clusters" (more formally, connected components), and then obtain each cluster's coordinates from the coords property of the regions returned by skimage.measure.regionprops:

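from skimage.measure import label, regionprops

# Give each connected group of equal, non-zero values its own integer label
# (0 is background by default); connectivity=1 ignores diagonal neighbours
labels = label(a, connectivity=1)
# One (N, 2) array of (row, col) coordinates per labelled region
clusters = [region.coords for region in regionprops(labels)]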

Pixels with different values always end up in different regions. However, to count only orthogonal (up, down, left, right) neighbours as connected, we must set connectivity=1 in skimage.measure.label. Otherwise diagonal neighbours are connected too, and the last two clusters would be merged into one, since both consist of 2s and they touch diagonally.
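The effect of the connectivity argument can be checked on a tiny array. This is an illustrative example with made-up values, not the question's data; it relies only on the documented connectivity and return_num parameters of skimage.measure.label:

```python
import numpy as np
from skimage.measure import label

# Made-up example: the 2s at (0, 1) and (1, 2) touch only diagonally,
# and the 3 at (2, 2) touches a 2 orthogonally.
demo = np.array([[2, 2, 0],
                 [0, 0, 2],
                 [0, 0, 3]])

# connectivity=1: only up/down/left/right neighbours are connected.
_, n4 = label(demo, connectivity=1, return_num=True)
# connectivity=2 (the default for 2-D input): diagonal neighbours are connected too.
_, n8 = label(demo, connectivity=2, return_num=True)

print(n4, n8)  # 3 2
```

With connectivity=2 the two groups of 2s merge into one region, while the 3 stays separate in both cases because its value differs.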

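For instance, the last component, which is the one you shared in the question, is shown below. Note that coords holds (row, column) pairs, the reverse of the (x, y) order used in the question: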

clusters[-1]
array([[ 9, 14],
       [ 9, 15],
       [ 9, 16],
       [10, 14],
       [10, 15],
       [10, 16]], dtype=int64)
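Since coords holds (row, column) pairs, getting exactly the list-of-tuples format from the question only takes a swap. A minimal sketch, where clusters_as_xy is a hypothetical helper name and 0 is assumed to be the background (skimage's default):

```python
import numpy as np
from skimage.measure import label, regionprops


def clusters_as_xy(a):
    """Hypothetical helper: one list of (x, y) tuples per cluster.

    regionprops(...).coords stores (row, col) pairs, so each pair is
    swapped; int() turns NumPy integers into plain Python ints.
    """
    labels = label(a, connectivity=1)  # orthogonal neighbours only
    return [[(int(col), int(row)) for row, col in region.coords]
            for region in regionprops(labels)]
```

Applied to the question's array, the cluster containing (14, 9) should come out as [(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)], matching the example in the question.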

For reference, the NumPy array a was constructed like this:

from io import StringIO
import numpy as np

s = StringIO("""
01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000""")
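# delimiter=1 splits each line into one-character fields; dtype=int keeps the values integral
a = np.genfromtxt(s, delimiter=1, dtype=int)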
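If scikit-image is not available, a comparable result can be sketched with SciPy instead. This swaps in scipy.ndimage.label, which works on a binary foreground/background image rather than on distinct values, so the hypothetical helper clusters_per_value below labels each non-zero value separately. Clusters come out grouped by value rather than in scan order, and coordinates are returned as (x, y) tuples to match the question.

```python
import numpy as np
from scipy import ndimage


def clusters_per_value(a):
    """Hypothetical helper: one list of (x, y) tuples per cluster.

    ndimage.label treats every non-zero pixel as foreground, so labelling
    the whole array at once would merge touching clusters of different
    values (e.g. the 3s and 2s in the question). Looping over the distinct
    values keeps them apart.
    """
    clusters = []
    for value in np.unique(a):
        if value == 0:
            continue  # 0 is background
        # With the default structure, 2-D input uses 4-connectivity
        # (no diagonals), like connectivity=1 in skimage.measure.label.
        labels, n = ndimage.label(a == value)
        for i in range(1, n + 1):
            rows, cols = np.nonzero(labels == i)
            clusters.append([(int(c), int(r)) for r, c in zip(rows, cols)])
    return clusters
```

Each labels == i comparison scans the whole array, so with many clusters scipy.ndimage.find_objects (which returns one bounding-box slice per label) would be a cheaper starting point.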