I have a numpy array which contains data similar to the following:
01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
So there are clusters/areas of non-zero values. I need to build a list of these clusters (list of lists where each item is a tuple of array coordinates). One cluster/area consists of the same digits (e.g. "2"s only or "3"s only).
For example, a constant representation of the bottom cluster/area would be:
[(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)]
I wrote a recursive method for this, but it is slow and runs into stack overflow errors.
Is there an easy non-error-prone way to implement it more efficiently in Python? Ideally with some library/matrix operations.
(Actually, the array is an image and clusters/areas are masks for a computer vision task.)
CodePudding user response:
You can use skimage.measure, which has functions to label these "clusters" (they can be described as connected components), and then obtain the coordinates from the coords attribute of each region returned by skimage.measure.regionprops:
from skimage.measure import label, regionprops
l = label(a, connectivity=1)
clusters = [i.coords for i in regionprops(l)]
Different numbers imply different regions. However, to restrict neighbours to orthogonal (non-diagonal) positions we must set connectivity=1 in skimage.measure.label; otherwise the last two clusters would be considered the same, as both consist of 2s and touch diagonally.
For instance, the last component, which you also shared, would be (as (row, column) pairs):
clusters[-1]
array([[ 9, 14],
       [ 9, 15],
       [ 9, 16],
       [10, 14],
       [10, 15],
       [10, 16]], dtype=int64)
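Note that coords is in (row, column) order, while the example in the question lists (column, row) tuples. If that exact format is needed, a small conversion along these lines should do it. The helper name to_xy_tuples is made up for this illustration, and a plain list stands in for clusters[-1] so the snippet runs without skimage:

```python
def to_xy_tuples(coords):
    """Turn an iterable of (row, col) pairs into a list of (col, row) tuples."""
    # int() also converts NumPy integer scalars to plain Python ints
    return [(int(col), int(row)) for row, col in coords]


# Stand-in for clusters[-1] from the answer above (hypothetical literal copy)
last_coords = [[9, 14], [9, 15], [9, 16], [10, 14], [10, 15], [10, 16]]

last_xy = to_xy_tuples(last_coords)
print(last_xy)
# [(14, 9), (15, 9), (16, 9), (14, 10), (15, 10), (16, 10)]
```

Applied to every region, this would be something like [to_xy_tuples(c) for c in clusters].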
NumPy array construction (this is how the array a used above was built):
from io import StringIO
import numpy as np
s = StringIO("""
01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000""")
a = np.genfromtxt(s, delimiter=1)
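Since the question mentions a recursive approach that hits stack overflow errors, here is a minimal sketch of what labelling with connectivity=1 amounts to, written as an iterative breadth-first flood fill using only the standard library. The function name label_clusters and its background parameter are invented for this illustration and are not part of skimage; it is meant to show the idea, and for real images the library call will be much faster.

```python
from collections import deque


def label_clusters(grid, background="0"):
    """Group equal, non-background cells that touch orthogonally.

    `grid` is a sequence of equal-length rows (here: strings of digits).
    Returns a list of clusters, each a sorted list of (row, col) tuples.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    clusters = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            value = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])  # explicit queue instead of recursion
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                # Only the four orthogonal neighbours (like connectivity=1)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(sorted(cluster))
    return clusters


rows = """
01110000000000000000000000
00111110000222222220000000
01110000000222222200000000
00000000000222000000000000
00000000000000000000000000
00000000000000000000000000
00003333300000000000000000
00003333322222000000000000
00000000022222000000000000
00000000000000222000000000
00000000000000222000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000""".split()

clusters_bfs = label_clusters(rows)
print(len(clusters_bfs))   # 5
print(clusters_bfs[-1])
# [(9, 14), (9, 15), (9, 16), (10, 14), (10, 15), (10, 16)]
```

Because the queue lives on the heap rather than the call stack, the size of a cluster no longer affects recursion depth. Adding the four diagonal offsets to the neighbour list would merge the last two clusters of 2s, which is the behaviour that connectivity=1 avoids.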