Find groups of adjacent cells in a numpy ndarray with the same value-CodePudding

I'm analyzing some 3D images from a microscope to identify discrete objects in the images. I'm at a step where I've reduced the image to a mask, where 0s are not part of an object, and positive integers are objects. The value of the integer is the type of object (gene transcript) and there may be multiple objects of the same type (i.e., there will be disconnected groups using the same number, which should be considered separate objects). I am basically trying to combine all the adjacent pixels of the same type together.

This is very similar to the problem solved by scipy.ndimage.label, except that function requires all objects to be separated by 0s. In this case, there could be adjacent objects represented by different positive values that I want to separate. I wrote a function that does a depth first search, and it works, but ideally I would want to solve this problem using numpy/scipy so that it can be done faster, as I need to iterate through hundreds of 20x2048x2048 3-dimensional images.

def assign_pixels_to_molecules(decoded, do_3d=False):
    def get_molecule_coords(id, z, x, y):
        if decoded[z, x, y] != id:
            return
        yield (z, x, y)
        decoded[z, x, y] = 0
        if x > 0:
            yield from get_molecule_coords(id, z, x - 1, y)
        if x < decoded.shape[1] - 1:
            yield from get_molecule_coords(id, z, x   1, y)
        if y > 0:
            yield from get_molecule_coords(id, z, x, y - 1)
        if y < decoded.shape[2] - 1:
            yield from get_molecule_coords(id, z, x, y   1)
        if z > 0:
            yield from get_molecule_coords(id, z - 1, x, y)
        if z < decoded.shape[0] - 1:
            yield from get_molecule_coords(id, z   1, x, y)

    molecules = []
    mol_id = 0
    for z, x, y in zip(*np.where(decoded > 0)):
        # Since we change decoded inside this loop, confirm that
        # it's still non-zero
        if decoded[z, x, y] > 0:
            gene = decoded[z, x, y]
            molecules.extend([(mol_id, gene, *coords) for coords in get_molecule_coords(gene, z, x, y)])
            mol_id  = 1
    return np.array(molecules)

The function returns a numpy array with each row being a pixel that's part of an object, and with five columns that are: a unique ID for that specific object, the type of the object, and the z,x,y coordinates of the pixel.

Is there a faster way to do this?

Edit:

I think something like this could work:

>>> x
array([[0, 0, 1, 1],
       [2, 0, 2, 1],
       [2, 0, 3, 3],
       [0, 1, 0, 0]])
>>> label(x)
(array([[0, 0, 1, 1],
       [2, 0, 1, 1],
       [2, 0, 1, 1],
       [0, 3, 0, 0]], dtype=int32), 3)
>>> x * 1000   label(x)[0]
array([[   0,    0, 1001, 1001],
       [2002,    0, 2001, 1001],
       [2002,    0, 3001, 3001],
       [   0, 1003,    0,    0]])

This gets me a mask with unique IDs for each object, then I can move forward from here.

CodePudding user response：

Apply the labeling to a mask of each class individually, then combine.

from scipy.ndimage import label
x = np.array([ # class label map, given
    [0, 0, 1, 1],
    [2, 0, 2, 1],
    [2, 0, 3, 3],
    [0, 1, 0, 0]])

classes = set(x.flat) - {0}
# classes == {1, 2, 3}
# or figure that out any other way. you could run over a range too.

labels = np.zeros_like(x) # if `classes` contains 0, this can be `empty_like`
nlabels = 0
for k in classes:
    (class_labels, class_nlabels) = label(x == k)
    mask = (class_labels != 0)
    labels[mask] = class_labels[mask]   nlabels
    nlabels  = class_nlabels

# >>> labels
array([[0, 0, 1, 1],
       [3, 0, 4, 1],
       [3, 0, 5, 5],
       [0, 2, 0, 0]])

And that's it.

Your idea of summing the class labels onto the component labels is intriguing but it doesn't behave well. Counterexample:

x = np.array([
    [0, 0, 0, 0],
    [1, 2, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0]
])
# >>> x * 1000   label(x)[0]
array([[   0,    0,    0,    0],
       [1001, 2001, 1001,    0],
       [   0,    0,    0,    0],
       [   0,    0,    0,    0]])

this gives the same label (1001) to two different components that don't touch (the two single class-1 components).