Is there some NumPy trick I can use to replace this for loop?

Assume grid_sheet is an array of shape (1000, 1000, 3) and array2 is a NumPy array of shape (roughly 13k, 3).

We're basically treating array2 like a list: a list of RGB value combinations, each of which is unique.

grid_sheet should be treated like a screenshot, as if you had used the Snipping Tool to create the image.

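import numpy as np

# For each pixel, count how many colors of array2 it matches (0 or 1, since the colors are unique)
blank_sheet = np.zeros((grid_sheet.shape[0], grid_sheet.shape[1]))
for data in array2:
    blank_sheet = np.where(((grid_sheet[:,:,2] == data[2]) & (grid_sheet[:,:,1] == data[1]) & (grid_sheet[:,:,0] == data[0])), blank_sheet + 1, blank_sheet)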

The output would be like a boolean array with the same height and width as grid_sheet. I don't want to loop over array2 because it's just too slow.

I've tried splitting the channels and comparing each one to its corresponding column, but when I dstack and sum everything back together, the result marks nearly the entire grid with 1s. The results are the same if I merge the values together, flatten, compare, and then reshape back into an image. I've tried a number of other ideas, including combining various Stack Overflow solutions. I hardly see any point in nditer, and someone suggested itertools, but I don't think the two mesh well together.
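The "merge the values, flatten, compare" idea from the question can work if each RGB triple is packed into a single integer key so that different triples never collide, and the comparison is a set-membership test rather than an element-wise one. This is a minimal sketch of that alternative technique (not the broadcasting approach used in the answer), assuming 8-bit channel values in 0..255; the helper `pack_rgb` and the random test data are hypothetical.

```python
import numpy as np

def pack_rgb(arr):
    # Hypothetical helper: combine the last axis (R, G, B) into one integer per color.
    # Assumes each channel is an integer in 0..255, so the packing is collision-free.
    arr = arr.astype(np.int64)
    return (arr[..., 0] << 16) | (arr[..., 1] << 8) | arr[..., 2]

rng = np.random.default_rng(0)
grid_sheet = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
# Unique colors: some taken from the image's top-left corner, plus random ones.
array2 = np.unique(np.concatenate([grid_sheet[:5, :5].reshape(-1, 3),
                                   rng.integers(0, 256, size=(50, 3))]), axis=0)

# np.isin checks each packed pixel for membership in the packed color list.
mask = np.isin(pack_rgb(grid_sheet), pack_rgb(array2))
print(mask.shape, mask[:5, :5].all())  # (100, 100) True
```

Because membership is tested per packed key, memory stays proportional to the image plus the color list rather than to their product.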

Answer

We can use broadcasting for this. First, though, we have to add extra axes and slightly reorganize the data.

To apply the comparison along the third axis, we transpose the array of colors so that the colors are indexed by its second axis, and we put two additional dimensions at the beginning to line up with the image plane:

colors = array2.T[None,None,:,:]
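A tiny check of what the transposition and the two `None` axes do, assuming a made-up `array2` with four colors: after reshaping, `colors[0, 0, :, j]` is still the j-th RGB triple.

```python
import numpy as np

# Hypothetical color list: 4 RGB triples, shape (4, 3)
array2 = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255], [10, 20, 30]])

colors = array2.T[None, None, :, :]
print(array2.T.shape, colors.shape)  # (3, 4) (1, 1, 3, 4)

# Each color now lives along the last axis: colors[0, 0, :, j] == array2[j]
for j in range(len(array2)):
    assert (colors[0, 0, :, j] == array2[j]).all()
```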

Now colors has 4 dimensions and its shape is (1, 1, 3, len(array2)). The next step is to add a fourth dimension to the image, which will correspond to the index of each color in array2:

image = grid_sheet[:,:,:,None]
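The two reshaped arrays broadcast against each other to a single four-dimensional shape. This sketch uses `np.broadcast_shapes` (NumPy 1.20 or later) with the image size from the question and an assumed N of 13000 to show that shape and how many elements the comparison materializes.

```python
import math
import numpy as np

H, W, N = 1000, 1000, 13000  # image size from the question; N = 13000 is an assumed stand-in for "13k-ish"
image_shape = (H, W, 3, 1)   # shape of grid_sheet[:, :, :, None]
colors_shape = (1, 1, 3, N)  # shape of array2.T[None, None, :, :]

# Axes of length 1 are stretched to match the other operand.
full = np.broadcast_shapes(image_shape, colors_shape)
print(full)  # (1000, 1000, 3, 13000)

# image == colors materializes one boolean (1 byte) per element of this shape.
n_bytes = math.prod(full)
print(n_bytes / 1e9, "GB")  # 39.0 GB
```

At full size the one-shot comparison is therefore very memory hungry, which is worth keeping in mind when choosing chunk sizes.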

Now image also has 4 dimensions, and its shape is (1000, 1000, 3, 1). When we compare image and colors, broadcasting produces an array of shape (1000, 1000, 3, len(array2)) whose third axis holds the color channels, so every channel of every pixel is compared with the same channel of every color. To find out whether all channels of a color match the pixel, we apply .all(2), where 2 addresses the third axis. Then we apply .any along the last dimension to find out whether any of the given colors matches the pixel:

result = (image == colors).all(2).any(2)
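As a sanity check, here is a sketch comparing the broadcast expression with the question's loop on small random data. The function names `loop_version` and `broadcast_version` are hypothetical. Because the colors in `array2` are unique, the loop's counter is at most 1 per pixel, so it should agree with the boolean result.

```python
import numpy as np

def loop_version(grid_sheet, array2):
    # The question's loop (with the missing "+" restored): counts matching colors per pixel.
    blank_sheet = np.zeros(grid_sheet.shape[:2])
    for data in array2:
        match = ((grid_sheet[:, :, 2] == data[2])
                 & (grid_sheet[:, :, 1] == data[1])
                 & (grid_sheet[:, :, 0] == data[0]))
        blank_sheet = np.where(match, blank_sheet + 1, blank_sheet)
    return blank_sheet

def broadcast_version(grid_sheet, array2):
    image = grid_sheet[:, :, :, None]        # (H, W, 3, 1)
    colors = array2.T[None, None, :, :]      # (1, 1, 3, N)
    return (image == colors).all(2).any(2)   # (H, W) boolean

rng = np.random.default_rng(1)
# Few distinct values per channel so that many pixels actually match.
grid_sheet = rng.integers(0, 4, size=(50, 60, 3))
array2 = np.unique(rng.integers(0, 4, size=(20, 3)), axis=0)  # unique rows

result = broadcast_version(grid_sheet, array2)
assert np.array_equal(result, loop_version(grid_sheet, array2) == 1)
print(result.shape, result.dtype)  # (50, 60) bool
```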

Note that after .all the number of dimensions is reduced by 1, so the index of the last dimension becomes 2. That's why we also pass 2 to any.
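One way to follow the dimension bookkeeping is to print the shape after each step. The sketch, on a hypothetical all-black 4x5 image, also shows an equivalent spelling with negative axes, `.all(axis=-2).any(axis=-1)`, which keeps pointing at the channel and color axes even if the image had extra leading dimensions; that variant is a stylistic alternative, not part of the answer.

```python
import numpy as np

grid_sheet = np.zeros((4, 5, 3), dtype=np.uint8)            # hypothetical 4x5 black image
array2 = np.array([[0, 0, 0], [255, 255, 255], [1, 2, 3]])  # 3 colors

eq = grid_sheet[:, :, :, None] == array2.T[None, None, :, :]
print(eq.shape)                # (4, 5, 3, 3): H, W, channel, color
print(eq.all(2).shape)         # (4, 5, 3): H, W, color
print(eq.all(2).any(2).shape)  # (4, 5)

# Negative axes: the channel axis is -2 before the reduction, the color axis is -1 after it.
alt = eq.all(axis=-2).any(axis=-1)
assert np.array_equal(alt, eq.all(2).any(2))
print(alt.all())               # True: every pixel is black, and black is in array2
```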

Test case

from numpy import arange, array, newaxis

image = arange(3*3*2).reshape(3,3,2)
colors = array([[0,1], [2,3], [4,5], [8,9]])

expected = array([
    [ True,  True,  True],
    [False,  True, False],
    [False, False, False]
])

image = image[:, :, :, newaxis]
colors = colors.T[newaxis, newaxis, :, :]

assert colors.shape == (1,1,2,4)
assert image.shape == (3,3,2,1)

result = (image == colors).all(2).any(2)

assert (result == expected).all()
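Broadcasting against all colors at once materializes an array of shape (H, W, 3, N), which for a 1000 x 1000 image and about 13k colors is tens of gigabytes of booleans. Before reaching for Dask, one NumPy-only option is to process the colors in batches and OR the partial results together, trading a short Python loop over batches for bounded memory. A minimal sketch; `match_colors_batched` and `batch_size` are hypothetical names, and a good batch size depends on available memory.

```python
import numpy as np

def match_colors_batched(grid_sheet, array2, batch_size=256):
    # Hypothetical helper: same result as (image == colors).all(2).any(2),
    # but only broadcasts batch_size colors at a time.
    image = grid_sheet[:, :, :, None]                    # (H, W, C, 1)
    result = np.zeros(grid_sheet.shape[:2], dtype=bool)  # (H, W)
    for start in range(0, len(array2), batch_size):
        colors = array2[start:start + batch_size].T[None, None, :, :]  # (1, 1, C, b)
        result |= (image == colors).all(2).any(2)
    return result

# Same data as in the test case above: 2 "channels", 4 colors.
image = np.arange(3 * 3 * 2).reshape(3, 3, 2)
colors = np.array([[0, 1], [2, 3], [4, 5], [8, 9]])
print(match_colors_batched(image, colors, batch_size=3).astype(int))
# [[1 1 1]
#  [0 1 0]
#  [0 0 0]]
```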

An example using Dask to process large images

import numpy as np
import dask.array as da
from dask.distributed import Client

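client = Client(n_workers=4)  # starts a local cluster; in a plain script, create it under if __name__ == "__main__":
print(client)  # in Jupyter, display(client) renders a richer summary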

# X, Y: image height and width
# N: number of colors to check
X, Y, N = 1080, 1920, 13921

# dX, dY, dN: chunk sizes for dask.array
# good values depend on the machine
dX, dY, dN = 360, 640, 500

image = np.arange(X*Y*3).reshape(X, Y, 3)
colors = np.arange(N*3).reshape(N, 3)

image = image[:, :, :, None]
colors = colors.T[None, None, :, :]

# chunk shapes follow the broadcasting layout: the image is split along X and Y, the colors along N
im = da.from_array(image, chunks=(dX, dY, 3, 1))
co = da.from_array(colors, chunks=(1, 1, 3, dN))

im, co = da.broadcast_arrays(im, co)

re = (im == co).all(2).any(2)

result = re.compute()

assert result.shape == (X, Y)
assert result.sum() == N