I have a NumPy image in RGB bytes, let's say it's this 2x3 image:
import numpy as np

img = np.array([[[  0, 255,   0], [255, 255, 255]],
                [[255,   0, 255], [  0, 255, 255]],
                [[255,   0, 255], [  0,   0,   0]]])
I also have a palette that covers every color used in the image. Let's say it's this palette:
palette = np.array([[255,   0, 255],
                    [  0, 255,   0],
                    [  0, 255, 255],
                    [  0,   0,   0],
                    [255, 255, 255]])
Is there some combination of indexing the image against the palette (or vice versa) that will give me a paletted image equivalent to this?
img_p = np.array([[1, 4],
                  [0, 2],
                  [0, 3]])
For comparison, I know the reverse is pretty simple: palette[img_p] will give a result equivalent to img. I'm trying to figure out if there's a similar approach in the opposite direction that will let NumPy do all the heavy lifting.
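To make that concrete with the example arrays above, the forward direction is just a fancy-indexing lookup (nothing new here, only the arrays already defined):

# Forward direction: indexing the palette with every entry of img_p
# rebuilds the RGB image, so the result has shape img_p.shape + (3,).
print(np.array_equal(palette[img_p], img))  # True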
I know I can just iterate over all the image pixels individually and build my own paletted image. I'm hoping there's a more elegant option.
Okay, so I implemented the various solutions below and ran them over a moderate test set: 20 images, each one 2000x2000 pixels, with a 32-element palette of three-byte colors. Pixels were given random palette indexes. All algorithms were run over the same images.
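Roughly, each test image was set up along these lines (a sketch, not the exact benchmark code; palette32, idx, and test_img are illustrative names):

# Sketch of the test data: a 32-entry palette of random 3-byte colors,
# and a 2000x2000 image of random palette indexes.
rng = np.random.default_rng(0)
palette32 = rng.integers(0, 256, size=(32, 3), dtype=np.uint8)
idx = rng.integers(0, 32, size=(2000, 2000))
test_img = palette32[idx]   # every color in test_img is covered by palette32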
Timing results:
- mostly empty lookup array - 0.89 seconds
- np.searchsorted approach - 3.20 seconds
- Pandas lookup, single integer - 38.7 seconds
- Using == and then aggregating the boolean results - 66.4 seconds
- inverting the palette into a dict and using np.apply_along_axis() - Probably ~500 seconds, based on a smaller test set
- Pandas lookup with a MultiIndex - Probably ~3000 seconds, based on a smaller test set
Given that the lookup array has a significant memory penalty (and a prohibitive one if there's an alpha channel), I'm going to go with the np.searchsorted approach. The lookup array is significantly faster if you want to spend the RAM on it.
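For context on that memory penalty, a dense int32 lookup table with one slot per possible color costs 256**channels * 4 bytes (back-of-the-envelope arithmetic, not measured):

# Rough size of a dense int32 lookup table, one entry per possible color.
rgb_table_bytes = 256**3 * 4    # 67,108,864 bytes = 64 MiB for RGB
rgba_table_bytes = 256**4 * 4   # ~17.2 GB (16 GiB) for RGBA -- prohibitive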
CodePudding user response:
Edit: Here is a faster way that uses np.searchsorted.
def rev_lookup_by_sort(img, palette):
    # Encode each RGB triple as a single integer in base (palette.max() + 1),
    # so distinct colors map to distinct integers.
    M = (1 + palette.max()) ** np.arange(3)
    # Sort the encoded palette colors and remember each one's original index.
    p1d, ix = np.unique(palette @ M, return_index=True)
    # Binary-search every encoded pixel in the sorted palette, then map the
    # sorted positions back to the original palette indices.
    return ix[np.searchsorted(p1d, img @ M)]
Correctness (by equivalence to rev_lookup_by_dict() in the original answer below):
np.array_equal(
    rev_lookup_by_sort(img, palette),
    rev_lookup_by_dict(img, palette),
)
Speedup (for a 1000 x 1000 image and a 1000-color palette):
orig = %timeit -o rev_lookup_by_dict(img, palette)
# 2.47 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
v2 = %timeit -o rev_lookup_by_sort(img, palette)
# 71.8 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> orig.average / v2.average
34.46
So the np.searchsorted answer is roughly 34x faster at that size.
Original answer
An initial shot gives a slowish version (hopefully we can do better). It uses a dict where the keys are colors as tuples.
def rev_lookup_by_dict(img, palette):
    # Map each palette color (as a tuple) to its palette index.
    d = {tuple(v): k for k, v in enumerate(palette)}

    def func(pix):
        # "Color not found" is expressed as -1.
        return d.get(tuple(pix), -1)

    # Apply the lookup to every pixel along the last (color) axis.
    return np.apply_along_axis(func, -1, img)
img_p = rev_lookup_by_dict(img, palette)
Notice that "color not found" is expressed as -1
in img_p
.
On your (modified) data:
>>> img_p
array([[1, 4],
       [0, 2],
       [0, 3]])
Larger example:
# setup
from math import isqrt
w, h = 1000, 1000
s = isqrt(w * h)
palette = np.random.randint(0, 256, (s, 3))
img = palette[np.random.randint(0, s, (w, h))]
Test:
img_p = rev_lookup_by_dict(img, palette)
>>> np.array_equal(palette[img_p], img)
True
Timing:
%timeit rev_lookup_by_dict(img, palette)
# 2.48 s ± 16.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
That's quite awful, but hopefully we can do better.
CodePudding user response:
Faster than a dictionary, but with a 64 MB lookup array.
# Dense lookup table indexed directly by (R, G, B); unused entries stay 0.
d = np.zeros((256, 256, 256), np.int32)  # 64 MB!
d[tuple(palette.T)] = np.arange(len(palette))
# Look up every pixel's color in one shot, then restore the 2D image shape.
img_p = d[tuple(img.reshape(-1, 3).T)].reshape(*img.shape[:2])
# %%timeit 10 loops, best of 5: 25.8 ms per loop (1000 x 1000)
np.testing.assert_equal(img, palette[img_p])
CodePudding user response:
If you can use Pandas in addition to NumPy, you can use a Pandas MultiIndex as a sort of sparse array:
import pandas as pd

inverse_palette = pd.Series(np.arange(len(palette)),
                            index=pd.MultiIndex.from_arrays(palette.T)).sort_index()
img_p = np.apply_along_axis(lambda px: inverse_palette[tuple(px)], 2, img)
That's really slow, though. You can do a bit better by converting the colors into integers first:
def collapse_bytes(array):
    # Pack the channels of the last axis into one unsigned integer,
    # e.g. (R, G, B) -> R*256**2 + G*256 + B.
    result = np.zeros(array.shape[:-1], np.uint32)
    for i in range(array.shape[-1]):
        result = result * 256 + array[..., i]
    return result
inverse_palette = pd.Series(np.arange(len(palette)),
                            index=collapse_bytes(palette)).sort_index()
img_p = inverse_palette[collapse_bytes(img).flat].to_numpy()\
    .reshape(img.shape[:-1])
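As a quick sanity check of this integer-key version (assuming, as in the question, that every image color appears in the palette):

# Round-trip check: expanding the paletted result should reproduce the image.
assert np.array_equal(palette[img_p], img)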