Vectorize calling numpy function on third dimension of array

Time:12-16

I have a 3D numpy array data where dimensions a and b represent the resolution of an image and c is the image/frame number. I want to call np.histogram on each pixel (a and b combination) across the c dimension, with an output array of dimension (a, b, BINS). I've accomplished this task with a nested loop, but how can I vectorize this operation?

hists = np.zeros((a, b, BINS))
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

I am confident that the solution is trivial, nonetheless all help is appreciated :)

CodePudding user response:

np.histogram always computes over the flattened array and has no axis argument. However, you can use np.apply_along_axis to apply it along the third axis:

np.apply_along_axis(lambda a: np.histogram(a, bins=BINS)[0], 2, data)
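As a quick sanity check, here is a small sketch that compares this one-liner against the nested loop from the question. The array shape, the seed and BINS = 5 are made-up values for illustration only. Keep in mind that np.apply_along_axis still loops in Python under the hood, so this is mostly a readability gain rather than true vectorisation.

```python
import numpy as np

BINS = 5  # assumed number of bins, for illustration only
rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(3, 4, 20))  # toy (a, b, c) image stack

# Reference result: the nested loop from the question.
a, b, _ = data.shape
hists_loop = np.zeros((a, b, BINS), dtype=int)
for row in range(a):
    for column in range(b):
        hists_loop[row, column, :] = np.histogram(data[row, column, :], bins=BINS)[0]

# Same computation, applied along axis 2 (the frame axis).
hists_apply = np.apply_along_axis(lambda v: np.histogram(v, bins=BINS)[0], 2, data)

print(hists_apply.shape)
print(np.array_equal(hists_loop, hists_apply))
```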

CodePudding user response:

This is an interesting problem.

Make a Minimal Working Example (MWE)

Providing one should be a habit when asking questions on SO.

import numpy as np

a, b, c = 2, 3, 4
data = np.random.randint(10, size=(a, b, c))
hists = np.zeros((a, b, c), dtype=int)
for row in range(a):
    for column in range(b):
        hists[row, column, :] = np.histogram(data[row, column, :], bins=c)[0]

data
>>> array([[[6, 4, 3, 3],
            [7, 3, 8, 0],
            [1, 5, 8, 0]],

           [[5, 5, 7, 8],
            [3, 2, 7, 8],
            [6, 8, 8, 0]]])
hists
>>> array([[[2, 1, 0, 1],
            [1, 1, 0, 2],
            [2, 0, 1, 1]],

           [[2, 0, 1, 1],
            [2, 0, 0, 2],
            [1, 0, 0, 3]]])

Make it as simple as possible (but still working)

You can eliminate one loop and simplify it:

new_data = data.reshape(a*b, c)
new_hists = np.zeros((a*b, c), dtype=int)

for row in range(a*b):
    new_hists[row, :] = np.histogram(new_data[row, :], bins=c)[0]

new_hists
>>> array([[2, 1, 0, 1],
           [1, 1, 0, 2],
           [2, 0, 1, 1],
           [2, 0, 1, 1],
           [2, 0, 0, 2],
           [1, 0, 0, 3]])

new_data
>>> array([[6, 4, 3, 3],
           [7, 3, 8, 0],
           [1, 5, 8, 0],
           [5, 5, 7, 8],
           [3, 2, 7, 8],
           [6, 8, 8, 0]])

Can you find similar problems and use the key points of their solutions?

In general, you can't vectorise an operation that is applied row by row in a loop like this:

for row in array:
    some_operation(row)

The exception is when you can call another vectorised operation on the flattened array and then reshape the result back to the initial shape:

arr = array.ravel()
res = another_operation(arr)  # keep the result instead of discarding it
out = res.reshape(array.shape)
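To make the pattern concrete, here is a sketch of one well-known instance of it: counting integer values per row with a single np.bincount call. It assumes non-negative integers below a known bound K (a name introduced here for illustration). Each row is shifted into its own block of K values so that rows do not collide after flattening.

```python
import numpy as np

new_data = np.array([[6, 4, 3, 3],
                     [7, 3, 8, 0],
                     [1, 5, 8, 0]])
n_rows = new_data.shape[0]
K = 10  # assumption: every value lies in range(K)

# Shift row i by i*K so that each row occupies its own range of values.
shifted = new_data + K * np.arange(n_rows)[:, None]

# One vectorised call on the flattened array, then back to 2D.
counts = np.bincount(shifted.ravel(), minlength=n_rows * K).reshape(n_rows, K)

# Reference: the same counts computed with a Python loop over rows.
looped = np.array([np.bincount(r, minlength=K) for r in new_data])
print(np.array_equal(counts, looped))
```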

It looks like you're in luck with np.histogram, because I'm pretty sure similar things have been done before.

Final solution

new_data = data.reshape(a*b, c)
m, M = new_data.min(axis=1), new_data.max(axis=1)
bins = (c * (new_data - m[:, None]) // (M-m)[:, None])
out = np.zeros((a*b, c + 1), dtype=int)
advanced_indexing = np.repeat(np.arange(a*b), c), bins.ravel()
np.add.at(out, advanced_indexing, 1)
out.reshape((a, b, -1))
>>> array([[[2, 1, 0, 0, 1],
            [1, 1, 0, 1, 1],
            [2, 0, 1, 0, 1]],

           [[2, 0, 1, 0, 1],
            [2, 0, 0, 1, 1],
            [1, 0, 0, 1, 2]]])

Note that this adds an extra bin to each histogram and puts the maximum values in it, but I hope that's not hard to fix if you need to.
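One possible way to fix the extra bin, as a minimal sketch: clip the bin index so that the maximum falls into the last bin, which is what np.histogram does with its right-closed last bin. The function name per_pixel_histogram and the n_bins parameter are introduced here for illustration. The sketch assumes integer data, and it puts pixels whose values never change into bin 0 to avoid dividing by zero, which may differ from what np.histogram returns in that case.

```python
import numpy as np

def per_pixel_histogram(data, n_bins):
    """Histogram along the last axis of an integer (a, b, c) array."""
    a, b, c = data.shape
    flat = data.reshape(a * b, c)
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    # Assumption: constant rows get a span of 1, so everything lands in bin 0.
    span = np.where(hi > lo, hi - lo, 1)
    idx = n_bins * (flat - lo) // span
    # The row maximum maps to n_bins; move it into the last bin instead.
    idx = np.minimum(idx, n_bins - 1)
    out = np.zeros((a * b, n_bins), dtype=int)
    rows = np.repeat(np.arange(a * b), c)
    np.add.at(out, (rows, idx.ravel()), 1)
    return out.reshape(a, b, n_bins)

# The example data from above.
data = np.array([[[6, 4, 3, 3],
                  [7, 3, 8, 0],
                  [1, 5, 8, 0]],
                 [[5, 5, 7, 8],
                  [3, 2, 7, 8],
                  [6, 8, 8, 0]]])
hists = per_pixel_histogram(data, 4)
print(hists)
```

For this example the result should agree with the nested-loop hists shown earlier. For float data, values that sit very close to a bin edge could be binned differently from np.histogram, so it is worth checking against the loop on a sample first.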
