Using numpy.histogram on an array of images


I'm trying to calculate image histograms for a NumPy array of images. The array has shape (n_images, width, height, colour_channels), and I want to return an array of shape (n_images, count_in_each_bin (i.e. 255)). I do this via two intermediate steps: averaging the colour channels of each image, then flattening each 2D image to a 1D one.

I think I have successfully done this with the code below, but I have cheated a bit with the for loop at the end. My question is: is there a way to get rid of the last for loop and use an optimised NumPy function instead?

import numpy as np

def histogram_helper(flattened_image: np.ndarray) -> np.ndarray:
    # 256 bin edges (0..255) produce 255 bins
    counts, _ = np.histogram(flattened_image, bins=np.arange(256))
    return counts

# Using 10 RGB images of width and height 300
images = np.zeros((10, 300, 300, 3))

# Take the mean of the three colour channels
channel_avg = np.mean(images, axis=3)

# Flatten each image in the array of images, resulting in a 1D representation of each image.
flat_images = channel_avg.reshape(*channel_avg.shape[:-2], -1)

# Now calculate the counts in each of the colour bins for each image in the array.
# This will provide us with a count of how many times each colour appears in an image.
result = np.empty((0, 255), dtype=np.int32)  # 256 bin edges give 255 bins
for image in flat_images:
    colour_counts = histogram_helper(image)
    colour_counts = colour_counts.reshape(1, -1)
    result = np.concatenate([result, colour_counts])

Answer:

You don't necessarily need to call np.histogram or np.bincount for this, since pixel values are in the range 0 to N. That means that you can treat them as indices and simply use a counter.
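As a rough illustration of the "values as indices" idea on a single image, here is a minimal sketch. The `pixels` array and its values are made up for the example, and the explicit Python loop is only there to show the principle, not as a recommended implementation:

```python
import numpy as np

# Hypothetical flattened image with integer pixel values in 0..255.
pixels = np.array([0, 3, 3, 255, 0, 3], dtype=np.uint8)

# Treat each pixel value as an index into a counter array.
counts = np.zeros(256, dtype=int)
for value in pixels:
    counts[value] += 1

# For a single 1D integer array, np.bincount produces the same counts.
expected = np.bincount(pixels, minlength=256)
```

Here `counts` and `expected` should hold the same 256 values; the remaining question is how to do this for all images at once without the loop.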

Here's how I would transform the initial images, which I imagine are of dtype np.uint8:

images = np.random.randint(0, 256, size=(10, 5, 5, 3), dtype=np.uint8)  # 10 5x5 images, 3 channels; upper bound is exclusive
reshaped = np.round(images.reshape(images.shape[0], -1, images.shape[-1]).mean(-1)).astype(images.dtype)

Now you can simply count the histograms using unbuffered addition with np.add.at:

output = np.zeros((images.shape[0], 256), int)
index = np.arange(len(images))[:, None]
np.add.at(output, (index, reshaped), 1)
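To see why the unbuffered form matters, the following sketch compares np.add.at with ordinary fancy-indexed `+=` and with a per-image np.bincount reference. The seed, the `buffered` and `reference` names, and the use of np.random.default_rng are choices made for this example only:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
images = rng.integers(0, 256, size=(10, 5, 5, 3), dtype=np.uint8)

# Channel-averaged, flattened images: shape (10, 25)
reshaped = np.round(
    images.reshape(images.shape[0], -1, images.shape[-1]).mean(-1)
).astype(np.uint8)
index = np.arange(len(images))[:, None]

# Unbuffered addition: every repeated (image, value) pair is counted.
output = np.zeros((images.shape[0], 256), int)
np.add.at(output, (index, reshaped), 1)

# Buffered fancy indexing: a repeated pair is incremented only once.
buffered = np.zeros_like(output)
buffered[index, reshaped] += 1

# Reference result: one np.bincount call per image.
reference = np.stack([np.bincount(row, minlength=256) for row in reshaped])
```

Under these assumptions, `output` should match `reference`, while `buffered` never exceeds 1 in any cell. Note also that this produces 256 columns per image, whereas np.histogram with 256 bin edges returns 255 bins, because its last bin includes both 254 and 255.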