Fastest way to sum the values of the pixels above a threshold in an image with Python

Time:10-28

I am trying to find the best method to retrieve the sum of the pixel values that are bigger than a certain threshold. For example, if my threshold is 253 and I have 10 pixels with value 254 and another 10 with value 255, I expect to get 10*254 + 10*255 = 5090 - a sort of total intensity of the pixels that are above the threshold.

I found a way to do so with np.histogram:

import cv2, time
import numpy as np

threshold = 1
deltaImg = cv2.imread('image.jpg')
t0 = time.time()
# One bin per integer value in [threshold, 256); weight the counts
# by the left bin edges to get the sum of the qualifying pixels.
histogram = np.histogram(deltaImg, 256 - threshold, [threshold, 256])
histoSum = sum(histogram[0] * histogram[1][:-1])
print(histoSum)
print("time = %.2f ms" % ((time.time() - t0) * 1000))

This works and I get the sum of the pixel values that were bigger than the selected threshold. However, I am not sure this is the best/fastest way to do it. Obviously, the higher the threshold, the faster this runs.

Does anyone have an idea how I can get the right result, but with a faster algorithm?

CodePudding user response:

Here you go:

import numpy as np
image = np.random.randint(0,256,(10,10))
threshold = 1
res = np.sum(image[image > threshold])

This operation:

%%timeit
res = np.sum(image[image > threshold])

takes 5.43 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each).
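As a quick sanity check (my addition, not part of the original answer), the mask-and-sum one-liner can be compared against a plain Python loop over the same data:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, (10, 10))
threshold = 253

# Boolean mask selects the qualifying pixels; fancy indexing
# extracts them into a 1-D array before summing.
masked = np.sum(image[image > threshold])

# Equivalent plain-Python reference for comparison.
reference = sum(int(v) for v in image.ravel() if v > threshold)
assert masked == reference
```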

CodePudding user response:

While the OP's approach is fundamentally inaccurate, the underlying idea can still be used to craft an approach that is valid for integer arrays (such as grayscale images):

def sum_gt_hist(arr, threshold):
    values = np.arange(threshold, np.max(arr) + 1)
    hist, edges = np.histogram(arr, values + 0.5)
    return sum(values[1:] * hist)

This is, however, non-ideal because it is more complex than it needs to be (np.histogram() is a relatively complex function that computes much more intermediate information than required) and it only works for integer values.
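To see why the half-integer bin edges work (a self-contained check I am adding, repeating the function above): with a threshold of 253 the edges fall at 253.5, 254.5, ..., so each bin counts exactly one integer value strictly above the threshold:

```python
import numpy as np

def sum_gt_hist(arr, threshold):
    # One bin per integer value strictly above the threshold:
    # bin edges at threshold + 0.5, threshold + 1.5, ...
    values = np.arange(threshold, np.max(arr) + 1)
    hist, edges = np.histogram(arr, values + 0.5)
    return sum(values[1:] * hist)

small = np.array([253, 254, 254, 255])
print(sum_gt_hist(small, 253))  # 254 + 254 + 255 = 763
```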

A simpler and still pure NumPy approach was proposed in @sehan2's answer:

import numpy as np


def sum_gt_np(arr, threshold):
    return np.sum(arr[arr > threshold])

While the above would be the preferred NumPy-only solution, much faster execution (and memory efficiency) can be obtained with a simple Numba-based solution:

import numba as nb


@nb.njit
def sum_gt_nb(arr, threshold):
    arr = arr.ravel()
    result = 0
    for x in arr:
        if x > threshold:
            result += x
    return result

Benchmarking the above with a random 100x100 array representing an image, one would get:

import numpy as np


np.random.seed(0)
arr = np.random.randint(0, 256, (100, 100))  # generate a random image
threshold = 253  # set a threshold

funcs = sum_gt_hist, sum_gt_np, sum_gt_nb
for func in funcs:
    print(f"{func.__name__:16s}", end='  ')
    print(func(arr, threshold), end='  ')
    %timeit func(arr, threshold)

# sum_gt_hist       22397  355 µs ± 8.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# sum_gt_np         22397  10.1 µs ± 438 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# sum_gt_nb         22397  1.19 µs ± 33.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This indicates that sum_gt_nb() is much faster than sum_gt_np(), which in turn is much faster than sum_gt_hist().
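A middle ground worth mentioning (my addition, not benchmarked in the answer above): NumPy reductions accept a `where=` mask, which avoids the temporary compacted array that fancy indexing allocates, at the cost of still materializing the boolean mask. Whether it actually beats `np.sum(arr[arr > threshold])` depends on the NumPy version and array size, so benchmark before relying on it:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 256, (100, 100))
threshold = 253

# Sum only the elements where the mask is True; no intermediate
# compacted array is allocated, unlike arr[arr > threshold].
res = np.sum(arr, where=arr > threshold)
assert res == np.sum(arr[arr > threshold])
```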
