I need to speed up python code with numpy


This code adds Gaussian noise to a photo. There is a speed problem: an FHD photo is processed in about 0.45 s, which is unacceptable for my task. I need to get the runtime down to a few milliseconds.

import numpy as np
import cv2

image = cv2.imread('1.jpg')

row,col,ch= image.shape
mean = 0
var = 0.1
sigma = var**0.5
gauss = np.random.normal(mean,sigma,(row,col,ch))
gauss = gauss.reshape(row,col,ch)
noisy = image + gauss

cv2.imwrite('2.jpg', noisy)

I have already optimized the slowest part of the code, the generation of the array of random numbers, which originally took about 0.32 s:

gauss = np.random.normal(mean,sigma,(row,col,ch))
gauss = gauss.reshape(row,col,ch)

I generate a matrix one hundred times smaller, and then tile it a hundred times:

roww=int(row/100)
gauss = np.random.normal(mean,sigma,(roww,col,ch))
gauss = gauss.reshape(roww*col*ch)
gauss = np.tile(gauss, 100)
gauss = gauss.reshape(row,col,ch)

The code above takes 20 ms, most of which (18 ms) is spent tiling the small matrix into the large one:

gauss = np.tile(gauss, 100)

How could you make this operation faster?

And now the main problem: all of this code still takes far too long (about 170 ms). The most time-consuming operations are the following.

Adding the matrices (30 ms):

noisy = image + gauss

Reading the photo (35 ms):

image = cv2.imread("1.jpg")

and saving it (90 ms):

cv2.imwrite('2.jpg', noisy)

Is it possible to speed up these operations in any way in python? Thanks!

Full code:

import numpy as np
import cv2

image = cv2.imread('1.jpg')

row,col,ch= image.shape
mean = 0
var = 0.1
sigma = 10
roww=int(row/100)
gauss = np.random.normal(mean,sigma,(roww,col,ch))
gauss = gauss.reshape(roww*col*ch)
gauss = np.tile(gauss, 100)
gauss = gauss.reshape(row,col,ch)
noisy = image + gauss

cv2.imwrite('2.jpg', noisy)

CodePudding user response:

The first code is bounded by the time needed to generate the random numbers. This is generally a slow operation (in any language, although there are tricks to speed it up in low-level native code), so there is not much to do in Numpy itself. You can use Numba to parallelize this operation, but note that Numba's current random number generator is a bit slower than Numpy's when run sequentially.

The read/write operations should be bounded by your storage device and by the speed of encoding/decoding. For the former, you can use in-RAM virtual devices, but if you cannot control that, then there is nothing to do (apart from using faster hardware like an NVMe SSD). For the latter, the Python wrapper of OpenCV already uses a highly-optimized JPEG encoder/decoder that should be fast for the operation being performed. So you cannot speed this part up, but you can run several such operations in parallel, as sketched below.
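
For example, if several frames have to be saved, the encodes can be overlapped with a thread pool, since OpenCV should release the GIL inside imwrite. Here is a minimal sketch; the zero-filled frames and output names are placeholders:

import concurrent.futures

import cv2
import numpy as np

# Placeholder FHD frames standing in for real noisy images
frames = [np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(4)]

def write_frame(arg):
    i, frame = arg
    # The JPEG encoding runs in native code, so threads overlap well here
    cv2.imwrite('out_%d.jpg' % i, frame)

with concurrent.futures.ThreadPoolExecutor() as pool:
    # Consuming the iterator ensures all writes finish before the pool exits
    list(pool.map(write_frame, enumerate(frames)))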

Regarding the Numpy code, there are two main issues:

First, np.random.normal generates an array of 64-bit floating-point numbers (float64), while the image array contains only 8-bit integers (uint8). Working on float64 values is much more expensive than on uint8 ones (up to an order of magnitude slower in the worst case). Unfortunately, there is no (simple) way to generate random integers with a normal distribution, as this distribution is inherently tied to real numbers, and np.random.normal lacks a dtype parameter to work in float32. Your solution of reusing random numbers is quite good for improving performance. Still, adding a uint8 array to a float64 one is expensive, because Numpy first converts the former to float64 and then produces a new float64 array. You could convert the random array to uint8 up front, but this is not so easy in practice: the negative values cannot be represented as correct uint8 ones, and even if they could, the addition would likely cause overflows. Note that Numba can help to further speed up this part, as it can convert the float64 values to uint8 on the fly (and in parallel).
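
As a side note, Numpy's newer Generator API does accept a dtype for standard_normal, so the noise can at least be generated directly in float32 and clipped before the cast. A rough sketch, with a synthetic image standing in for the real photo:

import numpy as np

rng = np.random.default_rng()
sigma = 10.0

image = np.zeros((1080, 1920, 3), dtype=np.uint8)  # stand-in for the real photo

# standard_normal (unlike np.random.normal) accepts a dtype argument
gauss = rng.standard_normal(image.shape, dtype=np.float32) * np.float32(sigma)

# Clip before casting so out-of-range values saturate instead of wrapping
noisy = np.clip(image + gauss, 0, 255).astype(np.uint8)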

Moreover, np.tile should theoretically not need to copy the array, but it sadly does make a copy here. Fortunately, you can remove this expensive copy using broadcasting.

Here is the resulting code:

row, col, ch = image.shape
mean = 0
var = 0.1
sigma = 10
roww = int(row/100)
gauss = np.random.normal(mean,sigma,(roww,col,ch))
# Reshaping the image to (row//roww, roww, col, ch) lets the small noise
# array broadcast along the leading axis: no tiled copy is ever materialized
noisy = (image.reshape(-1,roww,col,ch) + gauss.astype(np.uint8)).reshape(row,col,ch)

I advise you to perform the whole operation using Numba:

import numba as nb

@nb.njit('uint8[:,:,::1](uint8[:,:,::1])', parallel=True)
def compute(image):
    row, col, ch = image.shape
    mean = 0
    var = 0.1
    sigma = 10
    out = np.empty_like(image)
    roww = int(row / 40)
    # Generate a small block of noise and wrap it over the image rows
    gauss = np.random.normal(mean, sigma, (roww, col, ch))
    for i in nb.prange(row):
        iWrap = i % roww
        for j in range(col):
            for c in range(ch):
                rnd = gauss[iWrap, j, c]
                intRnd = int(np.round(rnd))
                noisedInt = int(image[i, j, c]) + intRnd
                # Clamp to [0, 255] so the store back to uint8 cannot wrap around
                clampedNoisedInt = min(max(noisedInt, 0), 255)
                out[i, j, c] = clampedNoisedInt
    return out
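
For completeness, the kernel is called like the original code. Since the signature is given explicitly, Numba compiles compute eagerly at decoration time, so the call below pays no JIT cost (cv2.imread returns a C-contiguous uint8 array, which matches the declared signature):

image = cv2.imread('1.jpg')
noisy = compute(image)
cv2.imwrite('2.jpg', noisy)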

Here are the timings on my 6-core machine on a 1920x1080x3 image (without including the time to read/write the image):

Initial:              23.2 ms
Optimized with Numpy: 17.9 ms
Optimized with Numba:  4.3 ms
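
If you want to reproduce such measurements, a small harness excluding the read/write could look like the following sketch (the synthetic image stands in for the decoded photo):

import timeit

import numpy as np

image = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

# Take the best of several batches to filter out scheduling noise
best = min(timeit.repeat(lambda: compute(image), number=10, repeat=5)) / 10
print('%.1f ms' % (best * 1e3))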

If this is not fast enough, then you need to rewrite this operation in C, using fast low-level SIMD intrinsics and multiple threads.

The read/write operations take respectively about 21 ms and 42 ms on my machine (which is actually good, since the input/output images are heavily compressed).
