I am using simple code to extract the raw RGB data from an OpenCV image array (x, y, 3):
import numpy as np
import cv2 as cv

image = cv.imread(image)   # 'image' is presumably the file path at this point
rgb = True
l = image.shape
lx, ly = l[0], l[1]
frame = []
# build a flat list of 16-bit colour bytes, two per pixel
for py in range(0, ly):
    for px in range(0, lx):
        p = _color16(image[py, px])
        frame.extend((int(p[0]), int(p[1])))
print(len(frame))
def _color16(acolor):
    # Pack an 8-bit (R, G, B) triple into a 16-bit 5-6-5 word and return its two bytes.
    aR, aG, aB = acolor
    if rgb is True:
        rgb_word = ((aR >> 3) << 11) | ((aG >> 2) << 5) | (aB >> 3)
        rgb_byte1 = rgb_word >> 8
        rgb_byte2 = rgb_word & 255
        return rgb_byte1, rgb_byte2
    else:
        bgr_word = ((aB >> 3) << 11) | ((aG >> 2) << 5) | (aR >> 3)
        bgr_byte1 = bgr_word >> 8
        bgr_byte2 = bgr_word & 255
        return bgr_byte1, bgr_byte2
This runs dead slow: extracting 115.2 KB of output from a 240x240x3 array of 8-bit values takes around 4 seconds on a Raspberry Pi 4B running at 1.3 GHz. The _color16 function packs the RGB data into the 16-bit RGB/BGR format I transmit over SPI. I don't think this function is the source of the slowdown, but I am not sure, so I have added the code for completeness.
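For example, a single pixel (R, G, B) = (255, 128, 64) packs like this (just a quick check, independent of the image code above):
aR, aG, aB = 255, 128, 64
word = ((aR >> 3) << 11) | ((aG >> 2) << 5) | (aB >> 3)   # 5-6-5 packing
print(hex(word))              # 0xfc08
print(word >> 8, word & 255)  # 252 8 -> the two bytes sent over SPI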
Can anyone give me any clues as to why this code runs so slowly?
I've been trying to convert the frame buffer data from an (x, y, 3) ndarray to a list of (r, g, b) tuples with NumPy, but so far I have failed to identify the correct function for that.
Converting with numpy.ndarray.flatten works perfectly for 18/24-bit RGB data, as the mapping is 1:1, and gives me frame rates of 40 fps with the SPI controller running at 62.5 MHz.
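For reference, a reshape to (-1, 3) is one way to get per-pixel rows; this is only a sketch, assuming image is the array loaded above:
pixels = image.reshape(-1, 3)          # (h*w, 3) array, one row per pixel
rgb_tuples = list(map(tuple, pixels))  # only if a Python list of tuples is really needed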
CodePudding user response:
You're using Python-level loops, calling a function, creating a tuple of the result, and then calling list.extend for each iteration. The Python-level loops by themselves are way too slow. Instead, take advantage of NumPy's speed and do your operations elementwise:
if not self._rgb:
    image = image[..., ::-1]
# Assuming your image has dtype `np.uint8`, you need to upcast it
# so the bitwise operations will not go out of range.
image = image.astype(np.uint16)
rgb_word = ((image[..., 0] >> 3) << 11) | ((image[..., 1] >> 2) << 5) | (image[..., 2] >> 3)
rgb_word = rgb_word.flatten()
# Use `frame.tolist()` if you really want a list
frame = np.column_stack([rgb_word >> 8, rgb_word & 255]).flatten()
This runs about 600x faster on my machine for a (240, 240, 3) array.
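If you want to reproduce a rough timing yourself, here is a minimal sketch; the random test image and the convert_vectorized wrapper are just stand-ins, not part of the code above:
import timeit
import numpy as np

# Hypothetical test image; any (240, 240, 3) uint8 array will do.
test = np.random.randint(0, 256, (240, 240, 3), dtype=np.uint8)

def convert_vectorized(image):
    image = image.astype(np.uint16)
    word = ((image[..., 0] >> 3) << 11) | ((image[..., 1] >> 2) << 5) | (image[..., 2] >> 3)
    word = word.flatten()
    return np.column_stack([word >> 8, word & 255]).flatten()

# Time 100 conversions of the vectorized version.
print(timeit.timeit(lambda: convert_vectorized(test), number=100))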
By the way, you have an error in your looping, but it's invisible because the width and height of the image are the same.
l = image.shape
lx, ly = l[0], l[1]
frame = []
for py in range(0, ly):      # this should be `range(lx)`
    for px in range(0, lx):  # and this should be `range(ly)`
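To spell that out: image.shape is (rows, cols, channels), so the first index into image[py, px] is the row and the second is the column. A sketch of the corrected loop, keeping the original variable names (lx = shape[0], ly = shape[1]):
lx, ly = image.shape[0], image.shape[1]
frame = []
for py in range(lx):        # row index runs over shape[0]
    for px in range(ly):    # column index runs over shape[1]
        p = _color16(image[py, px])
        frame.extend((int(p[0]), int(p[1])))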
CodePudding user response:
So I went to work and the following is now my converter function:
def imageConvert(self, image):
    if not self._rgb:
        image = image[..., ::-1]
    if self._bpp == 18:
        # 18/24-bit mode: the 8-bit channel data goes out as-is.
        return image.flatten()
    elif self._bpp == 16:
        # 16-bit mode: pack each pixel into an RGB565 word, then split it into two bytes.
        image = image.astype(np.uint16)
        awords = ((image[..., 0] >> 3) << 11) | ((image[..., 1] >> 2) << 5) | (image[..., 2] >> 3)
        words = awords.flatten()
        frame = np.column_stack([words >> 8, words & 255]).flatten().astype(np.uint8)
        return frame
    else:
        # 12-bit mode: pack each pixel into an RGB444 word, then pack two words
        # into three bytes (first pixel in the high 12 bits, assumes an even pixel count).
        image = image.astype(np.uint16)
        words = (((image[..., 0] >> 4) << 8) | ((image[..., 1] >> 4) << 4) | (image[..., 2] >> 4)).flatten()
        dwords = (words[::2].astype(np.uint32) << 12) | words[1::2]
        byte1 = dwords >> 16
        byte2 = (dwords >> 8) & 255
        byte3 = dwords & 255
        frame = np.column_stack([byte1, byte2, byte3]).flatten().astype(np.uint8)
        return frame
This works like a charm. Thanks for your help, again.
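For anyone adapting this, a minimal usage sketch; the SimpleNamespace object is only a stand-in for whatever driver class the method belongs to, and it assumes imageConvert is reachable as a plain function:
import numpy as np
from types import SimpleNamespace

# Stand-in for the display driver object; only the two attributes
# that imageConvert reads are provided.
disp = SimpleNamespace(_rgb=True, _bpp=16)

# Any (240, 240, 3) uint8 frame works for a quick size check.
test_image = np.zeros((240, 240, 3), dtype=np.uint8)

frame = imageConvert(disp, test_image)  # call the method as a plain function
print(len(frame))                       # 240 * 240 * 2 = 115200 bytes in 16-bit mode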