What would be the fastest way to append newly reshaped image matrix to new array?


last_conv_w, last_conv_h, n_channels = last_conv_output.shape
upscaled_h = last_conv_h * height_factor
upscaled_w = last_conv_w * width_factor

upsampled_last_conv_output = np.zeros((upscaled_h, upscaled_w, n_channels))

for x in range(0, n_channels, 512):
    upsampled_last_conv_output[:, :, x:x+512] = cv2.resize(last_conv_output[:, :, x:x+512], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)

upsampled_last_conv_output.shape

The code above upsamples the image matrix last_conv_output, whose initial shape is (7, 7, 2048). What I thought would be possible was simply to do this:

upsampled_last_conv_output = cv2.resize(last_conv_output, (upscaled_w, upscaled_h), cv2.INTER_CUBIC)

The problem with this is that cv2.resize() can only handle at most 512 channels at a time. To work around that, I created a for loop that processes 512 channels per iteration and writes them into the array upsampled_last_conv_output. On my machine this approach takes ~2.5 s to complete.

Before I came up with the for-loop solution, I had also tried this method:

upsampled_last_conv_output_1 = cv2.resize(last_conv_output[:, :, :512], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
upsampled_last_conv_output_2 = cv2.resize(last_conv_output[:, :, 512:1024], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
upsampled_last_conv_output_3 = cv2.resize(last_conv_output[:, :, 1024:1536], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
upsampled_last_conv_output_4 = cv2.resize(last_conv_output[:, :, 1536:2048], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
upsampled_last_conv_output = np.concatenate((upsampled_last_conv_output_1, 
                                             upsampled_last_conv_output_2,
                                             upsampled_last_conv_output_3,
                                             upsampled_last_conv_output_4),
                                            axis=2)

This approach takes about ~0.9 s to complete, which is much faster than the previous one, but it does not look very Pythonic (what if there were, say, a million channels?).

So my question is: is there a way to combine the speed of the second method with the Pythonic style of the first, or is there an even better way to deal with this problem?

CodePudding user response:

You could accumulate the arrays in a list

alist = []
for x in range(0, n_channels, 512):
    alist.append(cv2.resize(last_conv_output[:, :, x:x+512], (upscaled_w, upscaled_h), cv2.INTER_CUBIC))
upsampled_last_conv_output = np.concatenate(alist, axis=2)

I haven't tested this; I'm just trying to combine the iteration in the first case with concatenate of the second.

I'm surprised there is that big of a time difference. I suppose the upsampled_last_conv_output[:, :, x:x+512] = ... assignment could be relatively expensive.
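One possible contributor, not mentioned above: np.zeros allocates a float64 array by default, while cv2.resize returns float32 when its input is float32 (as conv-net activations usually are), so every slice assignment also has to cast. A minimal, untested sketch of the preallocated-loop approach with a matching dtype would be:

import cv2
import numpy as np

# Sketch only: preallocate with the same dtype as the input so each slice
# assignment is a plain copy rather than a float32 -> float64 conversion.
upsampled_last_conv_output = np.zeros(
    (upscaled_h, upscaled_w, n_channels), dtype=last_conv_output.dtype)

for x in range(0, n_channels, 512):
    upsampled_last_conv_output[:, :, x:x+512] = cv2.resize(
        last_conv_output[:, :, x:x+512], (upscaled_w, upscaled_h),
        interpolation=cv2.INTER_CUBIC)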

Another idea is to add a dimension

upsampled_last_conv_output = np.zeros((upscaled_h, upscaled_w, n_channels//512, 512))

then the iteration would be simpler:

for i in range(...):
     upsampled_last_conv_output[:,:,i,:] = 

Similarly, reshape last_conv_output so you can iterate on the second-to-last dimension.
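Filling that sketch in (untested, and assuming n_channels is an exact multiple of 512), it could look something like this:

import cv2
import numpy as np

block = 512
n_blocks = n_channels // block
h, w = last_conv_output.shape[:2]

# View the input as (h, w, n_blocks, block) and preallocate a matching 4-D output.
last_conv_blocks = last_conv_output.reshape(h, w, n_blocks, block)
upsampled = np.zeros((upscaled_h, upscaled_w, n_blocks, block),
                     dtype=last_conv_output.dtype)

for i in range(n_blocks):
    upsampled[:, :, i, :] = cv2.resize(
        last_conv_blocks[:, :, i, :], (upscaled_w, upscaled_h),
        interpolation=cv2.INTER_CUBIC)

# Collapse the block axis back into a single channel axis.
upsampled_last_conv_output = upsampled.reshape(upscaled_h, upscaled_w, n_channels)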

You might also want to verify that your two approaches are doing the same number of resizes.

I haven't examined your code carefully so I may have missed details. And I haven't tested anything - obviously since you didn't provide a [mcve].

CodePudding user response:

Your first approach, with a loop and memory preallocated for all results, should be the fastest, as it does not make an extra copy with np.concatenate. Are you sure you are measuring the time correctly?

I've made a simple code snippet that measures the execution time over multiple runs and got the following results:

Elapsed time without loop: 68.31360507011414
Elapsed time with loop: 59.28367280960083

Code:


import time

import cv2
import numpy as np

n_channels = 2048
last_conv_h = 200
last_conv_w = 200
upscaled_h = last_conv_h * 3
upscaled_w = last_conv_w * 3
n_repetitions = 50

last_conv_output = np.random.uniform(size=(last_conv_h, last_conv_w, n_channels)).astype(np.float32)

upsampled_last_conv_output = np.zeros((upscaled_h, upscaled_w, n_channels))


start_time = time.time()
for _ in range(n_repetitions):
    upsampled_last_conv_output_1 = cv2.resize(last_conv_output[:, :, :512], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
    upsampled_last_conv_output_2 = cv2.resize(last_conv_output[:, :, 512:1024], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
    upsampled_last_conv_output_3 = cv2.resize(last_conv_output[:, :, 1024:1536], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
    upsampled_last_conv_output_4 = cv2.resize(last_conv_output[:, :, 1536:2048], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
    upsampled_last_conv_output = np.concatenate((upsampled_last_conv_output_1,
                                                upsampled_last_conv_output_2,
                                                upsampled_last_conv_output_3,
                                                upsampled_last_conv_output_4),
                                                axis=2)
elapsed = time.time() - start_time
print(f"Elapsed time without loop: {elapsed}")


start_time = time.time()
for _ in range(n_repetitions):
    for x in range(0, n_channels, 512):
        upsampled_last_conv_output[:, :, x:x+512] = cv2.resize(last_conv_output[:, :, x:x+512], (upscaled_w, upscaled_h), cv2.INTER_CUBIC)
elapsed = time.time() - start_time
print(f"Elapsed time with loop: {elapsed}")

