Is there an alternative to Numba for functions that use many features not supported by Numba?

I know Numba does not support all Python features or all NumPy features. However, I really need to speed up the following function, block_reduce from the scikit-image library (I have not installed the whole package; I only copied block_reduce and view_as_blocks from it).

Here is the original code (I've just removed the examples from the docstring).

block_reduce.py

import numpy as np
from numpy.lib.stride_tricks import as_strided


def block_reduce(image, block_size, func=np.sum, cval=0):
    """
    Taken from scikit-image to avoid installation (it's very big)

    Down-sample image by applying function to local blocks.

    Parameters
    ----------
    image : ndarray
        N-dimensional input image.
    block_size : array_like
        Array containing down-sampling integer factor along each axis.
    func : callable
        Function object which is used to calculate the return value for each
        local block. This function must implement an ``axis`` parameter such
        as ``numpy.sum`` or ``numpy.min``.
    cval : float
        Constant padding value if image is not perfectly divisible by the
        block size.

    Returns
    -------
    image : ndarray
        Down-sampled image with same number of dimensions as input image.
    """

    if len(block_size) != image.ndim:
        raise ValueError("`block_size` must have the same length "
                         "as `image.shape`.")

    pad_width = []
    for i in range(len(block_size)):
        if block_size[i] < 1:
            raise ValueError("Down-sampling factors must be >= 1. Use "
                             "`skimage.transform.resize` to up-sample an "
                             "image.")
        if image.shape[i] % block_size[i] != 0:
            after_width = block_size[i] - (image.shape[i] % block_size[i])
        else:
            after_width = 0
        pad_width.append((0, after_width))

    image = np.pad(image, pad_width=pad_width, mode='constant',
                   constant_values=cval)

    blocked = view_as_blocks(image, block_size)

    return func(blocked, axis=tuple(range(image.ndim, blocked.ndim)))


def view_as_blocks(arr_in, block_shape):
    """Block view of the input n-dimensional array (using re-striding).

    Blocks are non-overlapping views of the input array.

    Parameters
    ----------
    arr_in : ndarray
        N-d input array.
    block_shape : tuple
        The shape of the block. Each dimension must divide evenly into the
        corresponding dimensions of `arr_in`.

    Returns
    -------
    arr_out : ndarray
        Block view of the input array.
    """
    if not isinstance(block_shape, tuple):
        raise TypeError('block needs to be a tuple')

    block_shape = np.array(block_shape)
    if (block_shape <= 0).any():
        raise ValueError("'block_shape' elements must be strictly positive")

    if block_shape.size != arr_in.ndim:
        raise ValueError("'block_shape' must have the same length "
                         "as 'arr_in.shape'")

    arr_shape = np.array(arr_in.shape)
    if (arr_shape % block_shape).sum() != 0:
        raise ValueError("'block_shape' is not compatible with 'arr_in'")

    # -- restride the array to build the block view
    new_shape = tuple(arr_shape // block_shape) + tuple(block_shape)
    new_strides = tuple(arr_in.strides * block_shape) + arr_in.strides

    arr_out = as_strided(arr_in, shape=new_shape, strides=new_strides)

    return arr_out

test_block_reduce.py

import numpy as np
import time
from block_reduce import block_reduce

image = np.arange(3*3*1000).reshape(3, 3, 1000)

# DO NOT REPORT THIS... COMPILATION TIME IS INCLUDED IN THE EXECUTION TIME!
start = time.time()
block_reduce(image, block_size=(3, 3, 1), func=np.mean)
end = time.time()
print("Elapsed (with compilation) = %s" % (end - start))

# NOW THE FUNCTION IS COMPILED, RE-TIME IT EXECUTING FROM CACHE
start = time.time()
block_reduce(image, block_size=(3, 3, 1), func=np.mean)
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))

I went through many issues with this code.

For example, Numba does not support function-type parameters. Even if I work around this by passing a string instead (for example, func would be the string "sum" instead of np.sum), I run into many more issues with features Numba does not support (such as np.pad, isinstance, the tuple function, etc.).

Going through each single issue turned out to be very painful. For example, I've tried to incorporate all the code for np.pad from numpy into block_reduce.py and add the numba.jit decorator to np.pad but I got additional problems.

If there is a smart way to use Numba despite all these unsupported features I would be happy with it.

Otherwise, is there any alternative to Numba for this? I know there is PyPy, which I have never used. If PyPy is a solution to my problem, I should stress that I only need this single script, block_reduce.py, to run with PyPy. The rest of the project should run with CPython.

I was also thinking of writing a C extension module, which I have never done. If it's worth trying, I will do so.

User response:

Have you tried running detailed profiling of your code? If you are dissatisfied with the performance of your program I think it can be very helpful to use a tool such as cProfile or py-spy. This can identify bottlenecks in your program and which parts specifically need to be sped up.
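As a minimal sketch of what such a profiling run could look like, the snippet below drives cProfile over repeated calls. Note that block_mean is a hypothetical, simplified stand-in for block_reduce (it assumes every axis divides evenly by the block size, so no padding is done); you would swap in your own function.

```python
import cProfile
import io
import pstats

import numpy as np


def block_mean(image, block_size):
    """Hypothetical stand-in for block_reduce (assumes no padding is needed)."""
    shape = []
    for n, b in zip(image.shape, block_size):
        # Split each axis of length n into (n // b) blocks of size b.
        shape.extend((n // b, b))
    blocked = image.reshape(shape)
    # After the reshape, the within-block axes sit at positions 1, 3, 5, ...
    return blocked.mean(axis=tuple(range(1, blocked.ndim, 2)))


image = np.arange(3 * 3 * 1000).reshape(3, 3, 1000)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    result = block_mean(image, (3, 3, 1))
profiler.disable()

# Collect the ten most expensive entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

If most of the cumulative time lands in NumPy internals rather than in your own Python lines, that is a hint that a JIT compiler has little left to speed up.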

That being said, as @CJR said, if your program spends the bulk of its compute time inside NumPy, there is likely little to gain from a just-in-time compiler or similar changes to your setup. NumPy is fast because it implements compute-intensive tasks in compiled languages, so that work is already done for you and abstracted away.
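To illustrate the point, here is a rough sketch that times the same 3x3 block mean written as explicit Python loops and as a single NumPy call. The function names mean_python and mean_numpy are made up for this example, and the timings will vary by machine, so treat the printed numbers as indicative only.

```python
import timeit

import numpy as np

image = np.arange(3 * 3 * 1000, dtype=np.float64).reshape(3, 3, 1000)


def mean_python(img):
    """Mean over the first two axes using interpreted Python loops."""
    rows, cols, depth = img.shape
    out = [0.0] * depth
    for k in range(depth):
        total = 0.0
        for i in range(rows):
            for j in range(cols):
                total += img[i, j, k]
        out[k] = total / (rows * cols)
    return out


def mean_numpy(img):
    """Same reduction, executed inside NumPy's compiled code."""
    return img.mean(axis=(0, 1))


t_python = timeit.timeit(lambda: mean_python(image), number=20)
t_numpy = timeit.timeit(lambda: mean_numpy(image), number=20)
print("pure Python loops: %.6f s" % t_python)
print("NumPy reduction:   %.6f s" % t_numpy)
```

The loop version is the kind of code a JIT compiler helps with; the NumPy version already runs its inner loop in compiled code.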

Depending on what exactly you are planning to do, it is possible that your efficiency could be improved by parallelism, but this is not something I would worry about yet.
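If you do get to that point, one possible shape for it is sketched below: split the image along its last axis, reduce each chunk in a thread, and concatenate the results. This is only an assumption about how your workload could be divided. The helper names reduce_chunk and parallel_block_mean are hypothetical, and the reduction is hard-coded to the (3, 3, 1) block mean from the question. For an array this small the thread overhead will likely outweigh any gain, so measure before adopting it.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def reduce_chunk(chunk):
    # Mean over the first two axes, i.e. the 3x3 block of the example.
    return chunk.mean(axis=(0, 1))


def parallel_block_mean(image, n_workers=4):
    """Hypothetical helper: reduce independent slices of the last axis in threads."""
    chunks = np.array_split(image, n_workers, axis=2)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of the chunks.
        parts = list(pool.map(reduce_chunk, chunks))
    return np.concatenate(parts)


image = np.arange(3 * 3 * 1000).reshape(3, 3, 1000)
serial = image.mean(axis=(0, 1))
parallel = parallel_block_mean(image)
print(np.allclose(serial, parallel))
```

Threads are used here rather than processes because the chunks are views of one array and need no copying; whether they actually run concurrently depends on NumPy releasing the GIL for the operation in question.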

To end on a more general note: while optimizing code efficiency is of course very important, it is imperative to do so carefully and deliberately. As Donald Knuth famously said, "premature optimization is the root of all evil (or at least most of it) in programming." See this Stack Exchange thread for some more discussion on this.