cumulative sum along given dimension from scratch-CodePudding

I was recently given task (during exam, not funny) to create function returning cumulative sum along given dimension (input: 2d array), without use of np.cumsum ofc; to be honest i find this quite hard to even start with.

function should look like this: def cumsum_2d(array : np.ndarray, dim : int = 0) -> np.ndarray: and then result is supposed to be compared with result from actual np.cumsum

I would be grateful for even basic outline or general idea what to do.

CodePudding user response：

This approach is fairly "from scratch". It does use functools.reduce(), which I assume must be permitted.

import functools

import numpy as np

def cumsum_2d(array: np.ndarray, dim: int = 0) -> np.ndarray:
    if not isinstance(dim, int) or not 0 <= dim <= 1:
        raise ValueError('"dim": expected integer 0 or 1, got {dim}.')
    elif not array.ndim == 2:
        raise ValueError(
            f"{array.ndim} dimensional array not allowed - 2 dimensional arrays expected."
        )
    array = array.T if dim == 1 else array
    result = [
        functools.reduce(lambda x, y: x   y, array[: i   1]) for i in range(len(array))
    ]

    result = np.array(result)
    result = result.T if dim == 1 else result
    return result


a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dim = 1
print(f"For dim = {dim} and a= \n{a}:")
print(f"...got: \n{cumsum_2d(a, dim)}")
print(f"...expected: \n{np.cumsum(a, dim)}")

This has the result:

# For dim = 1 and a= 
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]:
# ...got:
# [[ 1  3  6]
#  [ 4  9 15]
#  [ 7 15 24]]
# ...expected:
# [[ 1  3  6]
#  [ 4  9 15]
#  [ 7 15 24]]

Trying with dim = 1 raises ValueError per the function definition - this mimics the AxisError raised by np.cumsum under similar circumstances:

ValueError: "dim": expected integer 0 or 1, got 2.

Lastly, trying with a non 2-D array also raises an customised ValueError as programmed, ensuring the user doesn't get any silently passed unexpected behaviour.

b = np.array([[[1, 2, 3], [1, 2, 3]], [[4, 5, 6], [1, 2, 3]], [[7, 8, 9], [1, 2, 3]]])
cumsum_2d(b, dim)

Result:

ValueError: 3 dimensional array not allowed - 2 dimensional arrays expected.

CodePudding user response：

Here is another approach that doesn't use ufunc.accumulate or functools.reduce.

It works by inserting an extra dimension, broadcasting the array along that dimension, and then doing a sum where it only considers indices less than or equal to the current index along the summation dimension.

It's morally similar to a brute-force approach where you make a bunch of copies of the array, set the elements you don't want to zero, and then doing the sum.

import numpy as np

def cumsum_2d(array: np.ndarray, dim: int = 0):

    # Make sure the dim argument is positive
    dim = dim % array.ndim

    # Calculate the new shape with an extra copy of dim
    shape_new = list(array.shape)
    shape_new.insert(dim   1, array.shape[dim])

    # Insert the new dimension and broadcast the array along that dimension
    array = np.broadcast_to(np.expand_dims(array, dim   1), shape_new)

    # Save the indices of the array
    indices = np.indices(array.shape)

    # Sum along the requested dimension, considering only the elements less than the current index
    return np.sum(array, axis=dim, where=indices[dim] <= indices[dim   1])

a = np.random.random((4, 5))

assert np.array_equal(cumsum_2d(a, 1), np.cumsum(a, 1))
assert np.array_equal(cumsum_2d(a, 0), np.cumsum(a, 0))
assert np.array_equal(cumsum_2d(a, -1), np.cumsum(a, -1))
assert np.array_equal(cumsum_2d(a, -2), np.cumsum(a, -2))

Note that this function should work for arrays of any rank, not just two-dimensional ones.