Home > Software design >  cumulative sum along given dimension from scratch
cumulative sum along given dimension from scratch

Time:02-04

I was recently given task (during exam, not funny) to create function returning cumulative sum along given dimension (input: 2d array), without use of np.cumsum ofc; to be honest i find this quite hard to even start with.

function should look like this: def cumsum_2d(array : np.ndarray, dim : int = 0) -> np.ndarray: and then result is supposed to be compared with result from actual np.cumsum

I would be grateful for even basic outline or general idea what to do.

CodePudding user response:

This approach is fairly "from scratch". It does use functools.reduce(), which I assume must be permitted.

import functools

import numpy as np

def cumsum_2d(array: np.ndarray, dim: int = 0) -> np.ndarray:
    if not isinstance(dim, int) or not 0 <= dim <= 1:
        raise ValueError('"dim": expected integer 0 or 1, got {dim}.')
    elif not array.ndim == 2:
        raise ValueError(
            f"{array.ndim} dimensional array not allowed - 2 dimensional arrays expected."
        )
    array = array.T if dim == 1 else array
    result = [
        functools.reduce(lambda x, y: x   y, array[: i   1]) for i in range(len(array))
    ]

    result = np.array(result)
    result = result.T if dim == 1 else result
    return result


a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dim = 1
print(f"For dim = {dim} and a= \n{a}:")
print(f"...got: \n{cumsum_2d(a, dim)}")
print(f"...expected: \n{np.cumsum(a, dim)}")

This has the result:

# For dim = 1 and a= 
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]:
# ...got:
# [[ 1  3  6]
#  [ 4  9 15]
#  [ 7 15 24]]
# ...expected:
# [[ 1  3  6]
#  [ 4  9 15]
#  [ 7 15 24]]

Trying with dim = 1 raises ValueError per the function definition - this mimics the AxisError raised by np.cumsum under similar circumstances:

ValueError: "dim": expected integer 0 or 1, got 2.

Lastly, trying with a non 2-D array also raises an customised ValueError as programmed, ensuring the user doesn't get any silently passed unexpected behaviour.

b = np.array([[[1, 2, 3], [1, 2, 3]], [[4, 5, 6], [1, 2, 3]], [[7, 8, 9], [1, 2, 3]]])
cumsum_2d(b, dim)

Result:

ValueError: 3 dimensional array not allowed - 2 dimensional arrays expected.

CodePudding user response:

Here is another approach that doesn't use ufunc.accumulate or functools.reduce.

It works by inserting an extra dimension, broadcasting the array along that dimension, and then doing a sum where it only considers indices less than or equal to the current index along the summation dimension.

It's morally similar to a brute-force approach where you make a bunch of copies of the array, set the elements you don't want to zero, and then doing the sum.

import numpy as np

def cumsum_2d(array: np.ndarray, dim: int = 0):

    # Make sure the dim argument is positive
    dim = dim % array.ndim

    # Calculate the new shape with an extra copy of dim
    shape_new = list(array.shape)
    shape_new.insert(dim   1, array.shape[dim])

    # Insert the new dimension and broadcast the array along that dimension
    array = np.broadcast_to(np.expand_dims(array, dim   1), shape_new)

    # Save the indices of the array
    indices = np.indices(array.shape)

    # Sum along the requested dimension, considering only the elements less than the current index
    return np.sum(array, axis=dim, where=indices[dim] <= indices[dim   1])

a = np.random.random((4, 5))

assert np.array_equal(cumsum_2d(a, 1), np.cumsum(a, 1))
assert np.array_equal(cumsum_2d(a, 0), np.cumsum(a, 0))
assert np.array_equal(cumsum_2d(a, -1), np.cumsum(a, -1))
assert np.array_equal(cumsum_2d(a, -2), np.cumsum(a, -2))

Note that this function should work for arrays of any rank, not just two-dimensional ones.

  • Related