Rolling function over an array in python


I have an array:

a = np.array([[a,b],[c,d],[e,f],[g,h],[i,j]])

I want to apply a function that returns an array b of the same shape:

b: [[k,l],[m,n],[o,p],[q,r],[s,t]]

Each element of b would be computed considering only its equivalent and its predecessors in a. For example:

  • [k,l] would be [a,b]/1
  • [m,n] would be [a+c,b+d]/2
  • [o,p] would be [a+c+e,b+d+f]/3, and so on...

I first looked at functions like cumsum, but the function I need to apply sometimes masks elements in the considered array (I've omitted this detail in the example for simplicity).

Is there an elegant way of doing this, rather than using a loop?
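For concreteness, here is a minimal loop version of what I'm describing, assuming the per-row function is a plain mean (masking omitted):

```python
import numpy as np

a = np.arange(10, dtype=float).reshape(-1, 2)

# b[i] is the column-wise mean of rows 0..i of a.
b = np.empty_like(a)
for i in range(a.shape[0]):
    b[i] = a[: i + 1].mean(axis=0)
print(b)
```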

CodePudding user response:

This is a cumulative mean, but NumPy has no built-in cummean. You could do:

a.cumsum(0) / np.arange(1, a.shape[0] + 1)[:, None]

or even:

a.cumsum(0) / np.arange(1, a.shape[0] + 1).reshape(-1, 1)
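Both forms turn the 1-D divisor into a column vector of shape (n, 1), which then broadcasts across the columns of the (n, 2) cumulative sums. A quick shape check:

```python
import numpy as np

divisors = np.arange(1, 6)
print(divisors[:, None].shape)        # (5, 1)
print(divisors.reshape(-1, 1).shape)  # (5, 1)
# Dividing a (5, 2) array by a (5, 1) array broadcasts along axis 1:
print((np.ones((5, 2)) / divisors[:, None]).shape)  # (5, 2)
```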

Example:

a = np.arange(10).reshape(-1, 2) 
a  
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

then running any of the above:

a.cumsum(0) / np.arange(1, a.shape[0] + 1).reshape(-1, 1)

array([[0., 1.],
       [1., 2.],
       [2., 3.],
       [3., 4.],
       [4., 5.]])
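The question also mentions that the function sometimes masks elements. Assuming the mask is a boolean array marking values to ignore (the mask below is made up for illustration), the same cumulative trick still works by tracking per-column counts of valid entries:

```python
import numpy as np

a = np.arange(10, dtype=float).reshape(-1, 2)
mask = np.zeros(a.shape, dtype=bool)
mask[1, 0] = True  # hypothetical: pretend a[1, 0] is masked out

valid = ~mask
sums = np.where(valid, a, 0.0).cumsum(0)  # masked entries contribute 0
counts = valid.cumsum(0)                  # valid rows so far, per column
cum_mean = sums / np.maximum(counts, 1)   # clamp to avoid division by zero
print(cum_mean)
```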
    

CodePudding user response:

I don't know numpy, but I played around with this with vanilla Python, and thought I'd share my findings. I hope you find this interesting/educational, but it isn't intended to be a direct answer.

itertools has a function called accumulate (also known as scan in other functional languages). It's similar to reduce, but rather than returning only the final value of the accumulator, it yields every intermediate value of the accumulator. This lets you implement the cumulative sum you were looking at:

from itertools import accumulate

input = [[1,2], [3,4], [5,6], [7,8], [9,10]]

def array_pair_accumulator(accumulator, pair):
    return [array + [element] for array, element in zip(accumulator, pair)]

cumulative_pair_arrays = list(accumulate(input, array_pair_accumulator, initial=[[], []]))[1:]
print(cumulative_pair_arrays)

I slice with [1:] to skip the initial element ([[], []]), which would otherwise mix up our indices later.

Result:

[[[1], [2]], [[1, 3], [2, 4]], [[1, 3, 5], [2, 4, 6]], [[1, 3, 5, 7], [2, 4, 6, 8]], [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]]

You can then use a simple list comprehension to sum over these (basically a map operation):

cumulative_sums = [[sum(array) for array in pair] for pair in cumulative_pair_arrays]
print(cumulative_sums)

result:

[[1, 2], [4, 6], [9, 12], [16, 20], [25, 30]]

And finally, you can do another list comprehension to calculate the mean of each:

cumulative_means = [[subtotal / (index + 1) for subtotal in pair] for index, pair in enumerate(cumulative_sums)]
print(cumulative_means)

Final result:

[[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]

There's lots of room to optimize this, though: you don't need so many intermediate steps. I kept them purely for illustration. Here's how it might look after you cut some fat:

from itertools import accumulate

input = [[1,2], [3,4], [5,6], [7,8], [9,10]]

def cumulative_summer(accumulator, pair):
  """The `accumulator` is expected to be a pair of numbers, to which `pair` will be added"""
  return [a + b for a, b in zip(accumulator, pair)]

cumulative_sums = accumulate(input, cumulative_summer)
cumulative_means = [[subtotal / (index + 1) for subtotal in pair] for index, pair in enumerate(cumulative_sums)]

print(cumulative_means)
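As a footnote, the two passes can be fused into a single accumulate call by carrying a (count, sums) pair through the fold; this is just a variation on the code above, not anything the original answer requires:

```python
from itertools import accumulate

data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

def step(state, pair):
    """Carry (count, sums) so the division can happen at the end."""
    count, sums = state
    return (count + 1, [s + x for s, x in zip(sums, pair)])

states = accumulate(data, step, initial=(0, [0, 0]))
next(states)  # drop the initial (0, [0, 0]) state
means = [[s / count for s in sums] for count, sums in states]
print(means)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
```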