Suppose I have a vector like so:
s = pd.Series(range(50))
The rolling mean over, let's say, a 2-element window is easily calculated:
s.rolling(window=2, min_periods=2).mean()
0 NaN
1 0.5
2 1.5
3 2.5
4 3.5
5 4.5
6 5.5
7 6.5
8 7.5
9 8.5
...
Now, instead of taking 2 adjacent elements for the window, I want to take e.g. every third element, but still only the last 2 of them. That would result in this vector:
0 NaN
1 NaN
2 NaN
3 1.5 -- (3 + 0)/2
4 2.5 -- (4 + 1)/2
5 3.5 -- (5 + 2)/2
6 4.5 -- ...
7 5.5
8 6.5
9 7.5
...
How can I achieve this efficiently?
Thanks!
CodePudding user response:
You can use np.lib.stride_tricks.as_strided; its strides argument specifies the number of bytes to step in each dimension when traversing an array.
import numpy as np
arr = np.arange(10)
step = arr.strides[0]  # bytes between consecutive elements
# Row i holds (arr[i], arr[i + 3]): rows advance one element, columns jump three.
strided = np.lib.stride_tricks.as_strided(arr, shape=(len(arr) - 3, 2), strides=(step, 3 * step))
result = np.mean(strided, axis=1)
Output:
array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5])
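Since as_strided does no bounds checking, a possibly safer variant is sliding_window_view (available in NumPy 1.20 and later). The sketch below is only one way to do it: the helper name strided_rolling_mean and its parameters n and step are made up for illustration. It builds contiguous windows long enough to span the spaced elements, keeps every step-th column, and pads the front with NaN so the result lines up with the original Series.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def strided_rolling_mean(s, n=2, step=3):
    """Mean of the last n elements spaced `step` apart (hypothetical helper)."""
    # A window must cover (n - 1) gaps of `step` elements plus the current one.
    span = (n - 1) * step + 1
    # All contiguous windows of length `span`, then keep every `step`-th column.
    windows = sliding_window_view(s.to_numpy(), span)[:, ::step]
    # Positions without a full window stay NaN, like min_periods in rolling.
    out = pd.Series(np.nan, index=s.index, dtype=float)
    out.iloc[span - 1:] = windows.mean(axis=1)
    return out

s = pd.Series(range(50))
result = strided_rolling_mean(s, n=2, step=3)
```

Note that sliding_window_view raises a ValueError if the Series is shorter than span, so very short inputs would need a guard.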
CodePudding user response:
This is not directly possible with rolling.
A workaround would be:
out = s.add(s.shift(3)).div(2)
Otherwise you need to use the underlying numpy array (see @John's answer)
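If more than 2 spaced elements are needed, the shift workaround can be extended by summing several shifted copies. This is a rough sketch; shifted_mean, n and step are illustrative names, not part of pandas.

```python
import pandas as pd

def shifted_mean(s, n=2, step=3):
    """Mean of s[i], s[i - step], ..., s[i - (n - 1) * step] (hypothetical helper)."""
    # Each shift introduces NaN at the start, so incomplete windows come out as NaN.
    total = sum(s.shift(i * step) for i in range(n))
    return total / n

s = pd.Series(range(50))
out = shifted_mean(s, n=2, step=3)
```

This makes n passes over the data, which should be fine for small n; for large windows the NumPy route is probably the better fit.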