New to Python. Given in the code snippet below is a numpy 1d array called randomWalk. Given indices (which can be interpreted as start dates and end dates, both of which may vary from item to item), I want to do take multiple slices from that 1d array randomWalk and arrange the results in a 2d array of given shape.
I am trying to vectorize this. Was able to select the slices I wanted from the 1d array using np.r_
, but failed to store these in the format I require for the output (a 2d array with rows representing items and columns representing time from min(startDates)
to max(endDates)
.
Below is the (ugly) code that works.
import numpy as np
numItems = 20
numPeriods = 12
# Data
randomWalk = np.random.normal(loc = 0.0, scale = 0.05, size = (numPeriods,))
startDates = np.random.randint(low = 1, high = 5, size = numItems)
endDates = np.random.randint(low = 5, high = numPeriods 1, size = numItems)
stochasticItems = np.random.choice([False, True], size=(numItems,), p = [0.9, 0.1])
# Result needs to be in this shape (code snippet is designed to capture that only
# a relatively small fraction of resultMatrix's elements will differ from unity)
resultMatrix = np.ones((numItems, numPeriods))
# Desired result (obtained via brute force)
for i in range(numItems):
if stochasticItems[i]:
resultMatrix[
i, startDates[i]:endDates[i]] = np.cumprod(randomWalk[startDates[i]:endDates[i]] 1.0)
CodePudding user response:
Inspired by @mozway 's answer, convert irregular slices into regular mask array:
>>> # build all arrays with np.random.seed(0)
>>> x = np.arange(numPeriods)
>>> mask = (startDates[:, None] <= x) & (endDates[:, None] > x)
>>> result = np.where(mask & stochasticItems[:, None], np.where(mask, randomWalk 1, 1).cumprod(-1), 1)
>>> np.allclose(result, resultMatrix)
True
>>> result
array([[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1.0489369 , 1.16646468, 1.2753867 ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ],
[1. , 1. , 1. , 1. , 1. ,
1. , 1. , 1. , 1. , 1. ,
1. , 1. ]])
CodePudding user response:
If the vectorization is the goal, so it is done by Pig answer, If it is not matter (as it is mentioned by the OP in the comments --> The aim is improvement in performance), so I suggest using numba library to accelerate the code. We can write np.cumprod
equivalent numba code and accelerate it using numba no-python jit:
@nb.njit
def nb_cumprod(arr):
y = np.empty_like(arr)
y[0] = arr[0]
for i in range(1, arr.shape[0]):
y[i] = arr[i] * y[i-1]
return y
@nb.njit
def nb_(numItems, numPeriods, stochasticItems, startDates, endDates, randomWalk):
resultMatrix = np.ones((numItems, numPeriods))
for i in range(numItems):
if stochasticItems[i]:
resultMatrix[i, startDates[i]:endDates[i]] = nb_cumprod(randomWalk[startDates[i]:endDates[i]] 1.0)
return resultMatrix
This code improved the code ~10 times
faster than the OP in my some benchmarks.