Taking multiple slices of numpy 1d array from given indices, copying result into 2d array

Time:09-06

New to Python. The code snippet below defines a numpy 1d array called randomWalk. Given indices (which can be interpreted as start dates and end dates, both of which may vary from item to item), I want to take multiple slices from randomWalk and arrange the results in a 2d array of a given shape.

I am trying to vectorize this. I was able to select the slices I wanted from the 1d array using np.r_, but failed to store them in the format I require for the output (a 2d array with rows representing items and columns representing time from min(startDates) to max(endDates)).
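For illustration, here is a small sketch of why np.r_ alone does not give the 2d layout. The exact np.r_ call used in the attempt is not shown in the post, so the construction below is an assumption, and the toy values are made up:

```python
import numpy as np

# Toy data (illustrative, not the data from the question)
walk = np.arange(10) * 0.1
starts = np.array([1, 3])
ends = np.array([4, 6])

# np.r_ turns each slice into a range and concatenates them into ONE flat index array
idx = np.r_[tuple(slice(s, e) for s, e in zip(starts, ends))]
flat = walk[idx]

# The selected values are correct, but the information about which
# item (row) each value belongs to is gone: the result is 1d.
print(idx)
print(flat.shape)
```

The values come out in the right order, but as a single 1d array of length sum(ends - starts), so they still have to be scattered back into rows.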

Below is the (ugly) code that works.

import numpy as np

numItems = 20
numPeriods = 12

# Data
randomWalk = np.random.normal(loc = 0.0, scale = 0.05, size = (numPeriods,))
startDates = np.random.randint(low = 1, high = 5, size = numItems)
endDates = np.random.randint(low = 5, high = numPeriods + 1, size = numItems)
stochasticItems = np.random.choice([False, True], size=(numItems,), p = [0.9, 0.1])

# Result needs to be in this shape (code snippet is designed to capture that only
# a relatively small fraction of resultMatrix's elements will differ from unity) 
resultMatrix = np.ones((numItems, numPeriods))

# Desired result (obtained via brute force)
for i in range(numItems):
    if stochasticItems[i]:
        resultMatrix[i, startDates[i]:endDates[i]] = np.cumprod(
            randomWalk[startDates[i]:endDates[i]] + 1.0)

CodePudding user response:

Inspired by @mozway's answer, convert the irregular slices into a regular mask array:

>>> # build all arrays with np.random.seed(0)
>>> x = np.arange(numPeriods)
>>> mask = (startDates[:, None] <= x) & (endDates[:, None] > x)
>>> result = np.where(mask & stochasticItems[:, None], np.where(mask, randomWalk + 1, 1).cumprod(-1), 1)
>>> np.allclose(result, resultMatrix)
True
>>> result
array([[1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.0489369 , 1.16646468, 1.2753867 ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ],
       [1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        ]])
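
To see why the mask trick works, here is a step-by-step sketch on a toy input. The names walk, starts, ends and active and their values are made up for illustration; they stand in for randomWalk, startDates, endDates and stochasticItems:

```python
import numpy as np

# Toy inputs (illustrative values, not the data from the question)
walk = np.array([0.10, -0.05, 0.20, 0.00, 0.05])
starts = np.array([1, 0, 2])
ends = np.array([3, 2, 5])
active = np.array([True, False, True])

cols = np.arange(walk.size)
# mask[i, t] is True when starts[i] <= t < ends[i]
mask = (starts[:, None] <= cols) & (ends[:, None] > cols)

# Outside each window the factor is 1, so it does not change the running product
factors = np.where(mask, walk + 1.0, 1.0)
running = factors.cumprod(axis=1)

# Keep the running product only inside the window of active rows;
# everything else (including the tail after the window) is reset to 1
result = np.where(mask & active[:, None], running, 1.0)

# Reference: the explicit loop from the question
expected = np.ones((starts.size, walk.size))
for i in range(starts.size):
    if active[i]:
        expected[i, starts[i]:ends[i]] = np.cumprod(walk[starts[i]:ends[i]] + 1.0)

assert np.allclose(result, expected)
```

The inner np.where makes every row a regular length-numPeriods product, which is what lets cumprod run along axis 1 without ragged slices. The trade-off is that the product is computed for all rows, including the ones that are later overwritten with 1.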

CodePudding user response:

If vectorization itself is the goal, Pig's answer already achieves it. If it is not essential (as the OP mentioned in the comments, the aim is better performance), I suggest using the numba library to accelerate the code. We can write a numba equivalent of np.cumprod and compile it with numba's no-python JIT:

import numba as nb
import numpy as np


@nb.njit
def nb_cumprod(arr):
    # Cumulative product of a non-empty 1d array
    y = np.empty_like(arr)
    y[0] = arr[0]
    for i in range(1, arr.shape[0]):
        y[i] = arr[i] * y[i-1]
    return y


@nb.njit
def nb_(numItems, numPeriods, stochasticItems, startDates, endDates, randomWalk):
    resultMatrix = np.ones((numItems, numPeriods))

    # Same loop as in the question, compiled by numba
    for i in range(numItems):
        if stochasticItems[i]:
            resultMatrix[i, startDates[i]:endDates[i]] = nb_cumprod(randomWalk[startDates[i]:endDates[i]] + 1.0)
    return resultMatrix

In some of my benchmarks, this code ran about 10 times faster than the OP's loop.
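
Timings like this depend on array sizes and hardware, so it is worth measuring locally. Below is a minimal timing sketch using only numpy and timeit. The function names loop_version and mask_version and the problem size are hypothetical choices for this sketch, not from the thread:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
num_items, num_periods = 2000, 12  # assumed sizes, chosen only for the timing
walk = rng.normal(0.0, 0.05, size=num_periods)
starts = rng.integers(1, 5, size=num_items)
ends = rng.integers(5, num_periods + 1, size=num_items)
active = rng.random(num_items) < 0.1

def loop_version():
    # Explicit loop, as in the question
    out = np.ones((num_items, num_periods))
    for i in range(num_items):
        if active[i]:
            out[i, starts[i]:ends[i]] = np.cumprod(walk[starts[i]:ends[i]] + 1.0)
    return out

def mask_version():
    # Mask-based vectorized version, as in the first answer
    cols = np.arange(num_periods)
    mask = (starts[:, None] <= cols) & (ends[:, None] > cols)
    running = np.where(mask, walk + 1.0, 1.0).cumprod(axis=1)
    return np.where(mask & active[:, None], running, 1.0)

# Both versions must agree before their timings are worth comparing
assert np.allclose(loop_version(), mask_version())

for fn in (loop_version, mask_version):
    seconds = min(timeit.repeat(fn, number=20, repeat=3))
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

The numba function can be timed the same way by wrapping the nb_(...) call in a zero-argument function. Call it once before timing so that the JIT compilation cost is not included in the measurement.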
