Use numpy vectorize or map to speed up a loop - Python NumPy 3D matrix "get rid of a loop"


Now I have one loop that populates a 3D NumPy matrix. I'm not the best at visualizing a 3D array structure, even though I know it's really just an XxYxZ extension of the XxY (2D) layout I'm used to thinking in. For context, this is a Brownian Bridge (BB) construction used in Monte Carlo simulations for financial problems. Credit for the original code (derived from the commentary that fixed the original post by author Kenta Oono) goes here: https://gist.github.com/delta2323/6bb572d9473f3b523e6e. You don't really need to know the math behind it: it chops a path into steps (21 in this example), starts at 0, applies normally distributed shocks (hence np.random.randn) along the way, and ends at 0. Each path is applied to a simulated price to randomly "shock" it over time, generating a potential path the asset could follow on its way to expiration. These paths are totally uncorrelated, so strictly I should pass in a correlation matrix V as well to correlate them properly; however, let us keep it simple:

import numpy as np
from matplotlib import pyplot
import timeit

steps = 21
underlyings = 3
sims = 131072

seed = 0 # fix the seed for replicating results
np.random.seed(seed)

def sample_path_batches(underlyings, steps, sims):
    dt = 1.0 / (steps - 1)
    dt_sqrt = np.sqrt(dt)
    B = np.empty((underlyings, steps, sims), dtype=float)
    B[:, 0, :] = 0  # set first step to 0
    for n in range(steps - 2):
        t = n * dt
        xi = np.random.randn(underlyings, sims) * dt_sqrt
        B[:, n + 1, :] = B[:, n, :] * (1 - dt / (1 - t)) + xi
    B[:, -1, :] = 0  # set last step to 0
    return B

start_time = timeit.default_timer()
B = sample_path_batches(underlyings, steps, sims)
print('\n' + 'Run time for ', sims, ' simulation steps * underlyings: ',
      np.round((timeit.default_timer() - start_time), 3), ' seconds')

pyplot.plot(B[:,:,np.random.randint(0,sims)].T); # plot a random simulation set of paths
pyplot.show()

Run time for 131072 simulation steps * underlyings: 2.014 seconds

So anyhow, that's way too slow for my application (although my original version with a second inner loop took around 15 seconds). I've seen people vectorize NumPy code with np.vectorize or use maps to "flatten" a loop, but I can't visualize how to actually do it myself. I'm looking for an optimal "native Python" implementation that produces the same numbers. B is the 3D NumPy array. You can just copy and paste it and run it online if you want: https://mybinder.org/v2/gh/jupyterlab/jupyterlab-demo/HEAD?urlpath=lab/tree/demo

Any suggestions are appreciated!!! Even if it's just "restructure the loop like this, then apply np.vectorize" or whatever; I'm pretty good at taking a suggestion and making it work once I have a simple "new view" of how to visualize the problem. I would usually just do this type of thing in Cython (nogil / OpenMP / prange), but I'd like to know how to "flatten" a loop in general, with the normal math libraries built into NumPy or Pandas or whatever works.

CodePudding user response:

One simple way to speed up this code is to parallelize it using Numba. You only need to apply the decorator @nb.njit('float64[:,:,::1](int64, int64, int64)', parallel=True) to the function sample_path_batches (where nb is the Numba module). Note that dtype=float must be replaced with dtype=np.float64 inside the function so that Numba can compile the code correctly. With parallel=True, Numba should automatically parallelize the np.random.randn call as well as the element-wise arithmetic in the loop. On a 10-core machine this is about 7 times faster (0.253 seconds with NumPy versus 0.036 seconds with the parallel Numba implementation). If you do not see any improvement, you could also try to parallelize it manually using prange.

Additionally, you can use np.float32 for significantly faster performance (theoretically up to 2 times faster). However, the legacy np.random.randn does not support float32 output. Instead, use the newer Generator API: np.random.default_rng().standard_normal(size=(underlyings, sims), dtype=np.float32) draws the normally distributed shocks directly in single precision. Unfortunately, it is probably not yet supported by Numba, since NumPy added this quite recently...
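A minimal sketch of the single-precision draw (the shapes match the question's underlyings x sims shock matrix; note standard_normal is used rather than random, because the bridge needs Gaussian shocks and random would give uniform samples):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded Generator, analogous to np.random.seed(0)

# Draw the per-step shocks directly as float32; no float64
# intermediate array is allocated and then down-cast.
xi = rng.standard_normal(size=(3, 131072), dtype=np.float32)

print(xi.dtype)  # float32
```

Keep B itself as np.float32 too (dtype=np.float32 in np.empty), otherwise the mixed-precision arithmetic promotes everything back to float64.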

If you have an Nvidia GPU, another solution is to use CUDA to execute the function on the GPU, which should be much faster. Note that Numba has specific optimized functions to generate random np.float32 values on the GPU using CUDA (see here).
