Optimising Python/Numpy code used for Simulation


I'm working on a simulation project. In short: the goal is to simulate the frequency of events in each time period, then simulate the severity of each simulated event and sum those severities for the period. I'm storing the results in a numpy array of shape (periods, 2), where each row represents a time period: the first entry holds the frequency and the second holds the sum of the severities. I'm also allowing for an optional cap on both individual and aggregate severities. Currently, I'm using a Poisson distribution for frequency and a Pareto distribution for severity.

I have two working approaches, but I'm convinced there is a faster way. Currently, simulating 10k periods takes 1.3 seconds and simulating 100k periods takes 13 seconds. The timing will depend on my distribution and parameter choices, since a higher frequency means more severities to simulate. My biggest struggle is summing the severities within a time period and assigning that sum back to the correct frequency row.

The final goal of this project is to allow up to a million simulations for up to ten scenarios, so a bit more speed would be superb. Any tips?

Attempt 1: Iterate through rows

import numpy as np

def simulation(simNum,lambdaPoisson, alphaPareto, claimLimit=np.inf, annualLimit=np.inf):
    results = np.empty([simNum, 2])
    results[:,0] = np.random.default_rng().poisson(lambdaPoisson, simNum)  # simulated frequency per period
    for row in results:
        # simulate this period's severities, cap each claim, sum, then cap the aggregate
        row[1] = np.random.default_rng().pareto(alphaPareto, int(row[0])).clip(max=claimLimit,min=None).sum().clip(max=annualLimit,min=None)
    print(results)
    return results

results = simulation(simNum = 100000, lambdaPoisson = 2, alphaPareto=.2, claimLimit=25, annualLimit = 100)

Attempt 2: Map a function

def mapPareto(a, b, c, d):
    # a = frequency, b = alphaPareto, c = claimLimit, d = annualLimit
    out = np.random.default_rng().pareto(b, a).clip(max=c,min=None)
    return out.sum().clip(max=d,min=None)

def simulation(simNum,lambdaPoisson, alphaPareto, claimLimit=np.inf, annualLimit=np.inf):
    results = np.empty([simNum, 2])  # float array; dtype=int would truncate the severity sums in column 1
    results[:,0] = np.random.default_rng().poisson(lambdaPoisson, simNum)
    results[:,1] = list(map(mapPareto, results[:,0], np.full(simNum,alphaPareto),np.full(simNum,claimLimit),np.full(simNum,annualLimit)))
    print(results)
    return results
    
results = simulation(simNum = 100000, lambdaPoisson = 2, alphaPareto=.2, claimLimit=25, annualLimit = 100)

CodePudding user response:

Idea 1: Reuse RNG

Start by creating the rng object once and reusing it. Instead of

for row in results:
    row[1] = np.random.default_rng().pareto(alphaPareto, int(row[0])).clip(max=claimLimit,min=None).sum().clip(max=annualLimit,min=None)

use

rng = np.random.default_rng()
for row in results:
    row[1] = rng.pareto(alphaPareto, int(row[0])).clip(max=claimLimit,min=None).sum().clip(max=annualLimit,min=None)

Before (100k): 9.19 sec

After (100k): 3.09 sec

Idea 2: Use multiple cores

Instead of

results = simulation(simNum = 1_000_000, lambdaPoisson = 2, alphaPareto=.2, claimLimit=25, annualLimit = 100)

use

import multiprocessing

def f(_):
    return simulation(simNum = 100_000, lambdaPoisson = 2, alphaPareto=.2, claimLimit=25, annualLimit = 100)

pool = multiprocessing.Pool(8)
results = pool.map(f, range(10))  # ten chunks of 100k periods each

Before (1M): 30.4 sec

After (1M): 9.1 sec

Idea 1 was applied in both of the timings above.
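Note that pool.map returns a list of ten separate result arrays rather than one big array. If you need a single array afterwards, you can stack the pieces; a minimal sketch, assuming each call returns a (simNum, 2) array as in the original attempts:

combined = np.vstack(results)  # ten (100_000, 2) arrays -> one (1_000_000, 2) array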

Idea 3: Use Numba

@DominikStańczak's answer explains this thoroughly.

Idea 4: Replace the for loop with vectorized code

With some magic (I'll explain), your code can be rewritten as below:

import numpy as np

def simulation(simNum,lambdaPoisson, alphaPareto, claimLimit=np.inf, annualLimit=np.inf):
    rng = np.random.default_rng()
    psizes = rng.poisson(lambdaPoisson, simNum)
    samples = rng.pareto(alphaPareto, np.sum(psizes) + 1).clip(max=claimLimit,min=None)  # all severities in one flat array, plus one dummy sample (see gotchas)
    slices = np.cumsum(psizes) - psizes  # start index of each period's slice of samples
    results = np.add.reduceat(samples, slices).clip(max=annualLimit,min=None)  # capped per-period sums
    results[psizes == 0] = 0  # periods with zero events get zero (see gotchas)
    print(results)
    return psizes, results

psizes, results = simulation(simNum = 100_000, lambdaPoisson = 2, alphaPareto=.2, claimLimit=25, annualLimit = 100)

Before (100k): 9.19 sec

After (100k): 0.202 sec

After (10M): 4.8 sec
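The function now returns the frequencies and the capped severity sums as two separate arrays. If you want the question's original (simNum, 2) layout, they can be combined afterwards; a minimal sketch:

out = np.column_stack((psizes, results))  # column 0: frequency, column 1: capped severity sum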

The magic: I've created the Pareto samples for all the simulations in a single flat array (samples). Then I've summed the slice of samples belonging to each simulation using np.add.reduceat. (A small worked example follows after the gotchas.)

Gotchas:

  • An extra dummy sample is generated so that reduceat will not complain when psizes[-1] == 0; in that case slices would otherwise contain a value equal to len(samples), which is out of bounds. The effect of the dummy sample is overridden by the line results[psizes == 0] = 0.

  • results[psizes == 0] = 0 overrides the default behaviour of reduceat when two equal values appear in slices (which is to take the value from samples instead of returning zero).
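A small worked example of the reduceat behaviour described above (toy numbers, not from the simulation):

import numpy as np

samples = np.array([1., 2., 3., 4., 5.])  # flat array of (already capped) severities
psizes = np.array([2, 0, 3])              # events per period
slices = np.cumsum(psizes) - psizes       # start indices: [0, 2, 2]
sums = np.add.reduceat(samples, slices)   # [3., 3., 12.] -- middle entry is samples[2], not 0
sums[psizes == 0] = 0                     # corrected: [3., 0., 12.]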

Conclusion

Both Ideas 3 & 4 seem to be fast enough. Idea 4 is more compact and doesn't involve an additional library, but Idea 3 results in code that is more readable and more flexible.

CodePudding user response:

Here's a numba-based solution that scales nicely for your 1M goal:

import numpy as np
import numba


@numba.njit
def clip(a, minval, maxval):
    if a < minval:
        return minval
    elif a > maxval:
        return maxval
    else:
        return a


@numba.njit
def simulation3(
    simNum, lambdaPoisson, alphaPareto, claimLimit=np.inf, annualLimit=np.inf
):
    col0 = np.random.poisson(lambdaPoisson, simNum)
    col1 = np.empty(simNum, dtype=np.float64)
    for i, row in enumerate(col0):
        pareto_events_for_row = np.random.pareto(alphaPareto, row)
        clipped = [
            clip(pareto_event, maxval=claimLimit, minval=0.0)
            for pareto_event in pareto_events_for_row
        ]
        row_sum = sum(clipped)
        row_sum_clip = clip(row_sum, maxval=annualLimit, minval=0.0)
        col1[i] = row_sum_clip
    return np.vstack((col0, col1))  # shape (2, simNum); transpose for the (simNum, 2) layout

%timeit results = simulation3(simNum = 1_000_000, lambdaPoisson = 2, alphaPareto=.2, claimLimit=25, annualLimit = 100)
# 362 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I had to write my own clip because np.clip somehow didn't agree with clipping individual floats within numba.

I initially thought you could generate col0.sum() Pareto random variables in one big 1D array, then do some fancy indexing to groupby-and-sum into the second column, but this ended up being simpler to write. I fully expect that if we see a significantly faster version, it'll be that.


A word of warning: numba and numpy don't share the same RNG (it's an ongoing issue), so even if you call np.random.seed, you won't see the exact same values from the two. This code is simple enough to fit in my brain, so I don't think there are any errors compared to what you wrote out, but as its complexity increases, it's good to keep in mind that you can only really compare the two statistically.
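If reproducibility matters, numba's generator has to be seeded from inside jitted code; a minimal sketch (seed_numba is just an illustrative helper name):

import numba
import numpy as np

@numba.njit
def seed_numba(seed):
    # seeding inside jitted code affects numba's own RNG state
    np.random.seed(seed)

seed_numba(12345)       # seeds the generator used inside simulation3
np.random.seed(12345)   # seeds numpy's generator separately; the two streams still differ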
