Generate array / matrix of kernel density over all extension

Time:01-03

I'd like to extract all values of a kernel density function into a matrix (a NumPy array with shape ymax, xmax). It is very easy to plot the kernel density with seaborn:

import itertools
import random
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats


x = random.sample(range(2000, 4500), 1000)
y = random.sample(range(0, 2500), 1000)

sns.kdeplot(x=x, y=y)


And it's also quite easy to extract density values at specified positions:

values = np.array(list(zip(x, y)))
kde = stats.gaussian_kde(values.T)

# Extracting kernel value at [xcoords],[ycoords] 
kde([[2500,3700,4500],[500,2000,2500]])

array([3.09998436e-07, 4.63280866e-07, 1.56687705e-09])

The question is: how do I extract all possible values into a matrix (given an extent)? I want to do matrix calculations with the result. If I use the previous method and pass it the coordinates of every pixel, it is very slow.

# Fast (kernel density at every 100th x/y point)
kde(np.array(list(
    itertools.product(
        range(0, 1000, 100),
        range(0, 1000, 100))
)).T)

# Slow (kernel density at every possible x/y point)
kde(np.array(list(
    itertools.product(
        range(0, 1000),
        range(0, 1000))
)).T)

CodePudding user response:

This is slow because itertools.product is an iterable that produces millions of pure-Python objects (integers and tuples), which NumPy then has to decode and convert to native integers. You can use NumPy directly to generate such an array efficiently:

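rng = np.arange(1000)
xs = np.repeat(rng, 1000)   # 0, 0, ..., 1, 1, ... (slow axis)
ys = np.tile(rng, 1000)     # 0, 1, ..., 999, 0, 1, ... (fast axis)
idx = np.vstack((xs, ys))   # shape (2, 1000000), as gaussian_kde expects
kde(idx)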

The generation of the indices is 80 times faster on my machine.
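To get the matrix the question asks for, the flat result still has to be reshaped. Below is a minimal sketch, not a definitive implementation, that builds the grid with np.meshgrid and reshapes the densities to (number of y positions, number of x positions). The sample data, the extent values and the step of 50 are assumptions made for illustration. Note that gaussian_kde evaluates every grid point against every data point, so a step of 1 over a large extent remains expensive even when the indices are generated quickly.

```python
import numpy as np
from scipy import stats

# Synthetic sample data standing in for the question's x/y points (assumption).
rng = np.random.default_rng(0)
x = rng.uniform(2000, 4500, size=1000)
y = rng.uniform(0, 2500, size=1000)

# gaussian_kde expects the dataset with shape (n_dims, n_points).
kde = stats.gaussian_kde(np.vstack((x, y)))

# Hypothetical extent and step; step=1 would give one value per integer
# coordinate, at a much higher evaluation cost.
xmin, xmax, ymin, ymax, step = 2000, 4500, 0, 2500, 50

grid_x = np.arange(xmin, xmax, step)
grid_y = np.arange(ymin, ymax, step)

# Default indexing="xy" yields arrays of shape (len(grid_y), len(grid_x)).
xx, yy = np.meshgrid(grid_x, grid_y)

# Flatten to (2, n_grid_points), evaluate, then reshape back to the grid.
density = kde(np.vstack((xx.ravel(), yy.ravel()))).reshape(xx.shape)

print(density.shape)  # (len(grid_y), len(grid_x))
```

With this layout, density[j, i] is the estimate at (grid_x[i], grid_y[j]), so rows follow y and columns follow x.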
