I'd like to extract all values of a kernel density function into a matrix (a NumPy array with shape (ymax, xmax)). Plotting the kernel density with seaborn is very easy:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import random
from scipy import stats
x = random.sample(range(2000, 4500), 1000)
y = random.sample(range(0, 2500), 1000)
sns.kdeplot(x=x, y=y)
And it's also quite easy to extract density values at specified positions:
values = np.array(list(zip(x, y)))
kde = stats.gaussian_kde(values.T)
# Extracting kernel value at [xcoords],[ycoords]
kde([[2500,3700,4500],[500,2000,2500]])
array([3.09998436e-07, 4.63280866e-07, 1.56687705e-09])
The question is: how do I extract all possible values into a matrix (given an extent)? I want to do matrix calculations with it. If I use the previous method and pass it the coordinates of every pixel, it is just too slow.
import itertools
# Fast (extracting the kernel density value at every 100th xy point)
kde(np.array(list(
    itertools.product(
        range(0, 1000, 100),
        range(0, 1000, 100))
)).T)
# Slow (extracting the kernel density value at all possible xy points)
kde(np.array(list(
    itertools.product(
        range(0, 1000),
        range(0, 1000))
)).T)
CodePudding user response:
This is slow because itertools.product is an iterable that produces millions of pure-Python objects (integers and tuples), which NumPy then has to decode and convert to native integers. You can use NumPy directly to generate such an array efficiently:
rng = np.arange(1000)
x = np.repeat(rng, 1000)
y = np.tile(rng, 1000)
# gaussian_kde expects coordinates with shape (2, n_points)
idx = np.vstack((x, y))
kde(idx)
The generation of the indices is 80 times faster on my machine.
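To get the (ymax, xmax) matrix the question asks for, the flat result can be reshaped back onto the grid. Below is a minimal sketch of one way to do that, not the only one. The synthetic sample, the extent (xmax, ymax) and the step are assumptions chosen only so the example runs quickly, and np.meshgrid is used here in place of repeat/tile because it returns the grid shape directly.

```python
import numpy as np
from scipy import stats

# Synthetic sample standing in for the question's data (assumption).
rng = np.random.default_rng(0)
samples = rng.uniform(0, 1000, size=(2, 200))  # row 0 = x, row 1 = y
kde = stats.gaussian_kde(samples)

# Hypothetical extent and step; a step of 10 keeps the example fast.
xmax, ymax, step = 1000, 800, 10
xs = np.arange(0, xmax, step)
ys = np.arange(0, ymax, step)

# With the default 'xy' indexing, both arrays have shape (len(ys), len(xs)).
xx, yy = np.meshgrid(xs, ys)

# gaussian_kde expects coordinates with shape (n_dims, n_points).
coords = np.vstack((xx.ravel(), yy.ravel()))

# density[i, j] is the density at the point (xs[j], ys[i]).
density = kde(coords).reshape(xx.shape)
```

Setting step to 1 gives one value per pixel. Note that the evaluation itself still scales with the number of grid points times the number of samples, so building the coordinates faster does not remove that cost.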