I already have k mean vectors, k covariance matrices, and the mixture weights. How can I implement this in Python to draw n samples from that Gaussian mixture distribution? I can pretty much implement it for the case where the means and covariances are one-dimensional, but how do I implement it, and draw a graph, when the means are multi-dimensional? Thank you in advance for your answer.
Answer:
You can generate samples from a Gaussian mixture distribution with a two-step approach. First, (randomly) select a mixture component, with probabilities given by the mixture weights; then draw a sample from the Gaussian of that particular component.
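Written out as formulas, the two steps look like the following. The notation is my own, not the question's: K components, weights w_k, mean vectors μ_k, and covariance matrices Σ_k. Summing over the component choice z recovers the mixture density, which is why the two steps produce samples from p(x):

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1

\text{Step 1: } z \sim \mathrm{Categorical}(w_1, \dots, w_K), \qquad
\text{Step 2: } \mathbf{x} \mid z = k \;\sim\; \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)

p(\mathbf{x}) = \sum_{k=1}^{K} P(z = k)\, p(\mathbf{x} \mid z = k)
             = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
```

Nothing in this derivation depends on the dimension of x, so the same procedure applies unchanged to multi-dimensional means.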
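Example code is given below. It uses a bivariate (two-dimensional) example for easy visualisation, but the same loop works for any dimension as long as every mean vector has length dim and every covariance matrix is dim-by-dim. Compare the printed fractions and the plot with the chosen weights, mean vectors, and covariance matrices: the fractions should be close to the weights, and the clusters should be centred on the mean vectors with the shapes implied by the covariance matrices. I hope this helps.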
```python
import matplotlib.pyplot as plt
import numpy as np
import random

# Bivariate example
dim = 2
# Settings
n = 500
NumberOfMixtures = 3
# Mixture weights (non-negative, sum to 1)
w = [0.5, 0.25, 0.25]
# Mean vectors and covariance matrices (one per mixture component)
MeanVectors = [[0, 0], [-5, 5], [5, 5]]
CovarianceMatrices = [[[1, 0], [0, 1]], [[1, .8], [.8, 1]], [[1, -.8], [-.8, 1]]]

# Initialize arrays (np.nan, not np.NaN, which was removed in NumPy 2.0)
samples = np.full((n, dim), np.nan)
componentlist = np.full(n, np.nan)

# Generate samples
for i in range(n):
    # Select a mixture component with probability given by the mixture weights
    DrawComponent = random.choices(range(NumberOfMixtures), weights=w, k=1)[0]
    # Draw one sample (a vector of length dim) from the selected component
    DrawSample = np.random.multivariate_normal(MeanVectors[DrawComponent],
                                               CovarianceMatrices[DrawComponent])
    # Store results
    componentlist[i] = DrawComponent
    samples[i, :] = DrawSample

# Report the fraction of samples drawn from each component (should be close to w)
for c in range(NumberOfMixtures):
    print(f'Fraction of mixture component {c}:', np.sum(componentlist == c) / n)

# Visualize result
plt.plot(samples[:, 0], samples[:, 1], '.', alpha=0.5)
plt.grid()
plt.show()
```
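For larger n, or for more than two dimensions, a vectorized variant of the same two-step idea may be handy. This is a sketch of my own, not the code above, and it assumes NumPy's Generator API (np.random.default_rng, available since NumPy 1.17). The function name sample_gmm and the three-dimensional example parameters are hypothetical. It first draws all component labels at once with rng.choice, then fills the rows belonging to each component with a single multivariate_normal call:

```python
import numpy as np


def sample_gmm(n, weights, means, covs, rng=None):
    """Draw n samples from a Gaussian mixture in any dimension d (sketch).

    weights: length-K sequence, non-negative, summing to 1
    means:   K mean vectors of length d
    covs:    K covariance matrices of shape (d, d)
    Returns (samples with shape (n, d), component labels with shape (n,)).
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    K, d = means.shape

    # Step 1: choose a component for every sample, with probabilities = weights
    labels = rng.choice(K, size=n, p=weights)

    # Step 2: fill the rows of each component with draws from its Gaussian
    samples = np.empty((n, d))
    for j in range(K):
        idx = np.flatnonzero(labels == j)
        if idx.size > 0:
            samples[idx] = rng.multivariate_normal(means[j], covs[j], size=idx.size)
    return samples, labels


# Hypothetical three-dimensional example
w = [0.5, 0.25, 0.25]
mu = [[0, 0, 0], [-5, 5, 0], [5, 5, 2]]
cov = [
    np.eye(3),
    [[1, 0.8, 0], [0.8, 1, 0], [0, 0, 1]],
    [[1, -0.8, 0], [-0.8, 1, 0], [0, 0, 1]],
]
samples, labels = sample_gmm(5000, w, mu, cov, rng=np.random.default_rng(0))
print(samples.shape)  # (5000, 3)
print(np.bincount(labels, minlength=3) / len(labels))  # should be close to w
print(samples.mean(axis=0))  # should be close to sum_k w_k * mu_k = [0, 2.5, 0.5]
```

Drawing per component instead of per sample gives the same distribution, because each row's component is fixed in step 1 and the rows of a component are independent draws from its Gaussian. Since samples is an (n, d) array, for d > 2 you can plot any pair of coordinates, e.g. plt.scatter(samples[:, 0], samples[:, 2], c=labels, s=5), which also colours each point by the component it came from.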