Recently I started learning probability and statistics for data science. I am trying to plot the standard deviation for the distribution X.
Questions:
- What is wrong in the code plotting 1 std from mean?
- I am not able to understand why there is a small peak above the kde plot?
- How to plot 1-std, 2-std and 3-std?
CodePudding user response:
Nothing is wrong with your code: the mean is 5 and the std is 2, so you are shading the area between 5 - 2 = 3 and 5 + 2 = 7.

There is a small peak in the kde plot because it is a representation of the data distribution you give with X and, actually, X is not a normal distribution. You can check this by using a true normal distribution:

    mean = 5
    std = 2
    X = np.random.randn(10000)
    X = (X - X.mean())/X.std()*std + mean
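Another way to see that X is not normal is to compare how much of the data lies within 1, 2 and 3 standard deviations of the mean with what a normal distribution would give. The following is only a minimal sketch: it uses the standard library `statistics` module instead of NumPy, the X values are copied from the complete code in this answer, and the helper names `empirical_coverage` and `normal_coverage` are made up for illustration.

```python
from statistics import NormalDist, mean, pstdev

# Data copied from the complete code in this answer.
X = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9]

mu = mean(X)       # 5
sigma = pstdev(X)  # 2.0 (population std, the same convention as np.std's default ddof=0)


def empirical_coverage(data, k):
    """Fraction of data points within k standard deviations of the mean."""
    m, s = mean(data), pstdev(data)
    return sum(abs(v - m) <= k * s for v in data) / len(data)


def normal_coverage(k):
    """Probability a normal distribution places within k std of its mean."""
    z = NormalDist()  # standard normal: mean 0, std 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"{k} std: data {empirical_coverage(X, k):.4f}, normal {normal_coverage(k):.4f}")
```

For this X, 19 of the 25 points (0.76) fall within one std of the mean, whereas a normal distribution places about 0.683 of its mass there, so the kde of X does not have to follow the normal pdf exactly.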
You can plot other standard deviations with a for loop over i. x1 is the left side, x2 is the center part (then set to np.nan) and finally x3 is the right side of the distribution. Then you have to set to np.nan the areas to exclude (which correspond to x2):

    N = 10
    for i in [1, 2, 3]:
        x1 = np.linspace(mean - i*std, mean - (i - 1)*std, N)
        x2 = np.linspace(mean - (i - 1)*std, mean + (i - 1)*std, N)
        x3 = np.linspace(mean + (i - 1)*std, mean + i*std, N)
        x = np.concatenate((x1, x2, x3))
        x = np.where((mean - (i - 1)*std < x) & (x < mean + (i - 1)*std), np.nan, x)
        y = norm.pdf(x, mean, std)
        ax.fill_between(x, y, alpha=0.5)
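To make the band edges in that loop concrete, here is a small sketch that lists the x-intervals of each band and the probability a normal distribution with mean 5 and std 2 assigns to it. It is an illustration only: it uses `statistics.NormalDist` from the standard library rather than `scipy.stats.norm`, and `band_edges` and `band_mass` are hypothetical helper names, not part of the code above.

```python
from statistics import NormalDist

mean, std = 5, 2  # values computed for X in this answer
dist = NormalDist(mean, std)


def band_edges(i):
    """Return the (left, right) x-intervals of the i-th band, mirroring x1 and x3."""
    left = (mean - i * std, mean - (i - 1) * std)
    right = (mean + (i - 1) * std, mean + i * std)
    return left, right


def band_mass(i):
    """Probability under the normal pdf covered by both halves of the i-th band."""
    (l0, l1), (r0, r1) = band_edges(i)
    return (dist.cdf(l1) - dist.cdf(l0)) + (dist.cdf(r1) - dist.cdf(r0))


for i in (1, 2, 3):
    print(i, band_edges(i), round(band_mass(i), 4))
```

The three bands cover roughly 68.3%, 27.2% and 4.3% of the normal distribution, which together give the usual 99.7% within three standard deviations.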
Complete Code
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
# Line width: Maximum 130 characters in the output, post which it will continue in next line.
np.set_printoptions(linewidth=130)
sns.set_context("paper", font_scale=1.5)
# Distribution
X = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9]
mean = np.mean(X)
var = np.var(X)
std = np.std(X)
print("Mean:", mean)
print("Variance:", var)
print("Standard Deviation:", std)
"""
Mean: 5.0
Variance: 4.0
Standard Deviation: 2.0
"""
plt.figure(figsize=(10, 5))
ax = sns.kdeplot(X, fill=True)  # `fill` replaces the deprecated `shade` argument
N = 10
for i in [1, 2, 3]:
    # Left band, center part (masked below) and right band of the i-th std region
    x1 = np.linspace(mean - i*std, mean - (i - 1)*std, N)
    x2 = np.linspace(mean - (i - 1)*std, mean + (i - 1)*std, N)
    x3 = np.linspace(mean + (i - 1)*std, mean + i*std, N)
    x = np.concatenate((x1, x2, x3))
    # Exclude the center part so that only the two outer bands are shaded
    x = np.where((mean - (i - 1)*std < x) & (x < mean + (i - 1)*std), np.nan, x)
    y = norm.pdf(x, mean, std)
    ax.fill_between(x, y, alpha=0.5)
plt.xlabel("Random variable X")
plt.ylabel("Probability Density Function")
plt.xticks(ticks=range(0, 10))
plt.grid()
plt.show()