I am taking a random sample of 30 data points from the standard normal distribution and plotting the resulting histogram in R. I would like to overlay a normal curve that illustrates how close the sample distribution is to the population distribution. However, I can't figure out how to scale the normal curve. Here is what I have so far in R:
```r
library(ggplot2)
n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)
X <- as.data.frame(X)
ggplot(X, aes(x = X)) +
  geom_histogram(bins = 6) +
  stat_function(fun = dnorm, args = list(
    mean = 0, sd = 1
  ))
```
How do I vertically stretch the PDF of the normal distribution to account for n = 30?
Answer:
A) Using frequency as the y-axis in the histogram
I have implemented one solution in the function rcompanion::plotNormalHistogram:
```r
n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)
library(rcompanion)
plotNormalHistogram(X)
```
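I think you are looking for the scenario with the default prob = FALSE. There, I extract the counts and densities from the hist() function and use them to compute a Factor that stretches the normal curve vertically. Because hist() defines each bin's density as its count divided by (n × bin width), the ratio of count to density is n times the bin width, which is exactly the factor needed to put the PDF on the count scale.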
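A minimal base-R sketch of that idea, assuming equal-width bins (the hist() default). This is not the rcompanion source code; the names nonempty, ratio, and scale_factor are hypothetical, introduced only for illustration.

```r
# Hypothetical base-R sketch of the scaling idea (not the rcompanion source)
n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)

# Compute the histogram without drawing it, to get counts and densities
h <- hist(X, plot = FALSE)

# hist() defines density = count / (n * bin width), so for every
# non-empty bin the ratio count / density is the same constant
nonempty <- h$counts > 0
ratio <- h$counts[nonempty] / h$density[nonempty]
scale_factor <- n * diff(h$breaks)[1]  # assumes equal-width bins
print(ratio)
print(scale_factor)

# Draw the count histogram, then overlay the population N(0, 1) PDF
# stretched by scale_factor so it sits on the count scale
hist(X, main = "", xlab = "X")
curve(dnorm(x, mean = 0, sd = 1) * scale_factor,
      add = TRUE, col = "blue", lwd = 2)
```

Every entry of ratio should equal scale_factor, which is why a single multiplier is enough to stretch the curve.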
I don't know how to do the equivalent in ggplot2, but I suspect there is a way.
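One possible ggplot2 approach, offered as a sketch rather than a tested recipe: fix the bin width explicitly (instead of bins = 6) so the count-scale factor n × bin width is known, then multiply dnorm by it inside stat_function. The bin width of 0.5 and the names df and bw are assumptions chosen for illustration.

```r
library(ggplot2)

n <- 30
set.seed(42)
df <- data.frame(X = rnorm(n, mean = 0, sd = 1))

# Assumed bin width: fixing it makes the scale factor n * bw explicit
bw <- 0.5

p <- ggplot(df, aes(x = X)) +
  geom_histogram(binwidth = bw, boundary = 0) +
  # Population N(0, 1) PDF, stretched by n * bw onto the count scale
  stat_function(fun = function(x) dnorm(x, mean = 0, sd = 1) * n * bw)
print(p)
```

Setting binwidth rather than bins avoids having to work out which width ggplot2 picks for a given number of bins.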
To see the code, load the package and type the function name without parentheses: library(rcompanion); plotNormalHistogram
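B) Using density as the y-axis in the histogram
With density on the y-axis, the bar areas sum to 1, just like the area under the normal PDF, so dnorm can be overlaid without any scaling.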
```r
library(ggplot2)
n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)
X <- as.data.frame(X)
ggplot(X, aes(x = X)) +
  # after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
  geom_histogram(aes(y = after_stat(density)), bins = 6) +
  stat_function(fun = dnorm, args = list(
    mean = 0, sd = 1))
```
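A quick check of why no scaling is needed with a density histogram: the sketch below pulls the computed bars out of the plot with ggplot_build() and adds up their areas. It assumes ggplot2 >= 3.3.0 for after_stat(); the names df, bars, and total_area are illustrative.

```r
library(ggplot2)

n <- 30
set.seed(42)
df <- data.frame(X = rnorm(n, mean = 0, sd = 1))

# Density histogram only; the normal curve is not needed for the check
p <- ggplot(df, aes(x = X)) +
  geom_histogram(aes(y = after_stat(density)), bins = 6)

# Extract the computed bars and add up width * height for each one
bars <- ggplot_build(p)$data[[1]]
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
print(total_area)  # should be 1, up to floating-point rounding
```

Since the area under dnorm is also 1, both are already on the same scale.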