Scaling stat_function in R

Time:11-11

I am taking a random sample of 30 data points from the standard normal distribution and plotting the resulting histogram in R. I would like to overlay the normal density curve to show how close the sample distribution is to the population distribution. However, I can't figure out how to scale the normal curve so that it matches the histogram's count axis. Here is what I have so far in R:

library(ggplot2)

n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)
X <- as.data.frame(X)

ggplot(X, aes(x = X)) +
  geom_histogram(bins = 6) +
  stat_function(fun = dnorm, args = list(
    mean = 0, sd = 1
  ))

How do I vertically stretch the PDF of the normal distribution to account for n = 30?

Answer:

A) Using frequency as the y-axis in the histogram

I have one solution in the function rcompanion::plotNormalHistogram

n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)

library(rcompanion)

plotNormalHistogram(X)

I think you are looking for the scenario with the default prob=FALSE. There, I extract the counts and density from the hist() function and use them to compute a factor that stretches the normal curve vertically.
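The idea can be sketched in base R as follows. This is a minimal illustration and not the actual plotNormalHistogram source: it assumes equal-width bins, in which case hist() reports density = counts / (n * bin width), so multiplying a density by n * bin width puts it on the count scale. The name scale_factor is made up for this example.

```r
n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)

# Compute the histogram without drawing it, to get at breaks and counts
h <- hist(X, plot = FALSE)

# With equal-width bins, counts = density * n * bin width
scale_factor <- length(X) * diff(h$breaks)[1]

# Draw the frequency histogram, then overlay the stretched N(0, 1) density
plot(h, main = "Sample of 30 from N(0, 1)", xlab = "X")
curve(dnorm(x, mean = 0, sd = 1) * scale_factor, add = TRUE)
```

The overlaid curve is the population density; to overlay a curve fitted to the sample, mean(X) and sd(X) could be passed to dnorm() instead.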

I don't know how to do the equivalent in ggplot2, but I would suspect that there is a way.

You can just use library(rcompanion); plotNormalHistogram to see the code.
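One possible ggplot2 version of option A is sketched below. It relies on the same relationship, count = density * n * bin width, so it fixes the bin width explicitly rather than using bins = 6. The value 0.5 and the names bin_width and scaled_dnorm are assumptions chosen for illustration.

```r
library(ggplot2)

n <- 30
set.seed(42)
X <- data.frame(X = rnorm(n, mean = 0, sd = 1))

# Assumed bin width; any fixed value works as long as the same
# value is used for both the histogram and the scaling
bin_width <- 0.5

# N(0, 1) density stretched onto the count scale
scaled_dnorm <- function(x) dnorm(x, mean = 0, sd = 1) * n * bin_width

p <- ggplot(X, aes(x = X)) +
  geom_histogram(binwidth = bin_width) +
  stat_function(fun = scaled_dnorm)

print(p)
```

The area under the stretched curve is n * bin_width, the same as the total area of the histogram bars, which is why the two line up on the count axis.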

B) Using density as the y-axis in the histogram

library(ggplot2)

n <- 30
set.seed(42)
X <- rnorm(n, mean = 0, sd = 1)
X <- as.data.frame(X)

# after_stat(density) is the current spelling of the older ..density..
ggplot(X, aes(x = X)) +
  geom_histogram(aes(y = after_stat(density)), bins = 6) +
  stat_function(fun = dnorm, args = list(
    mean = 0, sd = 1
  ))