How to make a relative frequency normal distribution?

I have to plot a relative frequency histogram (which I've done), but I also have to plot a normal distribution curve over it. No matter how I do it, the curve always comes out scaled for absolute frequency rather than relative frequency.

This is what I have so far:

set.seed(1099)

N <- 1520
n_1 <- 4
n_2 <- 30
n_3 <- 76
Valor_esperado = (8 + 12)/2
Variancia = (12-8)^2/12

Amostra_1 <- matrix(runif(N*n_1, min = 8, max = 12), nrow = n_1)

Amostra_2 <- matrix(runif(N*n_2, min = 8, max = 12), nrow = n_2)

Amostra_3 <- matrix(runif(N*n_3, min = 8, max = 12), nrow = n_3)


media_1 <- colMeans(Amostra_1)
media_2 <- colMeans(Amostra_2)
media_3 <- colMeans(Amostra_3)


Amostra_1 <- as.numeric(unlist(media_1))
Amostra_2 <- as.numeric(unlist(media_2))
Amostra_3 <- as.numeric(unlist(media_3))

#par(mfrow=c(2,2))

h <- hist(Amostra_1, plot=FALSE)
h$density = h$counts/sum(h$counts) * 100
plot(h, main="n = 4",
     xlab = NULL,
     ylab="Frequência Relativa",
     col="blue",
     freq=FALSE)


h <- hist(Amostra_2, plot=FALSE)
h$density = h$counts/sum(h$counts) * 100
plot(h, main="n = 30",
     xlab = NULL,
     ylab="Frequência Relativa",
     col="red",
     freq=FALSE)

h <- hist(Amostra_3, plot=FALSE)
h$density = h$counts/sum(h$counts) * 100
plot(h, main="n = 76",
     xlab = NULL,
     ylab="Frequência Relativa",
     col="yellow",
     freq=FALSE)

Answer:

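Given the histogram you've defined, where the bar heights are percentages that sum to 100, you need a Gaussian curve that integrates to 100*binwidth rather than 1. Something like this should do it (for example, run right after plotting the histogram for Amostra_1, while h still holds that histogram):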

binwidth <- diff(h$breaks)[1]
curve(dnorm(x, mean = mean(Amostra_1), 
            sd = sd(Amostra_1)) * binwidth*100, 
      add = TRUE)
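To see where the binwidth*100 factor comes from, here is a minimal numerical check. It is only a sketch: `x` is a stand-in built the same way as Amostra_1 in the question, and the names `rel_freq`, `bar_area`, and `curve_area` are made up for this illustration. It assumes the default equal-width bins that hist() produces.

```r
set.seed(1099)
# Stand-in for Amostra_1: 1520 means of 4 draws from uniform(8, 12)
x <- colMeans(matrix(runif(1520 * 4, min = 8, max = 12), nrow = 4))

h <- hist(x, plot = FALSE)
binwidth <- diff(h$breaks)[1]                 # assumes equal-width bins
rel_freq <- h$counts / sum(h$counts) * 100    # bar heights in percent

# The bar heights sum to 100, so the total bar area is 100 * binwidth
bar_area <- sum(rel_freq * binwidth)

# Area under the rescaled normal curve, taken over +/- 10 standard
# deviations around the mean (which covers essentially all of the mass)
m <- mean(x)
s <- sd(x)
curve_area <- integrate(function(t) dnorm(t, mean = m, sd = s) * binwidth * 100,
                        lower = m - 10 * s, upper = m + 10 * s)$value

# Both areas should agree with 100 * binwidth
same_bars  <- isTRUE(all.equal(bar_area, 100 * binwidth))
same_curve <- isTRUE(all.equal(curve_area, 100 * binwidth, tolerance = 1e-6))
```

Because the bars and the rescaled curve enclose the same area, they end up on the same vertical scale.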

In this particular case the top of the curve gets clipped, because the y-axis of the histogram is based only on the bar heights (bin densities) and ignores the peak of the theoretical curve. The simple, crude fix is to add ylim = c(0, max(h$density)*1.1) when plotting your histogram, which extends the maximum a bit. A more "correct", slightly more annoying way is to compute max(h$density), compute the peak of the scaled curve, dnorm(mean(Amostra_1), mean = mean(Amostra_1), sd = sd(Amostra_1))*binwidth*100 (a normal density is highest at its mean), and use the larger of these two values when setting ylim.
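Putting the "correct" ylim approach together, one possible sketch is a small helper that draws the histogram and the curve on the same scale. The function name `plot_rel_freq_hist`, its arguments, and the English axis label are assumptions made for this example, not part of the original code. It again assumes equal-width bins.

```r
# Hypothetical helper: relative-frequency histogram (in percent) with a
# normal curve overlaid on the same scale, without clipping the curve.
plot_rel_freq_hist <- function(amostra, main, col) {
  h <- hist(amostra, plot = FALSE)
  binwidth <- diff(h$breaks)[1]                 # assumes equal-width bins
  h$density <- h$counts / sum(h$counts) * 100   # bar heights in percent

  # Compute mean and sd up front: inside curve(), `x` is the plotting
  # variable, so the sample must not be referred to by that name there.
  m <- mean(amostra)
  s <- sd(amostra)

  # A normal density peaks at its mean, so this is the top of the scaled curve
  curve_peak <- dnorm(m, mean = m, sd = s) * binwidth * 100
  y_max <- max(h$density, curve_peak)

  plot(h, main = main, xlab = "", ylab = "Relative frequency (%)",
       col = col, freq = FALSE, ylim = c(0, y_max * 1.05))
  curve(dnorm(x, mean = m, sd = s) * binwidth * 100, add = TRUE, lwd = 2)

  invisible(list(binwidth = binwidth, curve_peak = curve_peak, y_max = y_max))
}

# Usage with a sample built like Amostra_1 in the question
set.seed(1099)
amostra_4 <- colMeans(matrix(runif(1520 * 4, min = 8, max = 12), nrow = 4))
res <- plot_rel_freq_hist(amostra_4, main = "n = 4", col = "blue")
```

Calling the helper once per sample also avoids reusing a stale h from a different histogram. If you want the theoretical curve rather than the fitted one, the same scaling should apply with mean = Valor_esperado and sd = sqrt(Variancia / n), since the mean of n independent draws has variance Variancia / n.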