Plot a histogram using ggplot

Time:01-08

I am having difficulty successfully plotting a histogram using ggplot in R and would appreciate help on how to do this.

Some background: I have carried out a simulation in R that simulates the outbreak dynamics for an epidemic, and now I want to create a final size distribution plot over 10,000 epidemic simulations.

What I have done so far: I have simulated 10,000 outbreaks, recorded the final size of each outbreak, and saved these values in f. typeof(f) returns "double", and a small sample of f looks like this:

> tail(f)
[1] 4492    1    2    1    1 4497

I have then created a (correct) distribution plot over these with the help of the code below, but now instead want to create this using ggplot to get a nicer histogram.

h = hist(f)
h$density = h$counts/sum(h$counts)
plot(h,freq = FALSE,
     ylim = c(0,1))

My attempt: I tried to do this on my own with the following code, but I don't get a correct result. I will post images of the two plots below. The first one is the correct one: as you can see, the y-values add up to one. The second one is what I get using ggplot, where the values on the y-axis are not correct. What can I do to create a graph like the first but with ggplot instead? I am guessing that this has something to do with my setting y to be the density, which for some reason doesn't quite match.

ggplot(data = NULL, aes(x = f)) +
  geom_histogram(aes(y = ..density..),
                 colour = 1, fill = "white")

The images:

[Image 1: base R histogram] [Image 2: ggplot histogram]

CodePudding user response:

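Your desired output does not have density on the y-axis but proportions: each bar's count divided by the total count, so the bar heights add up to one. Your ggplot has density on the y-axis because you mapped y = ..density.. (without that mapping, geom_histogram plots counts). Density is the proportion divided by the bin width, which is why the values don't match. To get the same result with ggplot, use geom_histogram(aes(y = ..count../sum(..count..)))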
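To make the difference between density and proportion concrete, here is a minimal base R sketch. It assumes only the six values printed by tail(f) in the question as stand-in data, not the full 10,000 simulations, and the names bin_width and proportion are made up for illustration.

```r
# Illustrative data only: the six values shown by tail(f) in the question
f <- c(4492, 1, 2, 1, 1, 4497)

h <- hist(f, plot = FALSE)               # compute the bins without drawing
bin_width  <- diff(h$breaks)             # width of each bin
proportion <- h$counts / sum(h$counts)   # what the base R plot in the question shows

# Density is the proportion divided by the bin width
all.equal(h$density, proportion / bin_width)

sum(proportion)               # bar heights sum to 1
sum(h$density * bin_width)    # bar areas sum to 1
```

With bins that are 1000 units wide, the density values are 1000 times smaller than the proportions, which would explain a y-axis that looks "wrong" even though the shape of the histogram is the same.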

CodePudding user response:

The base R function hist chooses the bin breaks itself (by default with Sturges' rule), whereas geom_histogram defaults to 30 bins. The number of bins from hist can be re-used in ggplot like this:

library(ggplot2)

f <- c(4492,    1,    2,    1,    1,  4497)

h <- hist(f, freq = FALSE)

h$breaks
#> [1]    0 1000 2000 3000 4000 5000

ggplot(data = NULL, mapping = aes(x = f, y = ..density..)) +
  geom_histogram(bins = length(h$breaks) - 1)

Created on 2023-01-07 by the reprex package (v2.0.1)
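The two suggestions can be combined. The sketch below is one possible way to do it, not a definitive solution: it passes the break points from hist directly to geom_histogram (so the bin edges match, not just the bin count) and puts proportions on the y-axis. It again assumes the six tail(f) values as stand-in data, and the column name final_size is made up for illustration. after_stat() is the notation recent ggplot2 versions use in place of the older ..count.. form.

```r
library(ggplot2)

# Illustrative data only: the six values shown by tail(f) in the question
f <- c(4492, 1, 2, 1, 1, 4497)
df <- data.frame(final_size = f)   # hypothetical column name

h <- hist(f, plot = FALSE)         # reuse base R's bin edges

p <- ggplot(df, aes(x = final_size)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),  # proportion per bin
                 breaks = h$breaks,                        # same bin edges as hist()
                 colour = "black", fill = "white") +
  coord_cartesian(ylim = c(0, 1)) +                        # mirrors ylim = c(0, 1)
  labs(y = "proportion of simulations")

p
```

To check the result, ggplot_build(p)$data[[1]]$y returns the bar heights that ggplot computed, and they should add up to one like the base R plot.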
