I am having difficulty successfully plotting a histogram using ggplot in R and would appreciate help on how to do this.
Some background: I have carried out a simulation in R that simulates the outbreak dynamics for an epidemic, and now I want to create a final size distribution plot over 10,000 epidemic simulations.
What I have done so far: I have simulated 10,000 outbreaks, recorded the final size of each one, and saved these values in f. typeof(f) returns double, and a small excerpt of f looks like this:
> tail(f)
[1] 4492 1 2 1 1 4497
I have then created a (correct) distribution plot of these values with the code below, but I would now like to create the same plot with ggplot to get a nicer histogram.
h = hist(f)
h$density = h$counts/sum(h$counts)
plot(h, freq = FALSE, ylim = c(0, 1))
My attempt: I tried to do this on my own with the following code, but I don't get a correct result. I will post the images of the two plots below. The first one is the correct one: as you can see, the y-values add up to one. The second one is what I get using ggplot, and here the values on the y-axis are not correct. What can I do to create a graph like the first but with ggplot instead? I am guessing that this has something to do with setting y to be the density, which for some reason doesn't quite match.
ggplot(data = NULL, aes(x = f)) +
  geom_histogram(aes(y = ..density..),
                 colour = 1, fill = "white")
The images:
CodePudding user response:
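Your desired output does not show density on the y-axis but proportions, that is, the count in each bin divided by the total number of simulations. Your ggplot shows density because you mapped y to ..density.., which rescales the bars so that their areas, not their heights, sum to one. To get the same result with ggplot, use geom_histogram(aes(y = ..count../sum(..count..))).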
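The difference between the two scales can be checked by hand. The sketch below is only an illustration: the vector f_demo is a made-up stand-in for the 10,000 simulated final sizes, and bins = 10 is an arbitrary choice. It uses after_stat(), which ggplot2 3.4.0 and later recommend in place of the ..count.. notation.

```r
library(ggplot2)

# Hypothetical stand-in for the simulated final sizes (not the asker's data)
f_demo <- c(4492, 1, 2, 1, 1, 4497, 3, 4480, 1, 2)

# Base R: density = count / (n * bin width), so the bar AREAS sum to 1
h <- hist(f_demo, plot = FALSE)
bin_width  <- diff(h$breaks)
proportion <- h$counts / sum(h$counts)   # bar HEIGHTS sum to 1
all.equal(h$density, proportion / bin_width)

# ggplot2: map y to the per-bin proportion instead of the density
p <- ggplot(data.frame(f = f_demo), aes(x = f)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 bins = 10, colour = "black", fill = "white") +
  labs(y = "Proportion of simulations") +
  coord_cartesian(ylim = c(0, 1))
p
```

With wide bins, as here, the density values are very small because each proportion is divided by the bin width, which is why the two y-axes look so different.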
CodePudding user response:
The base R function hist chooses its break points using Sturges' rule by default. The resulting number of bins can be reused in ggplot like this:
library(ggplot2)
f <- c(4492, 1, 2, 1, 1, 4497)
h <- hist(f, freq = FALSE)
h$breaks
#> [1] 0 1000 2000 3000 4000 5000
ggplot(data = NULL, mapping = aes(x = f, y = ..density..)) +
  geom_histogram(bins = length(h$breaks) - 1)
Created on 2023-01-07 by the reprex package (v2.0.1)
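Reusing only the number of bins leaves ggplot free to place the bin boundaries itself, so they may not coincide with those chosen by hist. A possible variation, sketched here under the assumption that identical bins are wanted, is to pass the break points themselves through the breaks argument and then inspect the computed bins with layer_data():

```r
library(ggplot2)

f <- c(4492, 1, 2, 1, 1, 4497)
h <- hist(f, plot = FALSE)   # compute the breaks without drawing the base plot

# Hand the base R break points to ggplot so both plots use the same bins
p <- ggplot(data.frame(f = f), aes(x = f)) +
  geom_histogram(aes(y = after_stat(density)), breaks = h$breaks,
                 colour = "black", fill = "white")

# Bin edges and densities as computed by ggplot2
bins_used <- layer_data(p)[, c("xmin", "xmax", "density")]
bins_used
```

The same breaks argument also works when y is mapped to after_stat(count / sum(count)), if proportions rather than densities are needed.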