I have a plot whose y-axis should show density instead of frequency.
This is the code I use:
ggplot(stocks_orig, aes(x=Value)) +
  geom_histogram(aes(y=..density..), colour="black", fill="white", bins=20) +
  geom_density(aes(y=..density..), alpha=.2, fill="lightblue", size=1) +
  geom_vline(aes(xintercept = -0.019), linetype = "dashed", size = 1, color = "blue") +
  annotate("text", x = 0.0, y = 51, label = "number1") +
  geom_vline(aes(xintercept = -0.029), linetype = "dotted", size = 1, color = "blue") +
  annotate("text", x = -0.051, y = 25, label = "number2") +
  labs(title="Title", subtitle="subtitle", caption="Caption")
And this is the plot I get, which shows frequency instead, despite using aes(y=..density..).
This is my data:
> dput(stocks_orig[1:10,])
structure(list(Date = structure(c(14613, 14614, 14615, 14616,
14617, 14620, 14621, 14622, 14623, 14624), class = "Date", tzone = "Europe/Prague"),
Growth = c(0.0139029051689914, -0.001100605444033, -0.000800320170769155,
-0.000300045009001992, 0.00359353551013022, 0.00169855663558151,
-0.00662187630888697, 0.00836491633162767, 0.00259662584726591,
-0.00944445882799969), Medium = c(0.0181345701954827, 0.00458945233380722,
0.00159872136369707, 0.00697561373642514, 0.00409161790325356,
0.000699755114273265, -0.0108587433348759, 0.00717420374800045,
0.00119928057548219, -0.0118701725704874), Value = c(0.0273232956488904,
0.0134096869099177, 0.0061808590750811, 0.0120273802127185,
0.000499875041650993, -0.000800320170769155, -0.021938907518754,
0.0119285708652738, 0.00379279823869626, -0.0170444346092585
)), row.names = c(NA, -10L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000024c38fd1ef0>)
CodePudding user response:
Are you sure this isn't a density? For a curve to be a density it has to satisfy three rules (for a more mathematical explanation see e.g.
ggplot(NULL, aes(x = rnorm(5000, sd = 0.1))) +
  geom_density(aes(y=..density..), size = 1)
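If it helps, the same point can be checked numerically without plotting. This is only a rough sketch in base R on simulated data (not the asker's stocks_orig), using density() and a simple trapezoidal sum, which are my own choices for illustration. It checks two standard properties of a density, non-negativity and a total area of about 1, and shows that the peak height can still be well above 1 when the x range is narrow.

```r
# Base R only; the data are simulated, not the asker's stocks_orig.
set.seed(42)
x <- rnorm(5000, sd = 0.1)

# Kernel density estimate evaluated on a grid of points (d$x, d$y).
d <- density(x)

# Trapezoidal rule: approximate the area under the estimated curve.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Height of the curve at its highest point.
peak <- max(d$y)

cat("area under curve:", round(area, 3), "\n")
cat("peak height:", round(peak, 2), "\n")
cat("all heights non-negative:", all(d$y >= 0), "\n")
```

The area comes out close to 1 even though the peak height is larger than 1, so a y-axis with values above 1 does not by itself mean the plot shows frequencies.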
CodePudding user response:
I presume you are expecting the bin heights to add to one, and for the density curve to follow the same scaling. The default behavior is different: it is designed so that the area under the curve totals 1. This means that for narrow x ranges, the peak density can be much higher than 1. To make the bin heights add to 1 instead, you can multiply the density by the bin width (which you can control more directly using binwidth than bins).
Compare:
ggplot(mtcars, aes((wt-3)/100)) +
  geom_histogram(aes(y=..density..), binwidth = 1/120) +
  geom_density(aes(y=..density..))
and
ggplot(mtcars, aes((wt-3)/100)) +
  geom_histogram(aes(y=..density../120), binwidth = 1/120) +
  geom_density(aes(y=..density../120))
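The relationship between the two scalings can also be checked with plain numbers. The following is a minimal sketch in base R, assuming simulated data and a bin width of 0.02 that I picked for illustration; hist(..., plot = FALSE) stands in for what the histogram layer computes. It shows that density times bin width gives the proportion of observations in each bin, so the rescaled heights add to 1.

```r
# Base R only; simulated data and an arbitrary bin width, for illustration.
set.seed(1)
x <- rnorm(5000, sd = 0.1)
binwidth <- 0.02

# Equal-width breaks that comfortably cover the data.
breaks <- seq(-1, 1, by = binwidth)
h <- hist(x, breaks = breaks, plot = FALSE)

# Default density scaling: bar areas (height * width) total 1.
area <- sum(h$density * binwidth)

# Rescaled heights: density * binwidth is the proportion per bin.
prop <- h$counts / sum(h$counts)
rescaled <- h$density * binwidth

cat("total area of density bars:", area, "\n")
cat("sum of rescaled heights:", sum(rescaled), "\n")
cat("max density:", max(h$density), " max proportion:", max(prop), "\n")
```

Dividing by 120 in the second plot above is the same operation, since the bin width there is 1/120.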