How to force y-axis to show density in ggplot

Time:02-17

I have a plot whose y-axis I need to show density instead of frequency.

This is the code I use:

ggplot(stocks_orig, aes(x=Value)) +
  geom_histogram(aes(y=..density..), colour="black", fill="white", bins=20) +
  geom_density(aes(y=..density..), alpha=.2, fill="lightblue", size=1) +
  geom_vline(aes(xintercept = -0.019), linetype = "dashed", size = 1, color = "blue") +
  annotate("text", x = 0.0, y = 51, label = "number1") +
  geom_vline(aes(xintercept = -0.029), linetype = "dotted", size = 1, color = "blue") +
  annotate("text", x = -0.051, y = 25, label = "number2") +
  labs(title="Title", subtitle="subtitle", caption="Caption")

The resulting plot still seems to show frequency on the y-axis, despite using aes(y=..density..).

This is my data:

> dput(stocks_orig[1:10,])
structure(list(Date = structure(c(14613, 14614, 14615, 14616, 
14617, 14620, 14621, 14622, 14623, 14624), class = "Date", tzone = "Europe/Prague"), 
    Growth = c(0.0139029051689914, -0.001100605444033, -0.000800320170769155, 
    -0.000300045009001992, 0.00359353551013022, 0.00169855663558151, 
    -0.00662187630888697, 0.00836491633162767, 0.00259662584726591, 
    -0.00944445882799969), Medium = c(0.0181345701954827, 0.00458945233380722, 
    0.00159872136369707, 0.00697561373642514, 0.00409161790325356, 
    0.000699755114273265, -0.0108587433348759, 0.00717420374800045, 
    0.00119928057548219, -0.0118701725704874), Value = c(0.0273232956488904, 
    0.0134096869099177, 0.0061808590750811, 0.0120273802127185, 
    0.000499875041650993, -0.000800320170769155, -0.021938907518754, 
    0.0119285708652738, 0.00379279823869626, -0.0170444346092585
    )), row.names = c(NA, -10L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000024c38fd1ef0>)

CodePudding user response:

Are you sure this isn't a density? For a curve to be a density it has to satisfy three rules. The one that matters here is that the area under the curve equals 1; the height of the curve is allowed to exceed 1.

[image: standard normal density curve]

ggplot(NULL, aes(x = rnorm(5000, sd = 0.1))) +
  geom_density(aes(y=..density..), size = 1)

[image: another normal density curve]
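As a rough numeric check of the same point, the sketch below uses base R only (no ggplot2). It evaluates the theoretical normal density with sd = 0.1, the same standard deviation as the simulated data above, and compares its peak height with the area under the curve. The integration limits of -1 and 1 (ten standard deviations each side) are my own choice, not part of the answer.

```r
# Peak height of a normal density with sd = 0.1 (well above 1).
peak <- dnorm(0, mean = 0, sd = 0.1)

# Area under the same curve over +/- 10 standard deviations.
area <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 0.1)$value

peak  # about 3.99
area  # about 1
```

So a y-axis that runs far above 1 does not by itself mean the plot shows counts.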

CodePudding user response:

I presume you are expecting the bin heights to add to one, and the density curve to follow the same scaling. The default behavior is different: it is designed so that the area under the curve (and the total area of the bars) equals 1. This means that for narrow x ranges, the peak density can be much higher than 1. To make the bin heights add to 1 instead, multiply the density by the bin width (which you can control more directly using binwidth than bins).
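The arithmetic behind that scaling can be checked with base R's hist(), as a minimal sketch. It uses the same transformed mtcars variable and bin width of 1/120 as the example in this answer. The break positions are my own assumption and need not match the bin edges ggplot2 picks, so individual bar heights may differ from the plots.

```r
# Same transformed variable and bin width as the mtcars example.
x  <- (mtcars$wt - 3) / 100
bw <- 1 / 120

# Breaks on a grid of width bw that covers the whole range of x.
breaks <- seq(floor(min(x) / bw), ceiling(max(x) / bw)) * bw
h <- hist(x, breaks = breaks, plot = FALSE)

# Density heights: count / (n * bin width), so the bar *areas* sum to 1.
h$density

# Multiplying by the bin width turns each height into a proportion.
proportions <- h$density * bw
sum(proportions)
```

Because the bin width is much smaller than 1, every non-empty bin has a density height above 1, while the rescaled proportions add up to 1.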

Compare:

ggplot(mtcars, aes((wt-3)/100)) +
  geom_histogram(aes(y=..density..), binwidth = 1/120) +
  geom_density(aes(y=..density..))


and

ggplot(mtcars, aes((wt-3)/100)) +
  geom_histogram(aes(y=..density../120), binwidth = 1/120) +
  geom_density(aes(y=..density../120))

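A possible variant of the second plot, assuming ggplot2 3.3.0 or later, where after_stat() is available as the replacement for the ..density.. notation. Instead of dividing by 120, the bar height is written as the share of observations in each bin, which amounts to the same rescaling for a single group. The layer_data() call at the end is only there to inspect the computed bar heights; it is not needed for plotting.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.3.0, where after_stat() is available.
p <- ggplot(mtcars, aes((wt - 3) / 100)) +
  # Bar height = share of observations in the bin, so the heights sum to 1.
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 1 / 120) +
  # Rescale the curve by the same bin width (1/120) so it overlays the bars.
  geom_density(aes(y = after_stat(density / 120)))

# layer_data() returns the values ggplot2 computed for a layer.
bar_heights <- layer_data(p, 1)$y
sum(bar_heights)
```

Printing p draws the plot. The density curve is rescaled only so that it lines up with the bars; its y values no longer integrate to 1.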
