I have the following dataset:
data <- structure(list(Treatment = c("A", "A", "B", "A", "A", "A", "A",
"B", "B", "B", "B", "A", "A", "B", "A", "B", "A", "B", "A", "A",
"A", "A", "B", "B", "B", "A", "B", "B", "A", "B", "B", "B", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "A", "B", "A",
"B", "A", "A", "B"), Death = c(1, 1, 1, 1, 1, 1, 0, 0, 1, 0,
1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1,
0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1), `Last observation (days)` = c(276,
212, 154, 222, 33, 299, 344, 180, 49, 324, 74, 66, 196, 269,
353, 332, 302, 211, 69, 55, 338, 103, 108, 7, 199, 64, 10, 236,
82, 242, 34, 239, 197, 315, 243, 5, 126, 44, 260, 363, 246, 193,
190, 151, 279, 215, 142, 183, 328, 119), `Age (years)` = c(92.64,
15.68, 10.39, 66.43, 79.59, 74.24, 77.06, 31.06, 11.28, 52.65,
16.66, 13.01, 42.91, 63.8, 9.99, 1.92, 33.52, 8.68, 61.97, 28.99,
86.73, 16.96, 5.8, 51.27, 21.28, 36.08, 26.12, 64.53, 52.99,
7.17, 42.37, 57.63, 83.48, 67.67, 1.12, 23.16, 81.61, 6.47, 72.69,
29.15, 73.69, 60.3, 9.21, 18.6, 34.73, 24.31, 0.37, 22.06, 9.89,
30.78)), row.names = c(NA, -50L), class = c("tbl_df", "tbl",
"data.frame"))
I would like to draw a normal density curve for each distribution (the age distribution grouped by treatment), with both curves in the same panel rather than split across two facets.
I have tried the following code, unsuccessfully:
ggplot(data, aes(x = `Age (years)`, fill = Treatment)) +
  geom_histogram(aes(y = ..density..)) +
  stat_function(fun = dnorm, args = list(data %>% group_by(Treatment) %>% summarise(mean = mean(`Age (years)`))))
Does anyone know how to plot the two normal density curves as described? Thank you.
CodePudding user response:
Having overlapping histograms is already quite messy, and adding probability density curves is going to make that worse, but if you really want to do it, you could try:
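# For each treatment, evaluate a normal density (group mean and sd)
# on a shared grid spanning the full age range
data %>%
  group_by(Treatment) %>%
  summarize(meanval = mean(`Age (years)`),
            sdval = sd(`Age (years)`),
            `Age (years)` = seq(min(data$`Age (years)`),
                                max(data$`Age (years)`), 0.1),
            density = dnorm(`Age (years)`, first(meanval), first(sdval))) %>%
  ggplot(aes(x = `Age (years)`, fill = Treatment)) +
  # Fitted normal curves, drawn as semi-transparent areas
  geom_area(aes(y = density, color = after_scale(fill)), alpha = 0.5) +
  # Overlapping density-scaled histograms from the original data
  geom_bar(stat = 'bin', aes(y = ..density.., col = after_scale(fill)),
           data = data, position = 'identity', alpha = 0.4)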
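If the filled areas make the plot too busy, a lighter variant is to draw the fitted curves as plain lines over faint histograms. The following is only a minimal sketch under some assumptions: it uses just the first ten rows of the question's data, renames the age column to a hypothetical `age` for brevity, and builds the curves with base R (`split` and `lapply`) rather than `summarize`. The names `toy`, `grid`, `curves` and `p` are illustrative, not part of the original code.

```r
library(ggplot2)

# First ten rows of the question's data; `age` is a hypothetical short name
# for the `Age (years)` column
toy <- data.frame(
  Treatment = c("A", "A", "B", "A", "A", "A", "A", "B", "B", "B"),
  age = c(92.64, 15.68, 10.39, 66.43, 79.59, 74.24, 77.06, 31.06, 11.28, 52.65)
)

# Shared x grid covering the observed age range
grid <- seq(min(toy$age), max(toy$age), length.out = 200)

# One normal curve per treatment, using that group's own mean and sd
curves <- do.call(rbind, lapply(split(toy, toy$Treatment), function(g) {
  data.frame(
    Treatment = g$Treatment[1],
    age = grid,
    density = dnorm(grid, mean = mean(g$age), sd = sd(g$age))
  )
}))

# Faint density-scaled histograms with the fitted curves drawn as lines
p <- ggplot(toy, aes(x = age, colour = Treatment)) +
  geom_histogram(aes(y = after_stat(density), fill = Treatment),
                 bins = 10, position = "identity", alpha = 0.3, colour = NA) +
  geom_line(data = curves, aes(y = density))

print(p)
```

With the full dataset, the same idea should apply by replacing `toy` with `data` and `age` with the backticked `Age (years)` column; `after_stat()` assumes ggplot2 3.3.0 or later.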