I came across this problem by accident. My goal was to use geom_col() to represent a single value (the sum in this case). I forgot the summarise() step at first, which gave me the first plot (see below). When I noticed, I fixed it by calculating the sum, which gave me the second plot.
After reading a bit about it, geom_col() should indeed take the sum of the values in case multiple values per group are given, correct? However, when I compare my two plots, this doesn't seem to be the case.
I may just be missing something obvious, but I cannot figure it out.
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
d <- structure(list(year = c(2013, 2013, 2013, 2013, 2013, 2013, 2013,
2017, 2017, 2017, 2017), score = c(0.17, 12.17, 0, 0, 0, 0.67,
0, 7.83, 0, 14, 0), study = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
# First plot: summarise() step forgotten, so every row is drawn as its own bar
d %>%
  ggplot(aes(study, score, group = as.factor(year))) +
  geom_col(aes(fill = as.factor(year)), position = position_dodge()) +
  scale_y_continuous(limits = c(0, 30))
# Second plot: scores summed per year and study before plotting
d %>%
  group_by(year, study) %>%
  summarise(score = sum(score), .groups = "drop") %>%
  ggplot(aes(study, score, group = as.factor(year))) +
  geom_col(aes(fill = as.factor(year)), position = position_dodge()) +
  scale_y_continuous(limits = c(0, 30))
Notice that in the first plot the bars within each year are simply overlaid on each other, so only the tallest one is visible.
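To check this against the numbers, here is a minimal sketch that rebuilds the data from the question as a plain data frame and compares, per year, the tallest single score with the total. The column names `tallest` and `total` and the object `heights` are my own, introduced only for illustration.

```r
library(dplyr)

# Same values as in the question, rebuilt as a plain data frame
d <- data.frame(
  year  = c(rep(2013, 7), rep(2017, 4)),
  score = c(0.17, 12.17, 0, 0, 0, 0.67, 0, 7.83, 0, 14, 0),
  study = "A"
)

# geom_col() draws one bar per row. With position_dodge() and
# group = year, all bars of a year share the same slot, so the visible
# height is the tallest bar, not the total.
heights <- d %>%
  group_by(year, study) %>%
  summarise(
    tallest = max(score),  # what the first plot appears to show
    total   = sum(score),  # what the second plot shows
    .groups = "drop"
  )

heights
```

If the first plot really is overlaid bars, its bar heights should match `tallest` and the second plot's should match `total`.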
When you say:
"After reading a bit about it, geom_col() should indeed take the sum of the values in case multiple values per group are given"
Can you show us where you read this?
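My guess, and it is only a guess, is that the impression comes from the default position of geom_col(), which is "stack". With stacking, the individual bars are piled on top of each other, so the top of the pile equals the sum even though nothing is summed in the data. A small sketch using layer_data() to inspect what ggplot2 actually draws (the object names `p` and `ld` are mine):

```r
library(ggplot2)

# Same values as in the question, rebuilt as a plain data frame
d <- data.frame(
  year  = c(rep(2013, 7), rep(2017, 4)),
  score = c(0.17, 12.17, 0, 0, 0, 0.67, 0, 7.83, 0, 14, 0),
  study = "A"
)

# Default position ("stack"): one rectangle per row, piled up at x = study
p <- ggplot(d, aes(study, score, fill = factor(year))) +
  geom_col()

ld <- layer_data(p)

nrow(ld)       # one rectangle per input row, nothing is aggregated
max(ld$ymax)   # top of the stack
sum(d$score)   # total of all scores, for comparison
```

Once position_dodge() is used instead, the rectangles are no longer piled up, and the apparent "sum" disappears.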