Cannot reproduce behavior of geom_col() when multiple values per group (instead of one) are supplied

Time:11-26

I came across this problem by accident. My goal was to use geom_col() to represent a single value per group (the sum in this case). I forgot the summarise() step at first, which gave me the first plot (see below). When I noticed, I fixed it by calculating the sum and got the second plot.

After reading a bit about it, geom_col() should indeed take the sum of the values when multiple values per group are given, correct? However, when I compare my two plots, this doesn't seem to be the case.

It may just be something obvious that I am missing but I cannot figure it out.

library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
d <- structure(
  list(
    year = c(2013, 2013, 2013, 2013, 2013, 2013, 2013, 2017, 2017, 2017, 2017),
    score = c(0.17, 12.17, 0, 0, 0, 0.67, 0, 7.83, 0, 14, 0),
    study = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A")
  ),
  row.names = c(NA, -11L),
  class = c("tbl_df", "tbl", "data.frame")
)
d %>% 
  ggplot(aes(study, score, group = as.factor(year))) +
  geom_col(aes(fill = as.factor(year)), position = position_dodge()) +
  scale_y_continuous(limits = c(0, 30))


d %>% 
  group_by(year, study) %>% 
  summarise(score = sum(score), .groups = "drop") %>% 
  ggplot(aes(study, score, group = as.factor(year))) +
  geom_col(aes(fill = as.factor(year)), position = position_dodge()) +
  scale_y_continuous(limits = c(0, 30))

Created on 2022-11-25

Notice that the bars are simply overlaid on each other.
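One way to check this numerically is to compare the tallest score in each year with the sum. The following is a minimal sketch that rebuilds the question's data in a more compact form (the `heights`, `tallest`, and `total` names are only illustrative). If the bars within each dodged group are drawn on top of each other, the visible height in the first plot should match `tallest`, while the second plot shows `total`.

```r
library(dplyr)

# Same values as the data in the question, written out compactly.
d <- tibble(
  year  = c(rep(2013, 7), rep(2017, 4)),
  score = c(0.17, 12.17, 0, 0, 0, 0.67, 0, 7.83, 0, 14, 0),
  study = "A"
)

heights <- d %>%
  group_by(year, study) %>%
  summarise(
    tallest = max(score),  # visible height when bars are overlaid
    total   = sum(score),  # bar height after the summarise() step
    .groups = "drop"
  )

heights
```

For this data, `tallest` is 12.17 and 14 for 2013 and 2017, while `total` is 13.01 and 21.83, which is consistent with the difference between the two plots.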

When you say:

After reading a bit about it, geom_col() should indeed take the sum of the values when multiple values per group are given

Can you show us where you read this?
