how to connect the means with a line within a single category in ggplot

Time:06-01

Here is some dummy code:

library(ggplot2)
library(dplyr)

diamonds |>
  dplyr::filter(color %in% c("D", "E", "F"), cut %in% c("Ideal", "Fair"), clarity %in% c("SI2", "VS2", "IF")) |>
  ggplot(aes(x = clarity, y = carat, color = color, shape = cut)) +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.05, position = position_dodge(0.7)) +
  stat_summary(fun = mean, geom = "point", size = 2, position = position_dodge(0.7))

I would like to connect the means with a line within each clarity category (i.e. connect the circle to the triangle, shown in red in the picture as an example):

[Image: example plot with the desired connecting lines drawn in red]

If I use geom_stat or geom_line, I get the message "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?", which makes sense since both points are within a single clarity group. I tried group = interaction(), but that did not work either; I was only able to connect points across different clarity groups.
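A minimal sketch of why the message appears, assuming ggplot2's default grouping (the interaction of all discrete variables mapped in the plot, here clarity, color and cut). The names `means`, `per_default_group` and `per_line_group` are made up for illustration:

```r
library(ggplot2)
library(dplyr)

df <- diamonds %>%
  filter(color %in% c("D", "E", "F"),
         cut %in% c("Ideal", "Fair"),
         clarity %in% c("SI2", "VS2", "IF"))

# One mean per clarity/color/cut combination, as stat_summary(fun = mean) computes
means <- df %>%
  group_by(clarity, color, cut) %>%
  summarise(mean_carat = mean(carat), .groups = "drop")

# Default grouping: every group holds exactly one mean, so there is nothing to connect
per_default_group <- means %>% count(clarity, color, cut)

# Grouping by clarity and color only: up to two means (Fair and Ideal) per group
per_line_group <- means %>% count(clarity, color)

print(per_default_group)
print(per_line_group)
```

So a line needs a group that leaves out cut, and both points of a group also need different x positions, which a shared dodge per group does not provide.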

CodePudding user response:

I think it is best to use a manual dodge.

library(ggplot2)
library(dplyr)

df <- diamonds %>% dplyr::filter(color %in% c("D","E", "F"), cut %in% c("Ideal","Fair"), clarity %in% c("SI2","VS2","IF")) 

## make a named vector for your manual dodge
## this of course needs adjustment depending on your actual data; it can be automated
dodge_vec <- seq(-.25, .25, length = 6)
names(dodge_vec) <- unique(with(df, paste(cut, color, sep = "_")))

## some data alterations - assign dodge by subsetting with named vector
df <- df %>%
  mutate(cut_col = dodge_vec[paste(cut, color, sep = "_")]) 
## summarise for your lines 
df_line <- 
  df %>%
  group_by(clarity, cut, color, cut_col) %>%
  summarise(mean_carat = mean(carat))
#> `summarise()` has grouped output by 'clarity', 'cut', 'color'. You can override
#> using the `.groups` argument.

## need to pass your original x as an integer and add your new dodging column
ggplot(df, aes(x = as.integer(factor(clarity)) + cut_col, y = carat, color = color, shape = cut)) +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.05) +
  stat_summary(fun = mean, geom = "point", size = 2) +
  ## add lines with your new data, using an interaction variable
  geom_line(data = df_line, aes(y = mean_carat, group = interaction(as.integer(clarity), color))) +
  ## label the breaks in factor-level order so they match the integer codes
  scale_x_continuous(breaks = 1:3, labels = levels(factor(df$clarity)))
#> Warning: Using shapes for an ordinal variable is not advised
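
The comment in the code notes that the dodge vector can be automated. A minimal sketch of one way to do that, assuming evenly spaced offsets are acceptable; `make_dodge` and its `width` argument are hypothetical names, not part of ggplot2 or dplyr:

```r
library(ggplot2)
library(dplyr)

df <- diamonds %>%
  filter(color %in% c("D", "E", "F"),
         cut %in% c("Ideal", "Fair"),
         clarity %in% c("SI2", "VS2", "IF"))

# Hypothetical helper: one evenly spaced offset per distinct key,
# spanning [-width / 2, width / 2]; keys are sorted alphabetically (an assumption)
make_dodge <- function(keys, width = 0.5) {
  lv <- sort(unique(keys))
  if (length(lv) == 1) {
    return(setNames(0, lv))
  }
  setNames(seq(-width / 2, width / 2, length.out = length(lv)), lv)
}

# Build the named vector from the data instead of hard-coding its length
dodge_vec <- make_dodge(paste(df$cut, df$color, sep = "_"))

# Assign each row its offset by subsetting with the named vector
df <- df %>%
  mutate(cut_col = unname(dodge_vec[paste(cut, color, sep = "_")]))

print(dodge_vec)
```

The sort order decides which points end up next to each other, so a different key order (for example color first, then cut) may suit other data better.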

Your question suggests that you're dealing with paired data, hence my suggestion in the comment. I wanted to give an example, but the diamonds data set doesn't contain paired data, so it would be difficult to fake that.

Created on 2022-05-31 by the reprex package (v2.0.1)
