ggplot summarizing boxplot

Time:09-17

Sadly, I have a problem with my ggplot.

This is a subset of my data frame:

Name <- c('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16', '17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32')
Gruppe <-c('A','A','B','B','C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',  'C',  'C', 'C','A','A','B','B','C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',  'C',  'C', 'C')
# Group has only 16 values; data.frame() recycles it to fill all 32 rows
Group <-c('A','A','B','B','CA','CA','GE','GE','SA','SA','ST','ST','STR','STR','WA','WA')
Location <-c('CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF','CO','UF')
Value <-c(3.5,6.6,1.9,9.2,5.2,4.4,6.8,3.2,2.7,7.1,4.3,9.8,4,8.3,5,6.1,3,8.4,4.8,9.1,1.4,4,8.9,3.6,4,8.4,6.1,2.5,4.5,9.3,6.7,4.6)

data <- data.frame(Name, Gruppe, Group, Location, Value)

I would like to plot this with ggplot, separated according to "Group". My code:

library(ggplot2)

ggplot(data, aes(x=Location, y=Value, fill=Group)) +
  geom_boxplot() +
  scale_color_brewer(palette="Paired") +
  theme_classic() +
  scale_fill_manual(values=c("chartreuse3", "yellow2",
                             "firebrick3", "cyan4","darkgoldenrod2","darkorange4","darkgreen","deeppink3","darksalmon"))


However, I also want to add "C" from "Gruppe" as an additional boxplot that shows up in the legend and summarizes the "Group" values from "CA" to "WA". Is there any way to do this? Preferably without changing the dataset itself, since its original form is very big. The boxplot for this should sit next to "A" and "B" and look the same, i.e. have the same width.

Example pic (sorry for the looks): [image]

Thank you for any help; if something is missing, I will try to explain.

CodePudding user response:

@Peter Oh wow, you did it! Thank you so much! I would love to use your solution exactly as it is, but in the original data set I have six locations, so facetting by location would be a bit of overkill. I managed to change your code a little:

ggplot(data1, aes(x = Location, y = PH1, fill = grp_id)) +
  geom_boxplot() +
  theme_classic() +
  scale_x_discrete(expand = expansion(add = c(0.5, 0.7))) +
  scale_y_continuous(expand = expansion(mult = c(0.02, 0.08))) +
  scale_color_brewer(palette = "Paired") +
  scale_fill_manual(values=c("chartreuse3", "yellow2",
                             "firebrick3", "cyan4","darkgoldenrod2","darkorange4",
                             "darkgreen","deeppink3","darksalmon", "red", "green"))

CodePudding user response:

This still goes through a (temporary) duplication of the data. If you want to avoid that, the only method I see is precomputing the boxplots' aesthetics for the base dataset and for the Gruppe == "C" subset.

geom_boxplot(
  data = . %>% union_all(., filter(., Gruppe == "C") %>% mutate(Group = "C"))
)
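
For completeness, here is a minimal sketch of how a layer like the one above could sit in a full plot. It relies on ggplot2 accepting a function as a layer's data argument, which it calls with the plot data. The helper name add_summary_group and the shortened stand-in data are assumptions made for illustration only; they are not part of the original answer.

```r
library(ggplot2)
library(dplyr)

# Small stand-in for the question's data (same columns, fewer rows).
data <- data.frame(
  Gruppe   = c("A", "A", "B", "B", "C", "C", "C", "C"),
  Group    = c("A", "A", "B", "B", "CA", "CA", "GE", "GE"),
  Location = rep(c("CO", "UF"), times = 4),
  Value    = c(3.5, 6.6, 1.9, 9.2, 5.2, 4.4, 6.8, 3.2)
)

# Hypothetical helper: append a copy of the Gruppe == "C" rows,
# relabelled so that they form one extra Group called "C".
add_summary_group <- function(df) {
  bind_rows(df, df %>% filter(Gruppe == "C") %>% mutate(Group = "C"))
}

# The function is applied to the plot data when the layer is built,
# so the data frame `data` itself is left unchanged.
p <- ggplot(data, aes(x = Location, y = Value, fill = Group)) +
  geom_boxplot(data = add_summary_group) +
  theme_classic()

p
```

Because "C" sorts between "B" and "CA", the summary box should land next to A and B under the default alphabetical ordering of the fill levels. With the full data from the question this gives nine fill levels, which matches the nine colours in the original scale_fill_manual() call.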

CodePudding user response:

Based on your sketch, which helps a lot, this could be a solution. The data is reshaped into long format, filtering out A and B from "Group" because they are duplicated in "Gruppe". The combined grouping variable grp (holding "Gruppe" or "Group") is made into a factor so that the panels plot in the correct order when facetted. A separate label data frame is prepared for the Group and Gruppe ids so that each id can be placed at the maximum value of its boxplot. The input parameters of the graph are then adjusted for visual effect, to ensure there is space for the labels and that the boxplots are the same width.

library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

data1 <- 
  data %>% 
  pivot_longer(cols = c(Gruppe, Group), names_to = "grp", values_to = "grp_id") %>% 
  filter(grp != "Group" | !grp_id %in% c("A", "B")) %>% 
  mutate(grp = factor(grp, levels = c("Gruppe", "Group")))

lab_group <- 
  data1 %>%
  group_by(Location, grp_id) %>% 
  filter(Value == max(Value))


ggplot(data1, aes(x = grp_id, y = Value, fill = grp_id)) +
  geom_boxplot() +
  geom_text(data = lab_group, aes(label = grp_id), vjust = -0.2, hjust = -0.2) +
  scale_x_discrete(expand = expansion(add = c(0.5, 0.7))) +
  scale_y_continuous(expand = expansion(mult = c(0.02, 0.08))) +
  scale_color_brewer(palette = "Paired") +
  scale_fill_manual(values=c("chartreuse3", "yellow2",
                             "firebrick3", "cyan4","darkgoldenrod2","darkorange4",
                             "darkgreen","deeppink3","darksalmon", "red", "green")) +
  facet_grid(Location~grp,
             space = "free_x",
             scales = "free_x") +
  theme_bw() +
  theme(legend.position = "none",
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank())

Created on 2021-09-16 by the reprex package (v2.0.0)
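
To see what the reshaping step in this answer produces, a quick sketch like the following counts the rows per grp and grp_id. The data frame is rebuilt here from the question's vectors using rep() for brevity (the Name column is left out), and the counts object is only an illustrative name.

```r
library(dplyr)
library(tidyr)

# Question's data, rebuilt compactly; Group is written out to all 32 rows.
data <- data.frame(
  Gruppe   = rep(c("A", "A", "B", "B", rep("C", 12)), times = 2),
  Group    = rep(c("A", "A", "B", "B", "CA", "CA", "GE", "GE", "SA", "SA",
                   "ST", "ST", "STR", "STR", "WA", "WA"), times = 2),
  Location = rep(c("CO", "UF"), times = 16),
  Value    = c(3.5, 6.6, 1.9, 9.2, 5.2, 4.4, 6.8, 3.2, 2.7, 7.1, 4.3, 9.8,
               4, 8.3, 5, 6.1, 3, 8.4, 4.8, 9.1, 1.4, 4, 8.9, 3.6, 4, 8.4,
               6.1, 2.5, 4.5, 9.3, 6.7, 4.6)
)

# Same reshaping as in the answer: every row appears once per grouping
# column, then A and B are dropped from the "Group" copy.
data1 <- data %>%
  pivot_longer(cols = c(Gruppe, Group), names_to = "grp", values_to = "grp_id") %>%
  filter(grp != "Group" | !grp_id %in% c("A", "B")) %>%
  mutate(grp = factor(grp, levels = c("Gruppe", "Group")))

# One row per (grp, grp_id) combination with its number of observations.
counts <- count(data1, grp, grp_id)
print(counts)
```

The "Gruppe" panel ends up holding A, B and the pooled C, while the "Group" panel holds only the subgroups CA to WA, which is why A and B are not drawn twice.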
