Home > Software design >  ggplot2: Colour specific ID red and other points conditional on separate variable
ggplot2: Colour specific ID red and other points conditional on separate variable

Time:10-03

I want to send participants in my cohort their specific results shown graphically. There are over a hundred participants with multiple time points, the way I hope to send them is by colouring their specific data points red and keep the other sample IDs conditional on grouping (ie. vaccine status). I know how to colour points conditional on a group but am having trouble including another condition based on specific study IDs. The code below plugged into Plotly works seamlessly but Plotly also has its issues. I prefer to use ggplot2 to produce the graphs.

Unfortunately, I cannot share the dataset so here is a brief description:

  • Sample = Study ID unique to each participant
  • Timepoint = Blood sample collection relative to vaccination dates (first or second)
  • Group = Condition describing the vaccine and infection status of each participant per timepoint (ie. Group 0 -> no vaccine ; 1 -> single dose; etc.)

Here's my code:

df %>% 
  mutate(Group = as.factor(Group)) %>% 
  mutate(Timepoint = fct_relevel(Timepoint, c("Pre-vaccine",
                                              "< 3.5 weeks after first",
                                              "3-6 weeks after first",
                                              "6-12 weeks after first", 
                                              "> 12 weeks after first",
                                              
                                              "< 3 weeks after second", 
                                              "3-6 weeks after second", 
                                              "6-12 weeks after second",
                                              "> 12 weeks after second"))) %>% 
  droplevels() %>% 
  filter(Assay == "Antibody levels" & Group %in% c(0,1,2,3,4)) %>% 
  ggplot(aes(Timepoint, Concentration))  
  geom_jitter(position = position_jitter(width =  0.0001), 
              aes(fill = ifelse(str_detect(Sample, "V1"), Sample, Group)), # Here is where I specify the colour fill of data points if they match the study ID, if not they are coloured by 'Group'
              pch = 21, 
              size = 2.5)  
  scale_y_log10(labels = scales::comma,
                  limits = c(10,10000000),
                  breaks = breaks, 
                minor_breaks = minor_breaks)  
  theme_classic() 
  labs(title = "Antibody levels",
       x = "",
       y = "Concentration (AU/ml)")  
  annotation_logticks(base = 10, sides = "l")  
  scale_fill_manual(values = pal)  
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.y = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
        legend.position = "none")

Figure 1. Example of colouring just by grouping. Ignores Sample ID.

Figure 2. Figure produced in Plotly showing the desired outcome. Can this be done in ggplot2?

Edit: It seems to actually be working in ggplot but the red points are masked by the other data points. Is there a way to move them to the front while minimizing the amount of code?

CodePudding user response:

I will try to help you. But without your data I won't be able to do it exactly. So I generated data that was supposed to resemble your data. Notice one small detail of the Timepoint to factor variable with the labels attribute. This is a bit different than in your data.

library(tidyverse)

n=400
TimepointLev = c("Pre-vaccine",
                 "< 3.5 weeks after first",
                 "3-6 weeks after first",
                 "6-12 weeks after first", 
                 "> 12 weeks after first",
                 "< 3 weeks after second", 
                 "3-6 weeks after second", 
                 "6-12 weeks after second",
                 "> 12 weeks after second")

df = tibble(
  Sample = 1:n,
  Group = sample(0:4, n, replace = TRUE) %>% paste() %>% factor(),
  Timepoint = sample(1:9, n, replace = TRUE) %>% paste() 
  %>% factor(labels=TimepointLev),
  Concentration = sample(1:15000, n, replace = TRUE)
)
df

output

# A tibble: 400 x 4
   Sample Group Timepoint               Concentration
    <int> <fct> <fct>                           <int>
 1      1 2     6-12 weeks after second         12021
 2      2 1     > 12 weeks after second         13608
 3      3 4     6-12 weeks after second         10417
 4      4 0     3-6 weeks after second           2545
 5      5 2     Pre-vaccine                      2167
 6      6 2     6-12 weeks after first          13725
 7      7 3     3-6 weeks after second           3367
 8      8 0     Pre-vaccine                      3900
 9      9 1     > 12 weeks after second           144
10     10 0     < 3 weeks after second           8219

Now let's prepare the graph. The data you want to highlight should be redrawn. In my case, this is data for Sample divisible by 13.

df %>% ggplot(aes(Timepoint, Concentration, fill=Group)) 
  geom_jitter(position = position_jitter(width =  0.1),pch = 21, size = 2.5) 
  geom_point(data = df %>% filter(Sample %% 13==0),
              position = position_jitter(width =  0.2), pch = 23, size = 3, 
              fill="red", color = "red")

enter image description here

Note that I applied the data filter data = df %>% filter(Sample %% 13 == 0). You make your own filter adapted to the data you have.

Finally, one more thing. I completely don't understand why you are using geom_jitter and setting position_jitter(width = 0.0001). This is completely pointless. geom_jitter is just for making the data a bit spread out so that they don't overlap. However, when you set width = 0.0001, it's like you don't use jitter at all.

CodePudding user response:

I'm going to start by saying using red as a colour to mean one thing, while already using colour to mean something else is confusing.

You could plot a red circle round the point of interest. Or add an arrow.

As suggested gghighlight may have an option for you. Or it may not.

Without wanting to redraw your whole plots for my personal preference... ...can I suggest geom_beoswarm() from ggbeeswarm may may your plots more legible in terms of understanding the data distribution.

OK So now to address the underlying question. Always tricky when we don't have a sample of your data

require(ggpolot)
require(tidyverse)
seed(42)
someData <- tibble(
    Timepoint = as.factor(rep(seq(0,8),10)),
    Concentration = sample(1:100000, 90, replace=F),
    Group = rep(seq(0,4), 18 )
) %>%
    mutate( Sample = paste0("V",ceiling(row_number()/9)))

someData %>%
    mutate(Group = as.factor(Group)) %>% 
    mutate(Timepoint = fct_recode(Timepoint,  `Pre-vaccine` = "0",
                                             "< 3.5 weeks after first" = "1",
                                             "3-6 weeks after first" = "2",
                                             "6-12 weeks after first" ="3", 
                                             "> 12 weeks after first" = "4",
                                                "< 3 weeks after second" = "5", 
                                                "3-6 weeks after second" = "6", 
                                                "6-12 weeks after second" ="7" ,
                                                "> 12 weeks after second" = "8")) -> someData

# You have defined some constants that aren't explained
pal <- c("V1" = "red", "0"= "Purple", "1" = "Blue", "2" = "Green", "3" = "Yellow", "4" = "Black", "5"="Pink")
# I've simply omitted breaks and minor_breaks from your code below

This is simply your graph working with the sample data above

someData %>%
ggplot(aes(Timepoint, Concentration))  
    geom_jitter(position = position_jitter(width =  0.0001), 
                aes(fill = ifelse(str_detect(Sample, "V1"), Sample, Group)), # Here is where I specify the colour fill of data points if they match the study ID, if not they are coloured by 'Group'
                pch = 21, 
                size = 2.5)  
    scale_y_log10(labels = scales::comma,
                  limits = c(10,10000000),
                  #breaks = breaks, 
                  #minor_breaks = minor_breaks
                  )  
    theme_classic() 
    labs(title = "Antibody levels",
         x = "",
         y = "Concentration (AU/ml)")  
    annotation_logticks(base = 10, sides = "l")  
    scale_fill_manual(values = pal)  
    theme(plot.title = element_text(hjust = 0.5),
          axis.text.y = element_text(face = "bold"),
          axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
          legend.position = "none")

You can simply add a second geom_jitter over the first, the more recent line in the code sits above the other, and by only specifying a fill for the ones you want to highlight this achieves what you ask

someData %>%
    ggplot(aes(Timepoint, Concentration))  
    geom_jitter(position = position_jitter(width =  0.0001), 
                aes(fill = Group), 
                pch = 21, 
                size = 2.5)  
    geom_jitter(position = position_jitter(width =  0.0001), 
                aes(fill = ifelse(str_detect(Sample, "V1"), "V1", NA)), # Here is where I specify the colour fill of data points if they match the study ID, if not they are coloured by 'Group'
                pch = 21, 
                size = 2.5)      
    scale_y_log10(labels = scales::comma,
                  limits = c(10,10000000),
                  #breaks = breaks, 
                  #minor_breaks = minor_breaks
    )  
    theme_classic() 
    labs(title = "Antibody levels",
         x = "",
         y = "Concentration (AU/ml)")  
    annotation_logticks(base = 10, sides = "l")  
    scale_fill_manual(values = pal)  
    theme(plot.title = element_text(hjust = 0.5),
          axis.text.y = element_text(face = "bold"),
          axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
          legend.position = "none")

In my opinion - better - is to leave the fills as "Groups" as you intended but to highlight the key data points

# I've added a new palette that will highlight the sample of interest
pal2 <- c("V1" = "red", "V2"= NA, "V3" = NA, "V4" = NA, "V5" = NA, 
          "V6"=NA, "V7" = NA, "V8" = NA, "V9" = NA, "V10"= NA)

someData %>%
    ggplot(aes(Timepoint, Concentration), warn)  
    geom_jitter(position = position_jitter(width =  0.0001), 
                aes(fill = Group), 
                pch = 21, 
                size = 2.5)  
    
    # You will get an error warning that some rows have missing values... thats becasue 
    # you only want to highlight some values
    # If you need to - save the plot as an object using -> gg at the end
    # and then suppressWarnings(print(gg))
    
    geom_jitter(position = position_jitter(width =  0.0001), 
                aes( color=Sample, stroke = 1, fill = NA), 
                pch = 21,
                size = 5)      
    
    scale_y_log10(labels = scales::comma,
                  limits = c(10,10000000),
                  #breaks = breaks, 
                  #minor_breaks = minor_breaks
    )  
    theme_classic() 
    labs(title = "Antibody levels",
         x = "",
         y = "Concentration (AU/ml)")  
    annotation_logticks(base = 10, sides = "l")  
    scale_fill_manual(values = pal)  
    scale_color_manual(values = pal2)  
    theme(plot.title = element_text(hjust = 0.5),
          axis.text.y = element_text(face = "bold"),
          axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
          legend.position = "none")

enter image description here

For my own self indulgence

require(ggbeeswarm)
someData %>%
    ggplot(aes(Timepoint, Concentration))  
    geom_beeswarm( 
        cex=1.75,
                aes(fill = Group),
                pch = 21, 
                size = 2.5)  
    scale_y_log10(labels = scales::comma,
                  limits = c(10,10000000),
                  #breaks = breaks, 
                  #minor_breaks = minor_breaks
    )  
    
    geom_beeswarm( 
        cex=1.75,
        aes( color=Sample, stroke = 1, fill = NA), 
        pch = 21,
        size = 5 
    )  
    theme_classic() 
    labs(title = "Antibody levels",
         x = "",
         y = "Concentration (AU/ml)")  
    annotation_logticks(base = 10, sides = "l")  
    scale_fill_manual(values = pal)  
    scale_color_manual(values = pal2)  
    theme(plot.title = element_text(hjust = 0.5),
          axis.text.y = element_text(face = "bold"),
          axis.text.x = element_text(angle = 45, hjust = 1, face = "bold"),
          legend.position = "none")

enter image description here

  • Related