ggplot: define color for point overlaps

With ggplot2 I want to plot two vectors (vec1_num, vec2_num) in two dimensions and colour the points by a group variable (vec3_char). Some data points are overlapping.

library(ggplot2)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")

# plot 1
ggplot(data = NULL) +
  geom_point(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
  theme(panel.grid = element_blank())

I know I can attenuate the overlap by reducing alpha or working with geom_jitter adding a bit of noise. Like this:

# plot 2
ggplot(data = NULL) +
  geom_jitter(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4, width = 0.1) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
  theme(panel.grid = element_blank())

However, is it possible to keep plot 1 but colour the overlapping points differently? So that, for example, "A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C" = "orange"? And can I additionally add a small Venn diagram (as a legend) that visualises the colour choice for the point overlaps?

Thanks!

CodePudding user response:

My way of doing this would be to convert the letters into numbers, sum them, and convert the sums back into letters.

NB: The one complication is that the letters need to be A, B, D, H, ... (alphabet positions 1, 2, 4, 8, ...), so that each sum can only be produced by one combination of letters. There is probably a way to start with A, B, C, ... and encode them as unique values instead.
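One way the A, B, C, ... variant might work is to give each letter its own power of two, so that every sum of distinct letters is unique. This is only a minimal base R sketch: `letters_in`, `bit_value`, `encode_groups` and `decode_groups` are hypothetical names introduced for illustration and are not part of the answer's code.

```r
# Hypothetical helpers (not from the answer above).
# Each group label gets its own power of two: A = 1, B = 2, C = 4, ...
letters_in <- c("A", "B", "C")
bit_value <- setNames(2^(seq_along(letters_in) - 1), letters_in)

# Sum the values of the distinct groups present at one point.
encode_groups <- function(groups) {
  sum(bit_value[unique(groups)])
}

# Recover the label: keep every letter whose power of two is part of the sum.
decode_groups <- function(code) {
  present <- (code %/% bit_value) %% 2 == 1
  paste(names(bit_value)[present], collapse = "")
}

# Round trip for a point that holds A once and C twice
decode_groups(encode_groups(c("A", "C", "C")))
```

Because duplicates are removed with `unique()` before summing, no separate step for stripping repeated digits is needed in this variant.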

library(tidyverse)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "D", "A", "B", "D", "D", "A", "B", "D")

# Collapse runs of consecutive identical characters in a string (e.g. "44" -> "4")
removeDup <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="")

data <- data.frame(x = vec1_num, y = vec2_num, col = match(vec3_char, LETTERS))

data <- data %>% 
  group_by(x) %>%
  mutate(colour = glue::glue_collapse(col, sep = "")) %>%
  select(-col) %>% 
  distinct(x, y, .keep_all = TRUE) %>% 
  mutate(colour = removeDup(colour)) %>%
  mutate(colour = sapply(str_extract_all(colour, '\\d'), function(x) sum(as.integer(x)))) %>% 
  mutate(colour = case_when(
    colour == 1 ~ "A",
    colour == 2 ~ "B",
    colour == 3 ~ "AB",
    colour == 4 ~ "D",
    colour == 5 ~ "AD",
    colour == 6 ~ "BD",
    colour == 7 ~ "ABD"
  ))

# plot 1
ggplot(data) +
  geom_point(aes(x=x, y=y, colour = as_factor(colour)), alpha=0.4, size=4) +
  geom_text(aes(x = x, y = y, label = colour), vjust = 2) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "AB"="orange", "D" = "green", "AD" = "black", "BD" = "orange", "ABD" = "purple"), name = "Colour") +
  theme(panel.grid = element_blank())

CodePudding user response:

I would first create a data frame. Then, for every x-y combination (list(df$vec1_num, df$vec2_num)), I would extract which characters are present (...unique(xy_i$vec3_char)...). Like this:

df <- data.frame(vec1_num, vec2_num, vec3_char)
# Split df by x-y combination and label each piece with the characters found there
df_new <- do.call("rbind.data.frame", by(df, list(df$vec1_num, df$vec2_num), function(xy_i){
  # e.g. "A" and "B" at the same coordinates become "AB"
  chars_i <- paste0(sort(unique(xy_i$vec3_char)), collapse = "")
  xy_i$chars_comb <- factor(chars_i, levels = c("A", "AB", "AC", "ABC", "B", "BC", "C"))
  xy_i
}))

If you now make the plot it shows you what characters overlap at which point.

ggplot(data = df_new) +
  geom_point(aes(x=vec1_num, y=vec2_num, colour=chars_comb), alpha=0.4, size=4) +
  scale_colour_manual(values=c("AB" = "black", "ABC" = "grey", "B" = "darkred", "C"="orange", "AC"= "red")) +
  theme(panel.grid = element_blank())
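
The same per-coordinate grouping could also be sketched with dplyr, keeping one row per x-y position so that each overlap is drawn as a single point in its combination colour. This is only an illustrative variant: `df_comb` and `overlap_cols` are names made up here, the palette takes its colours from the question, and "AC" = "red" is borrowed from the answer above since the question does not name a colour for it.

```r
library(dplyr)
library(ggplot2)

# Data from the question
df <- data.frame(
  vec1_num  = c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5),
  vec2_num  = c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5),
  vec3_char = c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")
)

# One row per x-y position, labelled with the sorted, de-duplicated groups found there
df_comb <- df %>%
  group_by(vec1_num, vec2_num) %>%
  summarise(chars_comb = paste(sort(unique(vec3_char)), collapse = ""),
            .groups = "drop")

# Assumed palette: question's colours, plus "AC" = "red" from the answer above
overlap_cols <- c("A" = "darkblue", "AB" = "black", "AC" = "red", "ABC" = "grey",
                  "B" = "darkred", "BC" = "pink", "C" = "orange")

p <- ggplot(df_comb, aes(x = vec1_num, y = vec2_num, colour = chars_comb)) +
  geom_point(size = 4) +
  scale_colour_manual(values = overlap_cols, name = "Groups") +
  theme(panel.grid = element_blank())
p
```

Since every position is drawn once, alpha is no longer needed to reveal overlaps; the legend lists only the combinations that actually occur in the data.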