Barplot with sorted dots

I want to draw a four-group barplot from a first data frame, df1, and overlay dots from a second data frame, df2. The idea is to check how many dots from df2 lie outside (above) the bars from df1.

So I made the following graph which works well.

### 0- Import packages
library(dplyr)
library(ggplot2)
    
### 1- Data simulation
set.seed(4)
df1 <- data.frame(var=c("a", "b", "c", "d"), value=c(15, 19, 18, 17))
df2 <- data.frame(var1=rep(c("a", "b", "c", "d"), each=20), value=rnorm(80, 15, 2), color=NA, fill=NA)

### 2- Coloring data (outside=red, inside=blue)
df2$fill <- case_when(
  (df2$var1=="a" & df2$value>subset(df1, var=='a')$value) ~ "#e18b8b", 
  (df2$var1=="b" & df2$value>subset(df1, var=='b')$value) ~ "#e18b8b", 
  (df2$var1=="c" & df2$value>subset(df1, var=='c')$value) ~ "#e18b8b", 
  (df2$var1=="d" & df2$value>subset(df1, var=='d')$value) ~ "#e18b8b", 
  TRUE ~ "#8cbee2")
df2$color <- case_when(
  (df2$var1=="a" & df2$value>subset(df1, var=='a')$value) ~ "#ca0d0d", 
  (df2$var1=="b" & df2$value>subset(df1, var=='b')$value) ~ "#ca0d0d", 
  (df2$var1=="c" & df2$value>subset(df1, var=='c')$value) ~ "#ca0d0d", 
  (df2$var1=="d" & df2$value>subset(df1, var=='d')$value) ~ "#ca0d0d", 
  TRUE ~ "#0c78ca")

### 3- Display plot
ggplot(aes(x=var, y=value), data=df1) +
  geom_bar(stat="identity", fill='#8cbee2', width=0.6) +
  geom_point(data=df2, aes(x=var1, y=value), colour=df2$color, fill=df2$fill, position=position_jitter(width=0.05, height=0), shape=21, size=2)

To improve this graph, I would like to order the dots from df2 within each bar group, so that they form a kind of qqplot shape. First, this would show whether the number of dots outside is large compared to the bars. Second, it would show the distribution of the inside and outside dots.

I have found the following link but it only deals with one data-frame and I am working with 2.

Do you have any clue on how to sort these dots?

CodePudding user response：

One option to achieve your desired result is to use position_dodge and a helper column. To this end, first order your data by var1 and value, then add the helper column as an interaction of var1 and the row number. Mapping this helper column on the group aes ensures that the points are plotted in ascending order, and the dodge gives the qqplot-like shape.

Note: I also used a different approach for the colors, which uses a left_join to bring in the bar heights and maps a logical column on the color and fill aes.

library(dplyr)

set.seed(4)
df1 <- data.frame(var = c("a", "b", "c", "d"), value = c(15, 19, 18, 17))
df2 <- data.frame(var1 = rep(c("a", "b", "c", "d"), each = 20), value = rnorm(80, 15, 2), color = NA, fill = NA)

df2 <- df2 %>%
  left_join(df1, by = c("var1" = "var"), suffix = c("", "_df1")) %>%
  arrange(var1, value) %>%
  mutate(
    var_dodge = interaction(var1, row_number()),
    color = value > value_df1
  )

library(ggplot2)

ggplot(aes(x = var, y = value), data = df1) +
  geom_bar(stat = "identity", fill = "#8cbee2", width = 0.6) +
  geom_point(
    data = df2, aes(x = var1, y = value, group = var_dodge, color = color, fill = color),
    position = position_dodge(width = .4), shape = 21, size = 2
  ) +
  scale_color_manual(values = c("TRUE" = "#ca0d0d", "FALSE" = "#0c78ca")) +
  scale_fill_manual(values = c("TRUE" = "#e18b8b", "FALSE" = "#8cbee2")) +
  guides(fill = "none", color = "none")
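
Since the stated goal is to check how many dots lie outside the bars, the same left_join can also give you the counts behind the plot. This is only a minimal sketch: it rebuilds the simulated df1 and df2 from the question, and the names outside_summary, n_total, n_outside and share_outside are ones I made up for illustration.

```r
library(dplyr)

# Same simulated data as in the question
set.seed(4)
df1 <- data.frame(var = c("a", "b", "c", "d"), value = c(15, 19, 18, 17))
df2 <- data.frame(var1 = rep(c("a", "b", "c", "d"), each = 20), value = rnorm(80, 15, 2))

# Attach each group's bar height (value_df1), then count the dots above it
outside_summary <- df2 %>%
  left_join(df1, by = c("var1" = "var"), suffix = c("", "_df1")) %>%
  group_by(var1) %>%
  summarise(
    n_total = n(),                            # dots per group
    n_outside = sum(value > value_df1),       # dots above the bar
    share_outside = mean(value > value_df1)   # same, as a proportion
  )

outside_summary
```

The comparison value > value_df1 is the same one used for the color column above, so the counts should agree with the red dots in the plot.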