How to embed the number of observations into violin plots?

Time:08-25

I want to plot data as faceted violin plots and annotate each violin with the number of observations used to draw it.

Here is an example of what I have without observation counts:

library(ggplot2)
library(dplyr)
library(tidyverse)

data("iris")

c <- rep(c('r', 'g', 'b'), 50)
c <- sample(c)
facet_row <- rep(c('row1', 'row2', 'row3', 'row4', 'row5'), 30)
facet_col <- rep(c('col1', 'col2', 'col3'), 50)

iris$facet_rows <- facet_row
iris$facet_cols <- facet_col
iris$color <- c
iris$count <- sample(1:10, size = 150, replace = T)

p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))

print(p)

Result: (image not preserved)

I want to put the number of observations right behind those violins. I tried this so far:

count_data <- function (y){
  df <- data.frame(y = min(y) - 0.2, label = length(y))
  return(df)
}

p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  stat_summary(fun.data = count_data, geom = "text", aes(group = Species)) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))


print(p)

This produces output with an issue (image not preserved):

Each group of violins now shares a single count value. The problem is that the violins within a group will almost certainly have different numbers of observations.
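To see why grouping by Species alone merges the counts, it can help to tabulate the data at both levels. The sketch below rebuilds the example data on a copy of iris (the name df and the fixed seed are assumptions added here for reproducibility) and compares one count per Species in each facet with one count per Species and color, which is what a single violin represents.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random colors are reproducible

# Rebuild the example data on a copy of iris (df is a name chosen for this sketch)
df <- iris
df$facet_rows <- rep(c('row1', 'row2', 'row3', 'row4', 'row5'), 30)
df$facet_cols <- rep(c('col1', 'col2', 'col3'), 50)
df$color <- sample(rep(c('r', 'g', 'b'), 50))

# One count per Species within each facet: what aes(group = Species) summarises
per_species <- count(df, facet_rows, facet_cols, Species)

# One count per Species and color within each facet: one row per violin
per_violin <- count(df, facet_rows, facet_cols, Species, color)

print(head(per_species))
print(head(per_violin))
```

Each row of per_species is the sum of the matching rows in per_violin, which is why every violin in a group ends up labelled with the same number.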

I also tried drawing a geom_text from precomputed observation counts (assume that iris$count holds the real counts, which would be identical for all rows belonging to the same violin; here they are random):

p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  geom_text(aes(label=count, y=Petal.Length), nudge_y = -0.1) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))

print(p)

This has problems similar to those of the previous approach (image not preserved):

  1. Values for different violins in the same group are drawn on one line.
  2. Each violin's count is repeated once per observation.

I am relatively new to R. I feel like there is a clean way to do this, but I can't figure it out.

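Answer:

Removing the explicit grouping and adding position_dodge resolved the issue. Without aes(group = Species), stat_summary falls back to the default grouping, which here is every combination of Species and color, so each violin gets its own count, and position_dodge(1) moves each label alongside its dodged violin: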

count_data <- function (y){
  df <- data.frame(y = min(y) - 0.2, label = length(y))
  return(df)
}

p <- ggplot(iris, aes(x=Species, y=Petal.Length, fill=color)) +
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  stat_summary(fun.data = count_data, geom = "text", position = position_dodge(1)) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))


print(p)
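
If the counts are already available, or you prefer to compute them outside the plot, a precomputed label table can also work. This is only a sketch of an alternative, not the approach from the answer: the names df and labels, the fixed seed, and the dodge width of 0.9 (chosen to match the default violin width) are assumptions made for this example.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random colors are reproducible

# Rebuild the example data on a copy of iris (df is a name chosen for this sketch)
df <- iris
df$facet_rows <- rep(c('row1', 'row2', 'row3', 'row4', 'row5'), 30)
df$facet_cols <- rep(c('col1', 'col2', 'col3'), 50)
df$color <- sample(rep(c('r', 'g', 'b'), 50))

# One row per violin: its observation count and a position just below its minimum
labels <- df %>%
  group_by(facet_rows, facet_cols, Species, color) %>%
  summarise(n = n(), y = min(Petal.Length) - 0.2, .groups = "drop")

p <- ggplot(df, aes(x = Species, y = Petal.Length, fill = color)) +
  geom_violin(alpha = 0.7, na.rm = TRUE) +
  # group = color makes the dodge split the labels the same way as the violins
  geom_text(data = labels,
            aes(y = y, label = n, group = color),
            position = position_dodge(width = 0.9), size = 3) +
  coord_flip() +
  facet_grid(rows = vars(facet_rows), cols = vars(facet_cols))

print(p)
```

Because labels has exactly one row per violin, each count is drawn once, and the dodge keeps the labels from landing on the same line.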