Annotate points of a geom_boxplot, that fulfill specified conditions?

Time:01-06

Say I have a boxplot that I created with ggplot(), and this boxplot has points above the upper whisker and below the lower whisker. If I want to annotate only a subset of those points, for example only the points with values of 50 and above or 5 and below, how would I do that?

EDIT

For clarification: instead of commenting on or pointing out that specific points are above or below a specified threshold, I mean annotating each point individually, i.e. labelling the points above and below the thresholds with their respective values. So if a value like 70 is above the upper threshold of 50, I'd like "70" to appear directly next to that point.

EDIT 2

Following the advice in the comments, I have encountered this problem:

[Image: coloured points do not align with the outliers as defined by the outlier function]

As you can see, the coloured points, which are supposed to be identical to the points identified as outliers by the stat_summary() function, are in fact not identical. Some points even touch the whiskers.

The coloured points and the boxplots were produced like this:

library(ggplot2)
library(dplyr)

# Functions that individualize the boxplots
Individualized_Boxplot_Quantiles <- function(x) {
  r <- quantile(x, probs = c(0.01, 0.25, 0.5, 0.75, 0.99))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

Definition_of_Outliers <- function(x) {
  subset(x, quantile(x, 0.99) < x | quantile(x, 0.01) > x)
}

Data_Above_99th_Percentile <- filter(Data, variable_of_interest > quantile(Data$variable_of_interest, probs = 0.99))

Data_Below_1st_Percentile <- filter(Data, variable_of_interest < quantile(Data$variable_of_interest, probs = 0.01))

# Creation of the individualized boxplots
# (the ggplot() call was not shown; x = group is a placeholder grouping column)
ggplot(Data, aes(x = group, y = variable_of_interest)) +
  stat_summary(fun.data = Individualized_Boxplot_Quantiles,
               geom = "boxplot",
               lwd = 0.1) +
  stat_summary(fun = Definition_of_Outliers,  # `fun.y` is deprecated since ggplot2 3.3.0
               geom = "point",
               size = 0.5) +
  geom_point(data = Data_Above_99th_Percentile,
             colour = "red",
             size = 0.5) +
  geom_point(data = Data_Below_1st_Percentile,
             colour = "red",
             size = 0.5)
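A likely cause of the mismatch: the two `filter()` calls compare each value with the 1st and 99th percentiles of the entire `variable_of_interest` column, whereas `stat_summary()` runs `Individualized_Boxplot_Quantiles()` and `Definition_of_Outliers()` separately for each x group. The red points are therefore selected with global cut-offs while the boxes and the summary points use per-group ones. Below is a minimal sketch that computes the percentiles per group instead, using base R's `ave()`. It assumes the boxes are split by a column called `group`; that name, the `beyond_percentiles` column and the simulated `Data` are hypothetical stand-ins for the real data.

```r
library(ggplot2)

# Hypothetical example data: three groups with different centres and spreads,
# so whole-column and per-group percentiles differ noticeably.
set.seed(1)
Data <- data.frame(
  group = rep(c("A", "B", "C"), each = 300),
  variable_of_interest = c(rnorm(300, 20, 3), rnorm(300, 30, 8), rnorm(300, 40, 15))
)

# ave() returns one value per row, computed within that row's group,
# using quantile()'s default type, as the stat_summary() functions do.
lower_1st <- ave(Data$variable_of_interest, Data$group,
                 FUN = function(v) quantile(v, 0.01))
upper_99th <- ave(Data$variable_of_interest, Data$group,
                  FUN = function(v) quantile(v, 0.99))
Data$beyond_percentiles <- Data$variable_of_interest < lower_1st |
  Data$variable_of_interest > upper_99th

Individualized_Boxplot_Quantiles <- function(x) {
  r <- quantile(x, probs = c(0.01, 0.25, 0.5, 0.75, 0.99))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

p <- ggplot(Data, aes(x = group, y = variable_of_interest)) +
  stat_summary(fun.data = Individualized_Boxplot_Quantiles, geom = "boxplot") +
  # Highlight only the rows that lie beyond their own group's percentiles
  geom_point(data = subset(Data, beyond_percentiles), colour = "red", size = 0.5)
p
```

With per-group cut-offs, each simulated group of 300 distinct values gets 3 red points below its 1st percentile and 3 above its 99th percentile, the same points `Definition_of_Outliers()` returns for that group. The same `subset(Data, beyond_percentiles)` can also be passed to `geom_text()` to label the values.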

Answer:

I would overplot some points in a new geom_point layer using a distinct color by passing the appropriate subset of the data, then add text labels with the same subset.

set.seed(1)
df <- data.frame(x = 'Data', y = rnorm(1000, 26, 7))

library(ggplot2)

ggplot(df, aes(x, y)) +
  geom_boxplot() +
  ylim(c(0, 60)) +
  geom_point(data = subset(df, y >= 50 | y <= 5), color = 'red') +
  geom_text(data = subset(df, y >= 50 | y <= 5), aes(label = round(y, 2)),
            nudge_x = 0.08)
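The subset above selects points by value only; it does not check whether a point is actually drawn beyond a whisker. To label only points that are both boxplot outliers and meet the threshold, the outlier rule can be recomputed per x group. As far as I know, `geom_boxplot()` flags a point as an outlier when it lies more than 1.5 * IQR beyond the hinges, with the hinges computed by `quantile()` at its default type; the sketch below assumes that rule. The helper `is_box_outlier()` and the columns `outlier` and `labelled` are hypothetical names, and `df` is recreated as above.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = 'Data', y = rnorm(1000, 26, 7))

# Assumed to mirror geom_boxplot()'s default rule: beyond 1.5 * IQR from the hinges
is_box_outlier <- function(v, coef = 1.5) {
  q <- quantile(v, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  v < q[[1]] - coef * iqr | v > q[[2]] + coef * iqr
}

# ave() applies the rule within each x group; it returns 0/1 numerics, so convert back
df$outlier <- as.logical(ave(df$y, df$x, FUN = is_box_outlier))
# Label only outliers that also meet the value condition from the question
df$labelled <- df$outlier & (df$y >= 50 | df$y <= 5)

p <- ggplot(df, aes(x, y)) +
  geom_boxplot() +
  geom_point(data = subset(df, labelled), colour = 'red') +
  geom_text(data = subset(df, labelled), aes(label = round(y, 2)), nudge_x = 0.08)
p
```

`ylim()` is left out here because scale limits remove out-of-range rows before the box statistics are computed, so the drawn boxes could then differ from the recomputed rule; `coord_cartesian(ylim = c(0, 60))` zooms without dropping data.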
