Say I have a boxplot that I created with ggplot(), and this boxplot has points above the upper whisker and below the lower whisker. Suppose I want to annotate only a subset of those points, for example only the points that correspond to variable values of 50 and above or 5 and below. How would I do that?
EDIT
For clarification: rather than adding a note that specific points lie above or below a specified threshold, I meant annotating each point individually, i.e. labelling the points above and below the thresholds with their respective values. So if a value like 70 is above the upper threshold of 50, I'd like the point to be annotated directly next to it with "70".
EDIT 2
Following the advice in the comments, I have encountered this problem:
As you can see, the coloured points, which are supposed to be identical to the points identified as outliers by the stat_summary() function, are in fact not identical. Some points even touch the whiskers.
The coloured points and the boxplots were produced like this:
# Functions that enable individualizing boxplots
Individualized_Boxplot_Quantiles <- function(x) {
  r <- quantile(x, probs = c(0.01, 0.25, 0.5, 0.75, 0.99))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
Definition_of_Outliers <- function(x) {
  subset(x, quantile(x, 0.99) < x | quantile(x, 0.01) > x)
}
Data_Above_99th_Percentile = filter(Data,variable_of_interest > quantile(Data$variable_of_interest, probs = 0.99))
Data_Below_1st_Percentile = filter(Data,variable_of_interest < quantile(Data$variable_of_interest,probs = 0.01))
# creation of the individualized boxplots (layers added to the ggplot() call)
stat_summary(fun.data = Individualized_Boxplot_Quantiles,
             geom = "boxplot",
             lwd = 0.1) +
  stat_summary(fun.y = Definition_of_Outliers,
               geom = "point",
               size = 0.5) +
  geom_point(data = Data_Above_99th_Percentile,
             colour = "red",
             size = 0.5) +
  geom_point(data = Data_Below_1st_Percentile,
             colour = "red",
             size = 0.5)
CodePudding user response:
I would overplot some points in a new geom_point layer using a distinct color by passing the appropriate subset of the data, then add text labels with the same subset.
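library(ggplot2)
set.seed(1)
df <- data.frame(x = 'Data', y = rnorm(1000, 26, 7))
# Layers must be chained with `+`
ggplot(df, aes(x, y)) +
  geom_boxplot() +
  ylim(c(0, 60)) +
  # Overplot the points beyond the thresholds in red
  geom_point(data = subset(df, y > 50 | y < 5), color = 'red') +
  # Label the same subset with its rounded values, shifted slightly to the right
  geom_text(data = subset(df, y > 50 | y < 5), aes(label = round(y, 2)),
            nudge_x = 0.08)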
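Regarding EDIT 2: one possible reason for the mismatch is that stat_summary() evaluates its summary function separately for each box, whereas filter() with quantile(Data$variable_of_interest, ...) uses cut-offs computed over the whole data set. Here is a minimal sketch of the same subset-and-label idea with per-group percentile cut-offs. The data, the column name `group`, and the helper name `box_quantiles` are made up for illustration, so adapt them to your own data.

```r
library(ggplot2)

set.seed(1)
# Hypothetical example data with two groups on the x axis
df <- data.frame(
  group = rep(c("A", "B"), each = 500),
  y     = c(rnorm(500, 26, 7), rnorm(500, 40, 3))
)

# Box statistics with whiskers ending at the 1st and 99th percentiles,
# returned as a one-row data frame for stat_summary(fun.data = ...)
box_quantiles <- function(x) {
  q <- quantile(x, probs = c(0.01, 0.25, 0.5, 0.75, 0.99))
  data.frame(ymin = q[[1]], lower = q[[2]], middle = q[[3]],
             upper = q[[4]], ymax = q[[5]])
}

# Per-group cut-offs, so they match what each box's whiskers show
lo <- ave(df$y, df$group, FUN = function(v) quantile(v, 0.01))
hi <- ave(df$y, df$group, FUN = function(v) quantile(v, 0.99))
outliers <- df[df$y < lo | df$y > hi, ]

p <- ggplot(df, aes(group, y)) +
  stat_summary(fun.data = box_quantiles, geom = "boxplot") +
  # Only the points beyond the per-group whiskers, in red
  geom_point(data = outliers, colour = "red", size = 0.5) +
  # Label each of those points with its rounded value
  geom_text(data = outliers, aes(label = round(y, 1)),
            nudge_x = 0.15, size = 3)
p
```

If your plot has only a single box, the global and per-group cut-offs coincide, and the simpler subset() approach above should be enough.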