statistical summary of scatter plots in R ggplot2 based on quadrants

Time:11-18

I want to draw a scatter plot with facets and quadrants, and display basic statistics (mean, median, number of points, etc.) for each quadrant of each facet. My search led me to the stat_mean() function from the ggpubr package, and to geom_quadrant_lines() and stat_quadrant_counts() from the ggpp package.

However, with stat_mean() I can only plot the mean for the entire facet, not the mean for each quadrant. I also cannot figure out the right way to get other statistics such as the median or the correlation, either per facet or per quadrant.

Any help with this is highly appreciated!

library(ggplot2)
library(ggpubr)
library(ggpp)
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate

data <- data.frame(
  xlabel = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ylabel = c(10, 12, 14, 16, 18, 6, 5, 4, 3, 2),
  facets = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b")
)


ggplot(data = data, aes(x = xlabel, y = ylabel, color = facets)) +
  geom_point() +
  facet_wrap(facets ~ ., ) +
  stat_mean(color = "black") +
  stat_quadrant_counts(xintercept = 3, yintercept = 9) +
  geom_quadrant_lines(xintercept = 3, yintercept = 9)

Created on 2021-11-17 by the reprex package (v2.0.1)

Answer:

Very awkwardly hidden inside the ggpp package is the function which_quadrant, which assigns each point to a quadrant based on its x/y coordinates and the intercepts. This information can be used for simple calculations of what you call "means" (or rather, centroids).

As an aside, if I were the package maintainer, I would keep the function separate rather than defining it inside the Stat$compute_panel layer, as that makes debugging a real pain.

library(tidyverse)
library(ggpp)

data <- data.frame(xlabel = 1:10, ylabel = c(seq(10,18,2), 6:2), 
                   facets= rep(letters[1:2], each = 5))

## modified from StatQuadrantCounts$compute_panel
which_quadrant <- function(x, y, xintercept, yintercept, pool.along = "none") {
  z <- ifelse(x >= xintercept & y >= yintercept,
              1L, 
              ifelse(x >= xintercept & y < yintercept,
                     2L,
                     ifelse(x < xintercept & y < yintercept,
                            3L,
                            4L
                     )
              )
  )
  if (pool.along == "x") {
    z <- ifelse(z %in% c(1L, 4L), 1L, 2L)
  } else if (pool.along == "y") {
    z <- ifelse(z %in% c(1L, 2L), 1L, 4L)
  }
  z
}

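## assign each point to a quadrant, then overwrite xlabel/ylabel with the
## per-facet, per-quadrant means (one row per original point is kept)
quad_summary <- 
  data %>%
  mutate(quadrant = which_quadrant(x = xlabel, y = ylabel, xintercept = 3, yintercept = 9)) %>%
  group_by(facets, quadrant) %>%
  mutate(across(contains("label"), mean))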

ggplot(data, aes(x = xlabel, y = ylabel)) +
  geom_point(aes(color = facets)) +
  facet_wrap(facets ~ ., ) +
  stat_quadrant_counts(xintercept = 3, yintercept = 9) +
  geom_quadrant_lines(xintercept = 3, yintercept = 9) +
  geom_point(data = quad_summary, shape = 2, size = 2, aes(xlabel, ylabel))

Created on 2021-11-17 by the reprex package (v2.0.1)
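
The same grouping idea should extend to the other statistics asked about. Below is a minimal sketch, not a tested recipe for every dataset: it assumes only dplyr and ggplot2, uses a hypothetical helper quadrant_of() that follows the same numbering as which_quadrant above (without the pooling option), and draws the quadrant lines with plain geom_vline()/geom_hline() instead of ggpp so that the block runs on its own. The column names n, mean_x, median_x, r and label are made up for illustration. The correlation is computed per facet only, since a quadrant holding one or two points gives no meaningful value.

```r
library(dplyr)
library(ggplot2)

data <- data.frame(
  xlabel = 1:10,
  ylabel = c(seq(10, 18, 2), 6:2),
  facets = rep(letters[1:2], each = 5)
)

# Hypothetical helper with the same numbering as which_quadrant():
# 1 = upper right, 2 = lower right, 3 = lower left, 4 = upper left.
quadrant_of <- function(x, y, xintercept, yintercept) {
  ifelse(x >= xintercept,
         ifelse(y >= yintercept, 1L, 2L),
         ifelse(y < yintercept, 3L, 4L))
}

# One row per facet/quadrant combination that actually contains points.
quad_stats <- data %>%
  mutate(quadrant = quadrant_of(xlabel, ylabel, xintercept = 3, yintercept = 9)) %>%
  group_by(facets, quadrant) %>%
  summarise(
    n = n(),
    mean_x = mean(xlabel), mean_y = mean(ylabel),
    median_x = median(xlabel), median_y = median(ylabel),
    .groups = "drop"
  ) %>%
  mutate(label = sprintf("n = %d\nmedian = (%.1f, %.1f)", n, median_x, median_y))

# One row per facet: Pearson correlation of x and y.
facet_stats <- data %>%
  group_by(facets) %>%
  summarise(r = cor(xlabel, ylabel), .groups = "drop") %>%
  mutate(label = sprintf("r = %.2f", r))

p <- ggplot(data, aes(xlabel, ylabel)) +
  geom_point(aes(color = facets)) +
  facet_wrap(~facets) +
  geom_vline(xintercept = 3, linetype = "dashed") +
  geom_hline(yintercept = 9, linetype = "dashed") +
  # centroid of each quadrant, labelled with its count and medians
  geom_point(data = quad_stats, aes(mean_x, mean_y), shape = 2, size = 2) +
  geom_text(data = quad_stats, aes(mean_x, mean_y, label = label),
            vjust = -0.5, size = 3) +
  # facet-wide correlation in the top-left corner of each panel
  geom_text(data = facet_stats, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.2, vjust = 1.5, size = 3)

p
```

Because the summary data frames carry the facets column, each label ends up in the matching panel. Using summarise() rather than mutate() gives one row per quadrant, so each centroid and label is drawn once instead of once per point.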
