Tweaking ggpairs() or a better solution to a correlation matrix-CodePudding

I am trying to create a correlation matrix between my X and Y variables and display this information in a nice figure. I am currently using ggpairs() from the GGally package, but if there's a better way to do this then I am happy to try something new. The figure should:

-Fit linear regression models (using lm) between X and Y variables

-Display scatterplots with a regression line

-Display the Coefficient of the determination (R2)

-Map the colour of points/lines/R2 values by group

I have been able to do most of this, but ggpairs only displays the correlation coefficient (r) and not the coefficient of determination (R2). I was able to use the suggestion from this post, but unfortunately the solution does not display R2 values by group.

So far:

library(GGally)
library(ggplot2)

cars <- mtcars
cars$group <- factor(c(rep("A", 16), rep("B", 16))) #adding grouping variable

#function to return R2 (coefficient of determination) and not just r (Coefficient of correlation) in the top portion of the figure
upper_fn <- function(data, mapping, ndp=2, ...){
  
  # Extract the relevant columns as data
  x <- eval_data_col(data, mapping$x)
  y <- eval_data_col(data, mapping$y)
  
  # Calculate the r^2 & format output
  m <- summary(lm(y ~ x))
  lbl <- paste("r^2: ", formatC(m$r.squared, digits=ndp, format="f"))
  
  # Write out label which is centered at x&y position
  ggplot(data=data, mapping=mapping)   
    annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, parse=TRUE, ...) 
    theme(panel.grid = element_blank()) 
}


#lower function basically fits a linear model and displays points 
lower_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping)   
    geom_point(alpha = 0.7)   
    geom_smooth(method=lm, fill="blue", se = F, ...)
  p
}

#The actual figure
  ggpairs(cars,
    columns = c(1:11),
    mapping = ggplot2::aes(color = group),
    upper = list(continuous = "cor", size = 15),
    diag = list(continuous = "densityDiag", alpha=0.5),
    lower = list(continuous = lower_fn))

CodePudding user response：

Based on Is it possible to split correlation box to show correlation values for two different treatments in pairplot?, below is a little code to get you started.

The idea is that you need to 1. split the data over the aesthetic variable (which is assumed to be colour), 2. run a regression over each data subset and extract the r^2, 3. quick calculation of where to place the r^2 labels, 4. plot. Some features are left to do.

upper_fn <- function(data, mapping, ndp=2, ...){
  
  # Extract the relevant columns as data
  x <- eval_data_col(data, mapping$x)
  y <- eval_data_col(data, mapping$y)
  col <- eval_data_col(data, mapping$colour)

  # if no colour mapping run over full data
  if(is.null(col)) {
        ## add something here
    }

  # if colour aesthetic, split data and run `lm` over each group
  if(!is.null(col)) {
    idx <- split(seq_len(nrow(data)), col)
    r2 <- unlist(lapply(idx, function(i) summary(lm(y[i] ~ x[i]))$r.squared))

    lvs <- if(is.character(col)) sort(unique(col)) else levels(col)
    cuts <- seq(min(y, na.rm=TRUE), max(y, na.rm=TRUE), length=length(idx) 1L)
    pos <- (head(cuts, -1)   tail(cuts, -1))/2 

    p <- ggplot(data=data, mapping=mapping, ...)   
            geom_blank()   
            theme_void()   
            # you could map colours to each level 
            annotate("text", x=mean(x), y=pos, label=paste(lvs, ": ", formatC(r2, digits=ndp, format="f")))
    }
  
  return(p)
}