I am trying to create a correlation matrix between my X and Y variables and display this information in a nice figure. I am currently using ggpairs()
from the GGally
package, but if there's a better way to do this then I am happy to try something new. The figure should:
-Fit linear regression models (using lm
) between X and Y variables
-Display scatterplots with a regression line
-Display the Coefficient of the determination (R2)
-Map the colour of points/lines/R2 values by group
I have been able to do most of this, but ggpairs
only displays the correlation coefficient (r) and not the coefficient of determination (R2). I was able to use the suggestion from this post, but unfortunately the solution does not display R2 values by group.
So far:
library(GGally)
library(ggplot2)
cars <- mtcars
cars$group <- factor(c(rep("A", 16), rep("B", 16))) #adding grouping variable
#function to return R2 (coefficient of determination) and not just r (Coefficient of correlation) in the top portion of the figure
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
# Calculate the r^2 & format output
m <- summary(lm(y ~ x))
lbl <- paste("r^2: ", formatC(m$r.squared, digits=ndp, format="f"))
# Write out label which is centered at x&y position
ggplot(data=data, mapping=mapping)
annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, parse=TRUE, ...)
theme(panel.grid = element_blank())
}
#lower function basically fits a linear model and displays points
lower_fn <- function(data, mapping, ...){
p <- ggplot(data = data, mapping = mapping)
geom_point(alpha = 0.7)
geom_smooth(method=lm, fill="blue", se = F, ...)
p
}
#The actual figure
ggpairs(cars,
columns = c(1:11),
mapping = ggplot2::aes(color = group),
upper = list(continuous = "cor", size = 15),
diag = list(continuous = "densityDiag", alpha=0.5),
lower = list(continuous = lower_fn))
CodePudding user response:
Based on Is it possible to split correlation box to show correlation values for two different treatments in pairplot?, below is a little code to get you started.
The idea is that you need to 1. split
the data over the aes
thetic variable (which is assumed to be colour
), 2. run a regression over each data subset and extract the r^2, 3. quick calculation of where to place the r^2 labels, 4. plot. Some features are left to do.
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
col <- eval_data_col(data, mapping$colour)
# if no colour mapping run over full data
if(is.null(col)) {
## add something here
}
# if colour aesthetic, split data and run `lm` over each group
if(!is.null(col)) {
idx <- split(seq_len(nrow(data)), col)
r2 <- unlist(lapply(idx, function(i) summary(lm(y[i] ~ x[i]))$r.squared))
lvs <- if(is.character(col)) sort(unique(col)) else levels(col)
cuts <- seq(min(y, na.rm=TRUE), max(y, na.rm=TRUE), length=length(idx) 1L)
pos <- (head(cuts, -1) tail(cuts, -1))/2
p <- ggplot(data=data, mapping=mapping, ...)
geom_blank()
theme_void()
# you could map colours to each level
annotate("text", x=mean(x), y=pos, label=paste(lvs, ": ", formatC(r2, digits=ndp, format="f")))
}
return(p)
}