I'm trying to get correlation matrices of an arbitrary number of factors by group, ideally using dplyr. I have no problem getting the correlation matrix by filtering by group and summarizing, but using a "group_by", I'm not sure how to pass the factor data to cor.
library(dplyr)
numRows <- 20
myData <- tibble(A = rnorm(numRows),
                 B = rnorm(numRows),
                 C = rnorm(numRows),
                 Group = c(rep("Group1", numRows/2), rep("Group2", numRows/2)))
# Essentially what I'm doing is trying to get these matrices, but for all groups
myData %>%
  filter(Group == "Group1") %>%
  select(-Group) %>%
  summarize(CorMat = cor(.))
# However, I don't know what to pass into "cor". The code below fails
myData %>%
  group_by(Group) %>%
  summarize(CorMat = cor(.))
# Error looks like this
Error: Problem with `summarise()` column `CorMat`.
i `CorMat = cor(.)`.
x 'x' must be numeric
i The error occurred in group 1: Group = "Group1".
I've seen solutions for the grouped correlation between specific factors (Correlation matrix by group) or correlations between all factors to a specific factor (Correlation matrix of grouped variables in dplyr), but nothing for a grouped correlation matrix of all factors to all factors.
CodePudding user response:
You can try nest_by(), which puts your data (without Group) into a list column called data. You can then pass that column to cor():
myData %>%
  nest_by(Group) %>%
  summarise(CorMat = cor(data))
Output
  Group  CorMat[,1]   [,2]   [,3]
  <chr>       <dbl>  <dbl>  <dbl>
1 Group1      1     -0.132  0.638
2 Group1     -0.132  1     -0.284
3 Group1      0.638 -0.284  1
4 Group2      1      0.429 -0.228
5 Group2      0.429  1     -0.235
6 Group2     -0.228 -0.235  1
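As a side note on why the grouped attempt in the question fails: inside the pipe, `.` is the whole piped-in data frame, including the character Group column, not the per-group slice, so cor() sees non-numeric input. The sketch below is a small variation on the nest_by() approach, not the only way to do it. It wraps each result in list() so that every group keeps its full 3 x 3 matrix (with A/B/C row and column names) in a single row. The names corByGroup and corList, and the seed, are my own choices for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
numRows <- 20
myData <- tibble(A = rnorm(numRows),
                 B = rnorm(numRows),
                 C = rnorm(numRows),
                 Group = rep(c("Group1", "Group2"), each = numRows / 2))

corByGroup <- myData %>%
  nest_by(Group) %>%
  # list() keeps each 3 x 3 matrix whole, giving one row per group
  summarise(CorMat = list(cor(data)), .groups = "drop")

# Named list of plain matrices, one element per group
corList <- setNames(corByGroup$CorMat, corByGroup$Group)
corList$Group1
```

Because each matrix is stored as one list element, this form should also sidestep the situation where summarise() returns several rows per group.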
If you want a named list of matrices, you can also try the following: add split (or group_split, which does not keep the names) and then map to remove the Group column.
library(tidyverse)
myData %>%
  nest_by(Group) %>%
  summarise(CorMat = cor(data)) %>%
  ungroup() %>%
  split(f = .$Group) %>%
  map(~ .x %>% select(-Group))
Output
$Group1
# A tibble: 3 x 1
  CorMat[,1]   [,2]   [,3]
       <dbl>  <dbl>  <dbl>
1      1     -0.132  0.638
2     -0.132  1     -0.284
3      0.638 -0.284  1

$Group2
# A tibble: 3 x 1
  CorMat[,1]   [,2]   [,3]
       <dbl>  <dbl>  <dbl>
1      1      0.429 -0.228
2      0.429  1     -0.235
3     -0.228 -0.235  1
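If plain matrices are preferred over one-column tibbles, a shorter route might be to split the numeric columns by group first and then apply cor() to each piece. This is only a sketch under the same setup as the question; corList2 and the seed are names and values I made up for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
numRows <- 20
myData <- tibble(A = rnorm(numRows),
                 B = rnorm(numRows),
                 C = rnorm(numRows),
                 Group = rep(c("Group1", "Group2"), each = numRows / 2))

corList2 <- myData %>%
  select(-Group) %>%        # keep only the numeric columns
  split(myData$Group) %>%   # named list of per-group data frames
  lapply(cor)               # one correlation matrix per group

corList2$Group2
```

The list is named by the values of Group, and each element is an ordinary matrix with A, B and C as row and column names, so it works for any number of numeric columns.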