Grouped matrix correlation

Time:10-29

I'm trying to get correlation matrices of an arbitrary number of factors (numeric columns) by group, ideally using dplyr. I have no problem getting the correlation matrix for one group by filtering on that group and summarizing, but with group_by() I'm not sure how to pass each group's factor data to cor().

library(dplyr)

numRows <- 20
myData <- tibble(A = rnorm(numRows),
                 B = rnorm(numRows),
                 C = rnorm(numRows),
                 Group = c(rep("Group1", numRows/2), rep("Group2", numRows/2)))

# Essentially what I'm doing is trying to get these matrices, but for all groups
myData %>% 
  filter(Group == "Group1") %>% 
  select(-Group) %>% 
  summarize(CorMat = cor(.))

# However, I don't know what to pass into "cor". The code below fails
myData %>% 
  group_by(Group) %>% 
  summarize(CorMat = cor(.))

# Error looks like this
Error: Problem with `summarise()` column `CorMat`.
i `CorMat = cor(.)`.
x 'x' must be numeric
i The error occurred in group 1: Group = "Group1".
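A minimal sketch of why the grouped call fails, assuming the same simulated data (the set.seed() call is an addition for reproducibility). Inside the pipe, `.` is bound to the entire grouped tibble, so cor() receives all rows plus the character Group column rather than the current group's numeric columns:

```r
library(dplyr)

set.seed(1)  # added only so the sketch is reproducible
numRows <- 20
myData <- tibble(A = rnorm(numRows),
                 B = rnorm(numRows),
                 C = rnorm(numRows),
                 Group = rep(c("Group1", "Group2"), each = numRows / 2))

# In summarise(CorMat = cor(.)), the pipe binds `.` to the whole tibble,
# Group column included, so cor() sees a character column and stops.
errMsg <- tryCatch(cor(myData), error = function(e) conditionMessage(e))
print(errMsg)

# Once the non-numeric column is dropped, cor() returns a 3 x 3 matrix
# (computed here over all 20 rows, not per group).
fullMat <- cor(select(myData, -Group))
print(dim(fullMat))
```

This is the same "'x' must be numeric" condition reported in the error above; the "group 1" in the message only reflects where summarise() was evaluating when cor() failed.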

I've seen solutions for grouped correlations between specific factors (Correlation matrix by group) and for correlations of all factors against one specific factor (Correlation matrix of grouped variables in dplyr), but nothing for a grouped correlation matrix of all factors against all factors.

Answer:

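You can try nest_by(), which returns a rowwise data frame with one row per group and puts each group's data (without Group) into a list column called data. Because the result is rowwise, cor(data) is evaluated once per group on that group's numeric columns: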

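myData %>% 
  nest_by(Group) %>% 
  # each row's `data` is that group's A, B, C tibble; cor() returns a 3 x 3 matrix,
  # so summarise() yields 3 rows per group (dplyr >= 1.1.0 warns here and suggests reframe())
  summarise(CorMat = cor(data))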

Output

  Group  CorMat[,1]   [,2]   [,3]
  <chr>       <dbl>  <dbl>  <dbl>
1 Group1      1     -0.132  0.638
2 Group1     -0.132  1     -0.284
3 Group1      0.638 -0.284  1    
4 Group2      1      0.429 -0.228
5 Group2      0.429  1     -0.235
6 Group2     -0.228 -0.235  1    
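Since dplyr 1.1.0 warns when summarise() returns several rows per group, one variant worth considering (a sketch, not the answerer's code) keeps one row per group and stores each matrix in a list-column via mutate(CorMat = list(cor(data))) on the rowwise result of nest_by(). The simulated data and seed are assumptions for reproducibility, and corByGroup and corList are hypothetical names:

```r
library(dplyr)

set.seed(1)  # added only so the sketch is reproducible
numRows <- 20
myData <- tibble(A = rnorm(numRows),
                 B = rnorm(numRows),
                 C = rnorm(numRows),
                 Group = rep(c("Group1", "Group2"), each = numRows / 2))

corByGroup <- myData %>%
  nest_by(Group) %>%                    # rowwise: one row per group, `data` holds A, B, C
  mutate(CorMat = list(cor(data))) %>%  # list() stores the whole 3 x 3 matrix in one cell
  ungroup()

# Named list of plain matrices, one per group
corList <- setNames(corByGroup$CorMat, corByGroup$Group)
print(corList)
```

Wrapping the result in list() is what lets a multi-row object live in a single cell, so the result stays at one row per group and no multi-row summarise() is needed.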

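If you want a named list with one element per group, you can also try the following: split() on Group (or group_split(), which returns an unnamed list) and then map() to remove the Group column. Note that each element is still a one-column tibble whose CorMat column holds the matrix; use .x$CorMat inside map() if you need plain matrices.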

library(tidyverse)

myData %>% 
  nest_by(Group) %>%
  summarise(CorMat = cor(data)) %>%
  ungroup() %>%
  # split into a list named by group, then drop the now-redundant Group column
  split(f = .$Group) %>%
  map(~ .x %>% select(-Group))

Output

$Group1
# A tibble: 3 x 1
  CorMat[,1]   [,2]   [,3]
       <dbl>  <dbl>  <dbl>
1      1     -0.132  0.638
2     -0.132  1     -0.284
3      0.638 -0.284  1    

$Group2
# A tibble: 3 x 1
  CorMat[,1]   [,2]   [,3]
       <dbl>  <dbl>  <dbl>
1      1      0.429 -0.228
2      0.429  1     -0.235
3     -0.228 -0.235  1    
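If a named list of bare matrices is the goal and dplyr isn't required, a base R sketch (an alternative technique, not part of the answer above) is split() followed by lapply(). Dropping Group before splitting keeps only numeric columns; the data frame and seed are assumptions for reproducibility, and numericCols and corByGroup are hypothetical names:

```r
set.seed(1)  # added only so the sketch is reproducible
numRows <- 20
myData <- data.frame(A = rnorm(numRows),
                     B = rnorm(numRows),
                     C = rnorm(numRows),
                     Group = rep(c("Group1", "Group2"), each = numRows / 2))

# Keep only the numeric columns, split them by Group into a named list of
# data frames, then compute one correlation matrix per group.
numericCols <- setdiff(names(myData), "Group")
corByGroup <- lapply(split(myData[numericCols], myData$Group), cor)

print(corByGroup$Group1)
```

Unlike the tibble output above, each list element here is a plain matrix with A, B and C as its row and column names.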