Need Help writing a Loop function

I have a huge dataset and have created a large correlation matrix from it. My goal is to clean this up and create a new data frame containing every correlation whose absolute value is greater than 0.25, with the variable names included. For example, given the data set below, how would I use a doubly nested loop over the rows and columns of the correlation table?

a <- rnorm(10, 0 ,1)
b <- rnorm(10,1,1.5)
c <- rnorm(10,1.5,2)
d <- rnorm(10,-0.5,1)
e <- rnorm(10,-2,1)
matrix <- data.frame(a,b,c,d,e)
cor(matrix)

(Note that there is redundancy in the matrix. You only need to inspect the first 5 columns, and you don't need to inspect all rows. If I'm looking at column 3, for example, I only need to start looking at row 4, after the correlation = 1.) Thank you

CodePudding user response：

Is your ultimate goal to create a 5x5 matrix in which every correlation with an absolute value below 0.25 is set to zero? This can be done via ifelse(abs(m) < 0.25, 0, m), where m <- cor(matrix). If your goal is simply to loop over the rows and columns, this can be done via:

m <- cor(matrix)
for (row in rownames(m)){
    for (col in colnames(m)){
      #your code here
      #operating on m[row,col]
    }
  }

To avoid redundancy:

for (row in rownames(m)[1:(length(rownames(m))-1)]){
    # start one column to the right of the diagonal entry for this row
    for (col in colnames(m)[(which(colnames(m) == row) + 1):length(colnames(m))]){
      #your code here
      #operating on m[row,col]
      print(m[row,col])
    }
}
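To get from the loop skeleton to the data frame asked for in the question, here is a minimal sketch. The helper name strong_pairs, the output column names, and the seed are my own assumptions, not anything from the question; the data frame is called dat here rather than matrix.

```r
# Hypothetical helper: collect variable pairs whose |correlation| exceeds a cutoff.
strong_pairs <- function(m, threshold = 0.25) {
  vars <- colnames(m)
  n <- length(vars)
  empty <- data.frame(var1 = character(0), var2 = character(0),
                      correlation = numeric(0), stringsAsFactors = FALSE)
  if (n < 2) return(empty)

  hits <- list()
  for (i in seq_len(n - 1)) {      # rows: every variable except the last
    for (j in (i + 1):n) {         # columns strictly to the right of the diagonal
      if (abs(m[i, j]) > threshold) {
        hits[[length(hits) + 1]] <- data.frame(
          var1 = vars[i], var2 = vars[j], correlation = m[i, j],
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(hits) == 0) return(empty)
  do.call(rbind, hits)             # bind once at the end instead of growing a data frame
}

set.seed(42)  # assumed seed, only so the sketch is repeatable
dat <- data.frame(a = rnorm(10, 0, 1),
                  b = rnorm(10, 1, 1.5),
                  c = rnorm(10, 1.5, 2),
                  d = rnorm(10, -0.5, 1),
                  e = rnorm(10, -2, 1))
print(strong_pairs(cor(dat)))
```

Each pair is visited only once because the inner loop starts at i + 1, so the diagonal and the mirrored half of the matrix are never touched.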

CodePudding user response：

I'd suggest using the corrr package, in conjunction with tidyr and dplyr.

This allows you to generate a correlation data frame rather than a matrix, and to remove the duplicate values (for example, a-b is the same as b-a) using the shave function. You can then rearrange by pivoting, remove the NA values (from the diagonal, e.g. a-a, and from the shaved half) and filter for absolute values greater than 0.25.

library(dplyr)
library(tidyr)
library(magrittr) # for the pipe %>% or just use library(tidyverse) instead of all 3

library(corrr)

# for reproducible values
set.seed(1001) 

# no need to create the vectors separately before building the data frame
# and don't call it matrix, that's a function name

mydata <- data.frame(a = rnorm(10, 0 ,1),
                     b = rnorm(10, 1, 1.5),
                     c = rnorm(10, 1.5, 2),
                     d = rnorm(10, -0.5, 1),
                     e = rnorm(10, -2, 1))

mydata %>% 
  correlate() %>% 
  shave() %>% 
  pivot_longer(2:6) %>% 
  na.omit() %>% 
  filter(abs(value) > 0.25)

Result:

# A tibble: 4 x 3
  term  name   value
  <chr> <chr>  <dbl>
1 c     b     -0.296
2 d     b      0.357
3 e     a     -0.440
4 e     d     -0.280
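If you would rather not install extra packages, the same shave-and-filter idea can be sketched in base R with upper.tri() and which(arr.ind = TRUE). The helper name strong_pairs_vec and the output column names are assumptions of mine. This version reads the upper triangle, so each pair comes out as (b, c) rather than (c, b), but with the same data it should list the same pairs.

```r
# Hypothetical base-R helper: keep one triangle, then filter on |correlation|.
strong_pairs_vec <- function(m, threshold = 0.25) {
  keep <- upper.tri(m) & abs(m) > threshold   # TRUE only above the diagonal
  idx <- which(keep, arr.ind = TRUE)          # two-column matrix: "row", "col"
  data.frame(var1 = rownames(m)[idx[, "row"]],
             var2 = colnames(m)[idx[, "col"]],
             correlation = m[idx],            # matrix indexing picks one cell per row of idx
             stringsAsFactors = FALSE)
}

set.seed(1001)  # same seed as above
mydata <- data.frame(a = rnorm(10, 0, 1),
                     b = rnorm(10, 1, 1.5),
                     c = rnorm(10, 1.5, 2),
                     d = rnorm(10, -0.5, 1),
                     e = rnorm(10, -2, 1))
print(strong_pairs_vec(cor(mydata)))
```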