obtain bootstrap correlation interval between one variable and all others in data frame

Time:06-05

I would like to get bootstrap correlation parameters (rho and its confidence interval) between one variable (v) and all the others (x, y, z). If I were only interested in the correlation itself (no bootstrap), I would use the following code.

df[,-1] %>% map(.,~cor.test(df$v, .))

How can we use a similar idea with bootstrap?

v<-rnorm(20,50,2)
x<-rnorm(20,2)
y<-rnorm(20,10,2)
z<-rnorm(20,10,2)

df<-data.frame(v=v,x=x,y=y,z=z)

CodePudding user response:

Write a function that runs the tests on one resample, just as in the single-run case, then extracts the estimate of rho and the confidence interval from each test and returns them.

Use the boot package to call that function on R bootstrap resamples; the code below uses R = 5.

set.seed(2022)
v<-rnorm(20,50,2)
x<-rnorm(20,2)
y<-rnorm(20,10,2)
z<-rnorm(20,10,2)

df<-data.frame(v=v,x=x,y=y,z=z)

library(magrittr)
library(boot)

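# Statistic for boot(): `i` holds the row indices of one bootstrap resample.
boot_cor <- function(data, i) {
  d <- data[i, ]
  # NOTE: `v` is found in the global environment, so it is not resampled
  # with the rows of `d`; use `d$v` to resample the pairs together.
  cor_list <- d[-1] %>% purrr::map(~cor.test(v, .))
  rho <- sapply(cor_list, `[[`, 'estimate')       # correlation estimates
  conf.int <- sapply(cor_list, `[[`, 'conf.int')  # cor.test confidence limits
  out <- rbind(rho, conf.int)
  out
}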

b <- boot(df, boot_cor, R = 5)
b$t
#>            [,1]        [,2]      [,3]        [,4]        [,5]       [,6]        [,7]       [,8]      [,9]
#> [1,] 0.10901729 -0.35040802 0.5261551  0.47729667  0.04408784 0.75941797 -0.28443711 -0.6456858 0.1808467
#> [2,] 0.17520115 -0.28978685 0.5732758 -0.01138502 -0.45163044 0.43331887  0.20238775 -0.2637551 0.5918977
#> [3,] 0.45951068  0.02132649 0.7496046 -0.42373326 -0.72947047 0.02312342 -0.09795514 -0.5180212 0.3601784
#> [4,] 0.08274794 -0.37344763 0.5067140 -0.26406702 -0.63265841 0.20206624 -0.32949464 -0.6737737 0.1323194
#> [5,] 0.18779791 -0.27781014 0.5819556  0.11815477 -0.34226148 0.53281673  0.16289648 -0.3013469 0.5647101

# maybe more readable: reshape to replicate x statistic x variable
array(b$t, dim = c(5, 3, 3),
      dimnames = list(NULL, c("rho", "lower", "upper"), names(df[-1])))
#> , , x
#> 
#>             rho       lower     upper
#> [1,] 0.10901729 -0.35040802 0.5261551
#> [2,] 0.17520115 -0.28978685 0.5732758
#> [3,] 0.45951068  0.02132649 0.7496046
#> [4,] 0.08274794 -0.37344763 0.5067140
#> [5,] 0.18779791 -0.27781014 0.5819556
#> 
#> , , y
#> 
#>              rho       lower      upper
#> [1,]  0.47729667  0.04408784 0.75941797
#> [2,] -0.01138502 -0.45163044 0.43331887
#> [3,] -0.42373326 -0.72947047 0.02312342
#> [4,] -0.26406702 -0.63265841 0.20206624
#> [5,]  0.11815477 -0.34226148 0.53281673
#> 
#> , , z
#> 
#>              rho      lower     upper
#> [1,] -0.28443711 -0.6456858 0.1808467
#> [2,]  0.20238775 -0.2637551 0.5918977
#> [3,] -0.09795514 -0.5180212 0.3601784
#> [4,] -0.32949464 -0.6737737 0.1323194
#> [5,]  0.16289648 -0.3013469 0.5647101

Created on 2022-06-04 by the reprex package (v2.0.1)
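
The intervals printed above are the `cor.test()` intervals computed inside each resample. If what is wanted is one interval per variable taken from the bootstrap distribution of rho itself, a minimal sketch could look like the following. The names `boot_rho`, `ci` and `res`, the choice of R = 2000, and the percentile interval type are assumptions made for illustration, not part of the original answer. Here `d$v` is used so that v is resampled together with the other columns.

```r
library(boot)

set.seed(2022)
v <- rnorm(20, 50, 2)
x <- rnorm(20, 2)
y <- rnorm(20, 10, 2)
z <- rnorm(20, 10, 2)
df <- data.frame(v = v, x = x, y = y, z = z)

# Statistic for boot(): correlation of v with every other column,
# computed on the resampled rows so the (v, x), (v, y), (v, z) pairs stay together.
boot_rho <- function(data, i) {
  d <- data[i, ]
  sapply(d[-1], function(col) cor(d$v, col))
}

# R = 2000 is an arbitrary choice for this sketch.
b <- boot(df, boot_rho, R = 2000)

# Percentile bootstrap interval for each column of b$t (one column per variable).
# boot.ci()$percent is a 1 x 5 matrix; elements 4 and 5 are the interval limits.
ci <- t(sapply(seq_len(ncol(b$t)), function(k) {
  boot.ci(b, conf = 0.95, type = "perc", index = k)$percent[4:5]
}))

res <- data.frame(
  variable = names(df)[-1],
  rho      = unname(b$t0),   # correlation in the original sample
  lower    = ci[, 1],
  upper    = ci[, 2]
)
res
```

`boot.ci()` also accepts other interval types (for example `"bca"` or `"basic"`) through its `type` argument, so the same loop can be reused if a different interval is preferred.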
