Keeping the variable's original name in a for loop

Time:09-27

I posted this as an additional question on this post, but I thought it deserved a separate post. I have a for loop in which I compute 10 different correlations.

  • I'm using the unlisted variable so that cor.test doesn't return any errors. Is there a way to keep the variable's original name (i.e., VarA, VarB, etc.) in the output? I've tried using myVarn, but cor.test() won't run with that...

  • I've made a reproducible example with two tests:

### empty list:

test_list <- list()

### make two tests to provide an example:

for (a in 1:2) {

  myVar <- data[a]
  myVarn <- names(myVar)    ### doesn't work with this
  data$myVarUnlist <- unlist(myVar)

  test_list[[a]] <- cor.test(data$myVarUnlist, data$VarC)

}

### my list: 

test_list[[2]]:

Pearson's product-moment correlation

data:  data$myVarUnlist and data$VarC   ########## I WANTED TO KEEP the original names here
t = 244.21, df = 53, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9992354 0.9997421
sample estimates:
     cor 
0.999556 
  • data :
structure(list(VarA = c(263L, 223L, NA, 257L, 285L, 211L, 210L, 
NA, 147L, 311L, 342L, 97L, 216L, 241L, 296L, 296L, 211L, 60L, 
339L, 318L, 358L, 167L, NA, 183L, 92L, 283L, 169L, NA, 298L, 
NA, 162L, NA, 211L, 308L, 92L, 269L, NA, 197L, 280L, 259L, 313L, 
252L, 98L, 258L, 201L, 341L, 456L, 308L, 252L, 64L, 259L, 158L, 
161L, NA, NA, 129L, 264L, NA, 216L, 109L, 91L, 236L, 275L, 254L, 
221L, NA, NA, NA, NA, NA, NA), VarB = c(145L, 120L, NA, 119L, 
142L, 132L, 100L, NA, 64L, 144L, 164L, 56L, 102L, 136L, 139L, 
135L, 91L, 32L, 123L, 164L, 145L, 93L, NA, 99L, 51L, 143L, 98L, 
NA, 158L, NA, 79L, NA, 96L, 149L, 55L, 114L, NA, 94L, 137L, 130L, 
135L, 113L, 61L, 113L, 117L, 154L, 199L, 152L, 142L, 42L, 111L, 
74L, 92L, NA, NA, 85L, 116L, NA, 99L, 64L, 60L, 114L, 151L, 136L, 
116L, NA, NA, NA, NA, NA, NA), VarC = c(145L, 121L, NA, 120L, 
145L, 133L, 101L, NA, 64L, 146L, 166L, 58L, 103L, 136L, 142L, 
135L, 91L, 34L, 123L, 167L, 148L, 93L, NA, 99L, 51L, 145L, 98L, 
NA, 159L, NA, 81L, NA, 97L, 149L, 56L, 115L, NA, 96L, 137L, 132L, 
135L, 113L, 62L, 113L, 118L, 154L, 199L, 154L, 145L, 43L, 112L, 
74L, 92L, NA, NA, 86L, 116L, NA, 100L, 66L, 60L, 114L, 153L, 
136L, 120L, NA, NA, NA, NA, NA, NA), myVarUnlist = c(145L, 120L, 
NA, 119L, 142L, 132L, 100L, NA, 64L, 144L, 164L, 56L, 102L, 136L, 
139L, 135L, 91L, 32L, 123L, 164L, 145L, 93L, NA, 99L, 51L, 143L, 
98L, NA, 158L, NA, 79L, NA, 96L, 149L, 55L, 114L, NA, 94L, 137L, 
130L, 135L, 113L, 61L, 113L, 117L, 154L, 199L, 152L, 142L, 42L, 
111L, 74L, 92L, NA, NA, 85L, 116L, NA, 99L, 64L, 60L, 114L, 151L, 
136L, 116L, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -71L), class = "data.frame")
  • edit (with two variables):
### In this case, the variables are interleaved in the data frame, so I correlate VarA with VarB, then VarC with VarD, etc.

### this is what I usually do:
for (ii in seq(from = 1, to = 20, by = 2)) {
  
  CorrVar1 <- dfCorr1[ii + 2]     #L1 Variables
  CorrnVar1 <- names(CorrVar1)
  dfCorr1$CorrVar1Unlist <- unlist(CorrVar1)
  
  CorrVar2 <- dfCorr1[ii + 3]     #L2 Variables
  CorrnVar2 <- names(CorrVar2)
  dfCorr1$CorrVar2Unlist <- unlist(CorrVar2)

### I'm wondering what the as.formula() call would look like with two
### different variables. Maybe something like this would work?

  myVarn1 <- names(dfCorr3)[a + 2]
  myVarn2 <- names(dfCorr3)[a + 3]

  fo <- as.formula(paste('~', myVarn1, '+', myVarn2))
  test_list[[a]] <- do.call('cor.test', list(fo, data = quote(dfCorr3)))
  • Thanks in advance! :)

CodePudding user response:

You can use the formula interface of cor.test together with do.call, so that the original column names appear in the output.

test_list <- list()

for (a in 1:2) {
  myVarn <- names(data)[a]
  fo <- as.formula(paste('~', myVarn, '+ VarC'))  ## gives e.g. ~VarA + VarC
  test_list[[a]] <- do.call('cor.test', list(fo, data=quote(data)))
}

test_list
# [[1]]
# 
# Pearson's product-moment correlation
# 
# data:  VarA and VarC
# t = 20.464, df = 53, p-value < 2.2e-16
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
#  0.9024170 0.9659991
# sample estimates:
#       cor 
# 0.9421543 
# 
# 
# [[2]]
# 
#   Pearson's product-moment correlation
# 
# data:  VarB and VarC
# t = 244.21, df = 53, p-value < 2.2e-16
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
#   0.9992354 0.9997421
# sample estimates:
#   cor 
# 0.999556 
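
To illustrate why the formula interface keeps the names, here is a minimal sketch comparing the two interfaces. It uses a small made-up data frame (toy and its values are assumptions for illustration, not the asker's data). With vectors, cor.test labels the result with the expressions that were passed in; with a formula, it labels the result with the column names.

```r
# Hypothetical toy data, generated only for illustration
set.seed(1)
toy <- data.frame(VarA = rnorm(30), VarB = rnorm(30))
toy$VarC <- toy$VarB + rnorm(30, sd = 0.1)

v <- "VarA"

# Vector interface: data.name is built from the deparsed argument expressions
by_vector <- cor.test(toy[[v]], toy$VarC)

# Formula interface: data.name is built from the column names in the formula
fo <- as.formula(paste("~", v, "+ VarC"))
by_formula <- do.call("cor.test", list(fo, data = quote(toy)))

by_vector$data.name   # "toy[[v]] and toy$VarC"
by_formula$data.name  # "VarA and VarC"
```

Both calls compute the same correlation; only the label stored in data.name differs.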

Actually, it's easier with lapply, which gives the same result:

lapply(names(data)[1:2], \(x) do.call('cor.test', list(as.formula(paste('~', x, '+ VarC')), data=quote(data))))

Further arguments can be provided in the list:

lapply(names(data)[1:2], \(x) 
       do.call('cor.test', 
               list(as.formula(paste('~', x, '+ VarC')), data=quote(data),
                    method='spearman', exact=FALSE)))
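
The same pattern can probably be extended to the two-variable case from the question's edit, where adjacent columns are correlated pairwise. The following is only a sketch under assumptions: toy is a hypothetical data frame whose columns alternate (VarA, VarB, VarC, VarD), and Map is used here in place of the for loop to walk over the two sets of names in parallel.

```r
# Hypothetical data frame with interleaved columns, for illustration only
set.seed(42)
toy <- data.frame(VarA = rnorm(25), VarB = rnorm(25),
                  VarC = rnorm(25), VarD = rnorm(25))

first  <- names(toy)[c(TRUE, FALSE)]   # VarA, VarC
second <- names(toy)[c(FALSE, TRUE)]   # VarB, VarD

# Build one formula per pair and run cor.test through do.call
pair_tests <- Map(function(v1, v2) {
  fo <- as.formula(paste("~", v1, "+", v2))
  do.call("cor.test", list(fo, data = quote(toy)))
}, first, second)

# Name each list element after the pair it holds
names(pair_tests) <- paste(first, second, sep = "_")

pair_tests$VarA_VarB$data.name  # "VarA and VarB"
```

If the real data frame has leading columns that should be skipped, the two name vectors would need to be offset accordingly.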

CodePudding user response:

cor.test() returns a list object (of class "htest"). One of its elements, data.name, is a character string which, in my example, would automatically be set to "data[, x] and data$VarC".
This can be overwritten. Take care when doing this, as it's easy to relabel the output erroneously, e.g. with the wrong variable names.

test_list <- lapply(colnames(data)[1:2],
       function(x) {
         out <- cor.test(data[,x], data$VarC)
         out$data.name <- paste0(x," and VarC")
         out
       }
)

test_list

lapply returns a list, so the for loop is not needed. This code assumes that VarC is always the second argument to the correlation test.
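
One way to reduce the risk of mislabelling is to derive both the list names and the data.name label from the same x. A minimal sketch, assuming a hypothetical data frame toy (made-up values, not the asker's data); sapply with simplify = FALSE is used so that the returned list is named after the variables:

```r
# Hypothetical toy data, for illustration only
set.seed(7)
toy <- data.frame(VarA = rnorm(40), VarB = rnorm(40))
toy$VarC <- toy$VarA + rnorm(40)

vars <- c("VarA", "VarB")

# simplify = FALSE keeps a list; the character vector supplies the names
tests <- sapply(vars, function(x) {
  out <- cor.test(toy[[x]], toy$VarC)
  out$data.name <- paste(x, "and VarC")  # label built from the same x
  out
}, simplify = FALSE)

# Collect the key numbers in a data frame keyed by the original names
summary_tab <- data.frame(
  variable = names(tests),
  estimate = vapply(tests, function(t) unname(t$estimate), numeric(1)),
  p_value  = vapply(tests, function(t) t$p.value, numeric(1)),
  row.names = NULL
)

tests$VarA$data.name  # "VarA and VarC"
```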
