I've posted this as an additional question to this post, but I thought it deserved a separate post. I have a for loop in which I run 10 different correlations.
I'm using the unlisted variable so that cor.test doesn't return any errors. Is there a way to keep the variables' original names (i.e., VarA, VarB, etc.)? I've tried with myVarn, but cor.test() won't run with that. I've made a reproducible example with two tests:
### empty list:
test_list <- list()
### make two tests to provide an example:
for (a in 1:2) {
  myVar <- data[a]
  myVarn <- names(myVar) ### doesn't work with this
  data$myVarUnlist <- unlist(myVar)
  test_list[[a]] <- cor.test(data$myVar, data$VarC, data = data)
}
### my list:
test_list[[1]]:
Pearson's product-moment correlation
data: data$myVar and data$VarC ########## I WANTED TO KEEP the original names here
t = 244.21, df = 53, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.9992354 0.9997421
sample estimates:
cor
0.999556
- data :
structure(list(VarA = c(263L, 223L, NA, 257L, 285L, 211L, 210L,
NA, 147L, 311L, 342L, 97L, 216L, 241L, 296L, 296L, 211L, 60L,
339L, 318L, 358L, 167L, NA, 183L, 92L, 283L, 169L, NA, 298L,
NA, 162L, NA, 211L, 308L, 92L, 269L, NA, 197L, 280L, 259L, 313L,
252L, 98L, 258L, 201L, 341L, 456L, 308L, 252L, 64L, 259L, 158L,
161L, NA, NA, 129L, 264L, NA, 216L, 109L, 91L, 236L, 275L, 254L,
221L, NA, NA, NA, NA, NA, NA), VarB = c(145L, 120L, NA, 119L,
142L, 132L, 100L, NA, 64L, 144L, 164L, 56L, 102L, 136L, 139L,
135L, 91L, 32L, 123L, 164L, 145L, 93L, NA, 99L, 51L, 143L, 98L,
NA, 158L, NA, 79L, NA, 96L, 149L, 55L, 114L, NA, 94L, 137L, 130L,
135L, 113L, 61L, 113L, 117L, 154L, 199L, 152L, 142L, 42L, 111L,
74L, 92L, NA, NA, 85L, 116L, NA, 99L, 64L, 60L, 114L, 151L, 136L,
116L, NA, NA, NA, NA, NA, NA), VarC = c(145L, 121L, NA, 120L,
145L, 133L, 101L, NA, 64L, 146L, 166L, 58L, 103L, 136L, 142L,
135L, 91L, 34L, 123L, 167L, 148L, 93L, NA, 99L, 51L, 145L, 98L,
NA, 159L, NA, 81L, NA, 97L, 149L, 56L, 115L, NA, 96L, 137L, 132L,
135L, 113L, 62L, 113L, 118L, 154L, 199L, 154L, 145L, 43L, 112L,
74L, 92L, NA, NA, 86L, 116L, NA, 100L, 66L, 60L, 114L, 153L,
136L, 120L, NA, NA, NA, NA, NA, NA), myVarUnlist = c(145L, 120L,
NA, 119L, 142L, 132L, 100L, NA, 64L, 144L, 164L, 56L, 102L, 136L,
139L, 135L, 91L, 32L, 123L, 164L, 145L, 93L, NA, 99L, 51L, 143L,
98L, NA, 158L, NA, 79L, NA, 96L, 149L, 55L, 114L, NA, 94L, 137L,
130L, 135L, 113L, 61L, 113L, 117L, 154L, 199L, 152L, 142L, 42L,
111L, 74L, 92L, NA, NA, 85L, 116L, NA, 99L, 64L, 60L, 114L, 151L,
136L, 116L, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -71L), class = "data.frame")
- edit (with two variables):
### In this case, the variables are interleaved in the data frame, so I correlate VarA with VarB, then VarC with VarD, etc.
### this is what I usually do:
for (ii in seq(from = 1, to = 20, by = 2)) {
  CorrVar1 <- dfCorr1[ii + 2] #L1 Variables
  CorrnVar1 <- names(CorrVar1)
  dfCorr1$CorrVar1Unlist <- unlist(CorrVar1)
  CorrVar2 <- dfCorr1[ii + 3] #L2 Variables
  CorrnVar2 <- names(CorrVar2)
  dfCorr1$CorrVar2Unlist <- unlist(CorrVar2)
  ### I'm wondering what as.formula() would look like with two different
  ### variables? maybe something like this would be ok?
  myVarn1 <- names(dfCorr1)[ii + 2]
  myVarn2 <- names(dfCorr1)[ii + 3]
  fo <- as.formula(paste('~', myVarn1, '+', myVarn2))
  test_list[[length(test_list) + 1]] <- do.call('cor.test', list(fo, data = quote(dfCorr1)))
}
- Thanks in advance! :)
CodePudding user response:
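You may use the formula version of cor.test together with do.call. The formula method builds the "data:" label from the variable names in the formula, so the original names are kept.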
test_list <- list()
for (a in 1:2) {
  myVarn <- names(data)[a]
  fo <- as.formula(paste('~', myVarn, '+ VarC')) ## gives e.g. ~VarA + VarC
  test_list[[a]] <- do.call('cor.test', list(fo, data=quote(data)))
}
test_list
# [[1]]
#
# Pearson's product-moment correlation
#
# data: VarA and VarC
# t = 20.464, df = 53, p-value < 2.2e-16
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# 0.9024170 0.9659991
# sample estimates:
# cor
# 0.9421543
#
#
# [[2]]
#
# Pearson's product-moment correlation
#
# data: VarB and VarC
# t = 244.21, df = 53, p-value < 2.2e-16
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# 0.9992354 0.9997421
# sample estimates:
# cor
# 0.999556
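As a quick check of why the formula interface keeps the names, here is a minimal sketch comparing the data.name element produced by the two interfaces. The data frame toy and its values are made up for illustration; only the column names follow the question.

```r
set.seed(42)
toy <- data.frame(VarA = rnorm(20), VarC = rnorm(20))  # hypothetical data

# Default method: the label is built from the expressions passed in
by_vector <- cor.test(toy$VarA, toy$VarC)
by_vector$data.name   # "toy$VarA and toy$VarC"

# Formula method: the label is built from the variable names in the formula
by_formula <- cor.test(~ VarA + VarC, data = toy)
by_formula$data.name  # "VarA and VarC"

# Both interfaces compute the same correlation
all.equal(by_vector$estimate, by_formula$estimate)
```

Only the label differs between the two calls; the test itself is unchanged.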
Actually, it's easier using lapply, which gives the same result:
lapply(names(data)[1:2], \(x) do.call('cor.test', list(as.formula(paste('~', x, '+ VarC')), data=quote(data))))
Provide further arguments in the list:
lapply(names(data)[1:2], \(x)
       do.call('cor.test',
               list(as.formula(paste('~', x, '+ VarC')), data=quote(data),
                    method='spearman', exact=FALSE)))
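For the paired case raised in the question's edit (VarA with VarB, then VarC with VarD, and so on), the same formula approach can probably be extended by iterating over two vectors of names at once. This is only a sketch under assumptions: toy is a made-up data frame, and it assumes the paired columns sit next to each other starting at column 1, so the index offsets would need adjusting for the real data.

```r
# Hypothetical data with interleaved pairs of columns
set.seed(1)
toy <- data.frame(VarA = rnorm(30), VarB = rnorm(30),
                  VarC = rnorm(30), VarD = rnorm(30))

# Split the names into first and second members of each pair
first  <- names(toy)[seq(1, ncol(toy), by = 2)]
second <- names(toy)[seq(2, ncol(toy), by = 2)]

# Build one formula per pair and run cor.test on it
paired_tests <- Map(function(v1, v2) {
  fo <- as.formula(paste('~', v1, '+', v2))
  do.call('cor.test', list(fo, data = quote(toy)))
}, first, second)

# Name each result after the pair it tests
names(paired_tests) <- paste(first, second, sep = '_')

paired_tests[[1]]$data.name  # "VarA and VarB"
```

Map walks both name vectors in parallel, so no manual index arithmetic is needed inside the loop body.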
CodePudding user response:
cor.test() returns a list object. One of the members of that list is named data.name; it is a character string which, in my example, would automatically be set to "data[, x] and data$VarC".
This can be amended. Take care when doing this, as it's possible to erroneously relabel the output with, e.g., the wrong variable names.
test_list <- lapply(colnames(data)[1:2],
                    function(x) {
                      out <- cor.test(data[, x], data$VarC)
                      out$data.name <- paste0(x, " and VarC")
                      out
                    }
)
test_list
lapply returns a list, so it avoids the for loop.
This code assumes that VarC is always the second parameter to the correlation test.
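If the second variable is not always VarC, one possible way to reduce the risk of mislabelling is to build the label from the same names that are used to pick the columns. The helper cor_test_named below is hypothetical (it is not part of R or of the code above), and toy is made-up data; treat it as a sketch rather than a definitive implementation.

```r
# Hypothetical helper: run cor.test on two named columns and label the result
cor_test_named <- function(df, x, y, ...) {
  # Stop early on misspelled names so the label always matches real columns
  stopifnot(x %in% names(df), y %in% names(df))
  out <- cor.test(df[[x]], df[[y]], ...)
  out$data.name <- paste(x, "and", y)
  out
}

set.seed(7)
toy <- data.frame(VarA = rnorm(25), VarB = rnorm(25), VarC = rnorm(25))

res <- lapply(c("VarA", "VarB"), function(x) cor_test_named(toy, x, "VarC"))
res[[2]]$data.name  # "VarB and VarC"
```

Because the columns and the label come from the same x and y arguments, they cannot drift apart.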