Writing a function for pairwise t-tests

I'm trying to write a function in R that runs every possible pairwise t-test on the columns of a data frame. (I'm aware that functions exist that can achieve this, but I would also like to learn how to write the function myself.) I've run into an issue that I don't know how to resolve.

Data:

library(combinat) # for generating pairwise combinations of variables

apple <- rnorm(100)
banana <- rnorm(100)
pear <- rnorm(100)
orange <- rnorm(100)
pineapple <- rnorm(100)


data <- data.frame(apple, banana, pear, orange, pineapple)

My idea was to use a for loop to look up every pair of column names in the table of column name combinations, find the corresponding column numbers in the original dataset using the match function, and then pass those columns as arguments to the t.test function. This process works in isolation, but I run into problems when attempting to iterate it.

combinations <- combn2(names(data)) # creates a 10x2 matrix: one row per pair of the 5 column names

a<-match(combinations[8,1],colnames(data))
a<-data[,a]
b<-match(combinations[8,2],colnames(data))
b<-data[,b]
t.test(a, b)

# This works as expected

Here is my attempt to automate this process using a for loop:

test <- function(data) {
  names <- names(data)
  combinations <- combinat::combn2(names(data))
  num_rows <- NROW(combinations)
  for (i in 1:num_rows) {
    x<- match(combinations[i,1],colnames(data))
    x<-data[,x]
    y<- match(combinations[i,2],colnames(data))
    y<-data[,y]
    t.test(x, y)
  }
}

test(data)
summary(test(data))

The result is empty. I'm obviously missing something, but I am not sure how to proceed. Any help is appreciated.

CodePudding user response：

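The third argument of combn (from base R's utils package, not combinat::combn2) takes a function that is applied to each combination. If you pass the data frame itself, each combination d is a two-column data frame, and bquote builds the t.test call from the column names so that the printed output names each pair. (The \(d) shorthand for function(d) requires R 4.1 or later.) You can simply do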

combn(data, 2L, \(d) {
  syms <- lapply(names(d), as.symbol)
  names(syms) <- c("x", "y")
  eval(bquote(t.test(.(x), .(y)), syms), d)
}, FALSE)

Output

[[1]]

    Welch Two Sample t-test

data:  apple and banana
t = -0.11531, df = 197.6, p-value = 0.9083
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3017470  0.2684074
sample estimates:
  mean of x   mean of y 
-0.03961686 -0.02294705 


[[2]]

    Welch Two Sample t-test

data:  apple and pear
t = -0.78348, df = 197.86, p-value = 0.4343
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3841981  0.1657171
sample estimates:
  mean of x   mean of y 
-0.03961686  0.06962364 


[[3]]

    Welch Two Sample t-test

data:  apple and orange
t = -0.55681, df = 196.65, p-value = 0.5783
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3433412  0.1921482
sample estimates:
  mean of x   mean of y 
-0.03961686  0.03597966 


[[4]]

    Welch Two Sample t-test

data:  apple and pineapple
t = 0.038627, df = 197.99, p-value = 0.9692
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2739606  0.2849074
sample estimates:
  mean of x   mean of y 
-0.03961686 -0.04509027 


[[5]]

    Welch Two Sample t-test

data:  banana and pear
t = -0.64848, df = 196.99, p-value = 0.5174
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3740876  0.1889462
sample estimates:
  mean of x   mean of y 
-0.02294705  0.06962364 


[[6]]

    Welch Two Sample t-test

data:  banana and orange
t = -0.4234, df = 194.84, p-value = 0.6725
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3334116  0.2155582
sample estimates:
  mean of x   mean of y 
-0.02294705  0.03597966 


[[7]]

    Welch Two Sample t-test

data:  banana and pineapple
t = 0.15274, df = 197.7, p-value = 0.8788
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2637425  0.3080290
sample estimates:
  mean of x   mean of y 
-0.02294705 -0.04509027 


[[8]]

    Welch Two Sample t-test

data:  pear and orange
t = 0.25138, df = 197.38, p-value = 0.8018
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2302948  0.2975828
sample estimates:
 mean of x  mean of y 
0.06962364 0.03597966 


[[9]]

    Welch Two Sample t-test

data:  pear and pineapple
t = 0.82024, df = 197.79, p-value = 0.4131
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1610834  0.3905112
sample estimates:
  mean of x   mean of y 
 0.06962364 -0.04509027 


[[10]]

    Welch Two Sample t-test

data:  orange and pineapple
t = 0.59521, df = 196.45, p-value = 0.5524
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1875381  0.3496780
sample estimates:
  mean of x   mean of y 
 0.03597966 -0.04509027
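
If the bquote step looks opaque, a plainer variant may help show what it is for. The sketch below is not the answerer's code: it passes the column names (rather than the data frame) to combn, looks the columns up with [[, and then overwrites the data.name element by hand so the printout still names the pair. The helper name pairwise_by_name, the three-column data frame, and the seed are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
data <- data.frame(apple = rnorm(100), banana = rnorm(100), pear = rnorm(100))

# Hypothetical helper: one Welch t-test per pair of column names.
pairwise_by_name <- function(df) {
  combn(names(df), 2L, FUN = function(pair) {
    res <- t.test(df[[pair[1]]], df[[pair[2]]])
    # Without this line the "data:" line of the printout would show the
    # indexing expressions (df[[pair[1]]] ...) instead of the column names.
    res$data.name <- paste(pair, collapse = " and ")
    res
  }, simplify = FALSE)
}

results <- pairwise_by_name(data)
length(results)          # one result per pair of columns
results[[1]]$data.name   # label of the first pair
```

The trade-off is one extra line per test in exchange for avoiding metaprogramming; the bquote version gets the labels right automatically because the call it builds contains the actual column names.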

CodePudding user response：

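t.test(x, y) is evaluated inside the loop, but its result is never stored or returned, so the function has nothing to hand back. You need to save each result and return the collection.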
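
The underlying reason is that a for loop in R evaluates to NULL, and a function returns the value of its last expression. A small sketch that isolates this, using made-up function names (loop_only and loop_collect are illustrative, not from the question):

```r
# A function whose last expression is a for loop returns NULL,
# no matter what is computed inside the loop.
loop_only <- function() {
  for (i in 1:3) i * 2   # each value is computed, then discarded
}

# Collecting the values in a preallocated list keeps them.
loop_collect <- function() {
  out <- vector("list", 3)
  for (i in 1:3) out[[i]] <- i * 2
  out
}

is.null(loop_only())     # TRUE
unlist(loop_collect())   # 2 4 6
```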

Try this:

test <- function(data) {
    # One row per pair of column names
    combinations <- combinat::combn2(names(data))
    num_rows <- nrow(combinations)

    # Preallocate a list with one slot per pair
    test_results <- vector(mode = "list", length = num_rows)
    for (i in seq_len(num_rows)) {
        # Find each column of the pair by name, then extract it
        x <- match(combinations[i, 1], colnames(data))
        x <- data[, x]
        y <- match(combinations[i, 2], colnames(data))
        y <- data[, y]
        # Store the result instead of discarding it
        test_results[[i]] <- t.test(x, y)
    }

    return(test_results)
}

This returns a list with one t-test result per pair of columns, in the same order as the rows of combinations.
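
Once the results are in a list, the usual apply functions can pull out individual pieces. As a rough sketch (not part of the answer above), the block below rebuilds a small version of the function with base combn in place of combinat::combn2 so that it runs without extra packages. The name pairwise_tests, the "vs" list names, the three-column data frame, and the seed are all assumptions for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility
data <- data.frame(apple = rnorm(100), banana = rnorm(100), pear = rnorm(100))

# Hypothetical variant of test(): same loop, but the list is named by pair.
pairwise_tests <- function(df) {
  pairs <- t(combn(names(df), 2L))   # transpose: one row per pair
  results <- vector("list", nrow(pairs))
  for (i in seq_len(nrow(pairs))) {
    results[[i]] <- t.test(df[[pairs[i, 1]]], df[[pairs[i, 2]]])
  }
  names(results) <- paste(pairs[, 1], pairs[, 2], sep = " vs ")
  results
}

results <- pairwise_tests(data)

# Extract one number per test: a named numeric vector of p-values.
p_values <- vapply(results, function(r) r$p.value, numeric(1))
p_values

# Individual tests can be looked up by name.
results[["apple vs pear"]]$statistic
```

Naming the list entries is optional, but it saves having to cross-reference positions against the combinations matrix when reading the output.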