How to apply a statistical test to several columns of a dataframe in R

Time:04-14

I want to apply this test not only to column x1, as in the example below, but to several columns of df, in this case x1 and x2.

I tried wrapping the code in a function and calling it with purrr::map, but I can't get it to work.

library(tidyverse)

df <- tibble(skul = c(rep('a',60), rep('b', 64)),
             x1 = sample(1:10, 124, replace = TRUE),
             x2 = sample(1:10, 124, replace = TRUE),
             i_f = c(rep(0, 30), rep(1, 30), rep(0, 32), rep(1, 32)))


lapply(split(df, factor(df$skul)),
       function(x)wilcox.test(data=x, x1 ~ i_f,
                              paired=FALSE))
#> Warning in wilcox.test.default(x = c(10L, 5L, 8L, 4L, 6L, 3L, 10L, 2L, 10L, :
#> cannot compute exact p-value with ties
#> Warning in wilcox.test.default(x = c(3L, 3L, 4L, 9L, 8L, 10L, 5L, 5L, 4L, :
#> cannot compute exact p-value with ties
#> $a
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  x1 by i_f
#> W = 546, p-value = 0.1554
#> alternative hypothesis: true location shift is not equal to 0
#> 
#> 
#> $b
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  x1 by i_f
#> W = 565, p-value = 0.4781
#> alternative hypothesis: true location shift is not equal to 0
Created on 2022-04-13 by the reprex package (v2.0.1)

Answer:

One way is to add an inner loop over the columns of interest after the split, build each formula with reformulate, and pass it to wilcox.test:

out <- lapply(split(df, df$skul), function(x) 
    lapply(setNames(c("x1", "x2"), c("x1", "x2")), function(y)
      wilcox.test(reformulate("i_f", response = y), data = x)))

-output

> out$a
$x1

    Wilcoxon rank sum test with continuity correction

data:  x1 by i_f
W = 452, p-value = 0.9822
alternative hypothesis: true location shift is not equal to 0


$x2

    Wilcoxon rank sum test with continuity correction

data:  x2 by i_f
W = 404.5, p-value = 0.5027
alternative hypothesis: true location shift is not equal to 0
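
The nested list can also be flattened into a small table of p-values. This is a minimal sketch using base R only. The seed and `exact = FALSE` are additions that are not in the question: with ties, wilcox.test falls back to the normal approximation anyway, so `exact = FALSE` only avoids the warning. The names `cols` and `p_values` are illustrative. Because the question samples the data without a seed, the numbers will differ from the output shown above.

```r
set.seed(42)  # assumption: seed added only to make this sketch reproducible

df <- data.frame(
  skul = c(rep("a", 60), rep("b", 64)),
  x1   = sample(1:10, 124, replace = TRUE),
  x2   = sample(1:10, 124, replace = TRUE),
  i_f  = c(rep(0, 30), rep(1, 30), rep(0, 32), rep(1, 32))
)

# reformulate() builds a formula from strings, e.g. x1 ~ i_f
print(reformulate("i_f", response = "x1"))

cols <- c("x1", "x2")  # columns to test

# Outer loop: one data frame per school; inner loop: one test per column
out <- lapply(split(df, df$skul), function(x)
  lapply(setNames(cols, cols), function(y)
    wilcox.test(reformulate("i_f", response = y), data = x, exact = FALSE)))

# Pull the p-value out of every test: rows are schools, columns are variables
p_values <- t(sapply(out, function(tests)
  sapply(tests, function(h) h$p.value)))

print(p_values)
```

Adding another column to `cols` is enough to extend the table; the rest of the code does not change.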

If we want to use the tidyverse instead:

library(dplyr)

# One row per skul; each x1/x2 cell is a list holding the tidied test result
df %>% 
  group_by(skul) %>% 
  summarise(across(c(x1, x2), 
    ~ list(broom::tidy(wilcox.test(reformulate("i_f", cur_column()))))))
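
If a flat table is easier to work with than list-columns, a long-format variant is one option. This is a sketch under stated assumptions, not the answer's method: it reshapes with tidyr::pivot_longer, calls wilcox.test through its two-sample (x, y) interface rather than a formula, and uses a hypothetical helper `wilcox_row`. The seed and `exact = FALSE` are also additions; the latter only silences the ties warning.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added only to make this sketch reproducible

df <- data.frame(
  skul = c(rep("a", 60), rep("b", 64)),
  x1   = sample(1:10, 124, replace = TRUE),
  x2   = sample(1:10, 124, replace = TRUE),
  i_f  = c(rep(0, 30), rep(1, 30), rep(0, 32), rep(1, 32))
)

# Hypothetical helper: tests group 0 against group 1 and
# returns the statistic and p-value as a one-row data frame
wilcox_row <- function(value, group) {
  h <- wilcox.test(value[group == 0], value[group == 1], exact = FALSE)
  data.frame(W = unname(h$statistic), p_value = h$p.value)
}

res <- df %>%
  # Stack x1 and x2 so that each tested variable becomes its own group
  pivot_longer(c(x1, x2), names_to = "variable", values_to = "value") %>%
  group_by(skul, variable) %>%
  # An unnamed data frame result is unpacked into columns by summarise()
  summarise(wilcox_row(value, i_f), .groups = "drop")

print(res)
```

The result has one row per combination of skul and variable, so it can be filtered or adjusted for multiple testing (for example with p.adjust) like any other data frame.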