Chi-square tests for multiple columns in R

I created the following data:

data<-data.frame(alzheimer=c(1,1,0,1,0,0,1,0,0,0),
                 asthma=c(1,1,0,0,1,1,1,1,0,0),
                 points=c(0,1,3,5,3,2,1,2,1,5),
                 sex=c(1,1,0,0,0,0,1,1,1,0))

I want to know whether sex affects alzheimer, asthma, or points, so I am considering chi-square tests of independence. alzheimer and asthma are binary variables, so I think I can count the cases for sex==1 and sex==0 separately and build contingency tables for the chi-square tests. For points, I don't know whether a chi-square test is appropriate, because points is an ordinal variable that takes only integer values from 0 to 5.

To sum up, I want to do 3 tests.

Are sex and alzheimer independent?
Are sex and asthma independent?
Are sex and points independent?

Additionally, my actual data has many columns, so I need to know how to run many tests at once and write the results to a CSV file. The CSV file should include the test statistics and p-values.

CodePudding user response:

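We could write a function stat_test that applies chisq.test to binary columns and wilcox.test to the other columns (assuming they are all ordinal). Note that the condition in the code below is written the other way round: columns with more than two distinct values go to chisq.test and binary columns go to wilcox.test, as the test row of the output shows, so swap the two branches to get the behaviour described here. The function returns three things: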

the name of the test (test)
the value of the test statistic (stats)
the p-value (p_val)

Then we could use dplyr::across() to apply this test to all columns (except the alzheimer column, which is used as the y input of our function). Afterwards we just add the labels as the first column.

data <- data.frame(alzheimer=c(1,1,0,1,0,0,1,0,0,0),
                   asthma=c(1,1,0,0,1,1,1,1,0,0),
                   points=c(0,1,3,5,3,2,1,2,1,5),
                   sex=c(1,1,0,0,0,0,1,1,1,0))

library(dplyr)

stat_test <- function(x, y) {
  if (length(unique(na.omit(x))) > 2) {
    res <- chisq.test(x = x,
               y = y)
    label <- "chi_square"
  } else {
    res <- wilcox.test(x, y = y)
    label <- "wilcox"
  }
  
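  # c() returns a named character vector: mixing the label with numbers
  # coerces the statistic and p-value to strings
  c(
    test = label,
    stats = res$statistic,
    p_val = res$p.value
  )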
}

data %>% 
  as_tibble %>% 
  summarise(across(-alzheimer,
                   ~ stat_test(.x, alzheimer))) %>% 
  mutate(label = c("test", "stats", "pvalue"), .before = 1L)
#> Warning in wilcox.test.default(x, y = y): cannot compute exact p-value with ties
#> Warning in chisq.test(x = x, y = y): Chi-squared approximation may be incorrect
#> Warning in wilcox.test.default(x, y = y): cannot compute exact p-value with ties
#> # A tibble: 3 x 4
#>   label  asthma            points            sex              
#>   <chr>  <chr>             <chr>             <chr>            
#> 1 test   wilcox            chi_square        wilcox           
#> 2 stats  60                5.13888888888889  55               
#> 3 pvalue 0.407562453620744 0.273341191458911 0.693376361757653

Created on 2022-09-27 by the reprex package (v2.0.1)
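
For the questions as asked (sex as the grouping variable) and the CSV requirement, a minimal sketch along the same lines could look like the following. It is not the code above: test_by_group, outcomes, results and the file name test_results.csv are made-up names for illustration. Binary columns go to chisq.test, and the other columns go to wilcox.test through the formula interface, because wilcox.test(x, y = y) treats the two columns as two separate samples rather than comparing x between the groups in y. It assumes the grouping column has exactly two levels and that every non-binary column is ordinal.

```r
data <- data.frame(alzheimer = c(1,1,0,1,0,0,1,0,0,0),
                   asthma    = c(1,1,0,0,1,1,1,1,0,0),
                   points    = c(0,1,3,5,3,2,1,2,1,5),
                   sex       = c(1,1,0,0,0,0,1,1,1,0))

# Hypothetical helper: test one outcome column `x` against a
# two-level grouping variable `g`.
test_by_group <- function(x, g) {
  if (length(unique(na.omit(x))) <= 2) {
    # binary outcome: chi-square test on the contingency table of x by g
    res   <- chisq.test(table(x, g))
    label <- "chi_square"
  } else {
    # outcome assumed ordinal: Wilcoxon rank-sum test of x between the groups
    res   <- wilcox.test(x ~ g)
    label <- "wilcox"
  }
  # a one-row data frame keeps the statistic and p-value numeric
  data.frame(test      = label,
             statistic = unname(res$statistic),
             p_value   = res$p.value)
}

# every column except the grouping variable is treated as an outcome
outcomes <- setdiff(names(data), "sex")

# one row per outcome column
results <- do.call(rbind, lapply(outcomes, function(col) {
  data.frame(variable = col, test_by_group(data[[col]], data$sex))
}))

# "test_results.csv" is an assumed file name
write.csv(results, "test_results.csv", row.names = FALSE)
```

results has one row per column with numeric statistic and p_value, so it can be filtered or rounded before it is written. With only ten rows, the same kinds of warnings about ties and the chi-squared approximation are to be expected.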