Here, I made the following data:
```r
data <- data.frame(alzheimer = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0),
                   asthma    = c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
                   points    = c(0, 1, 3, 5, 3, 2, 1, 2, 1, 5),
                   sex       = c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0))
```
I want to know whether `sex` affects `alzheimer`, `asthma`, or `points`.
So I was considering doing a chi-square test for independence.
`alzheimer` and `asthma` are binary variables, so I think I can count the values separately for `sex == 1` and `sex == 0`, build contingency tables, and run chi-square tests on them.
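Counting the values within each sex is what `table()` does. A minimal sketch for the `sex`/`alzheimer` pair, assuming the data frame above (the `asthma` table is built the same way); `tab_alz` is just an illustrative name. Fisher's exact test is shown as a swapped-in alternative, since it does not rely on the large-sample approximation that chi-square needs.

```r
# Data from the question
data <- data.frame(alzheimer = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0),
                   asthma    = c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
                   points    = c(0, 1, 3, 5, 3, 2, 1, 2, 1, 5),
                   sex       = c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0))

# 2 x 2 contingency table: rows are sex, columns are alzheimer status
tab_alz <- table(sex = data$sex, alzheimer = data$alzheimer)
tab_alz

# Chi-square test of independence; for 2 x 2 tables R applies
# Yates' continuity correction by default (correct = TRUE)
chisq.test(tab_alz)

# Fisher's exact test: an alternative when expected counts are small
fisher.test(tab_alz)
```

With only 10 observations, `chisq.test()` will warn that the approximation may be incorrect, which is why the exact test is worth comparing.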
For the variable `points`, I don't know whether a chi-square test is appropriate, because `points` is an ordinal variable taking integer values from 0 to 5.
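One way to see the problem is to cross-tabulate `points` against `sex` and look at the expected counts. A sketch under the assumption that a common rule of thumb (most expected counts at least 5) is the criterion you care about; it also shows the Wilcoxon rank-sum (Mann-Whitney) test, a rank-based alternative that uses the ordering of `points` instead of treating each value as a separate category. `tab_pts` and `expected` are illustrative names.

```r
# Data from the question
data <- data.frame(alzheimer = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0),
                   asthma    = c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
                   points    = c(0, 1, 3, 5, 3, 2, 1, 2, 1, 5),
                   sex       = c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0))

# Cross-tabulate sex against each observed value of points (0, 1, 2, 3, 5)
tab_pts <- table(sex = data$sex, points = data$points)
tab_pts

# Expected counts under independence; here every cell is below 2,
# far under the usual rule of thumb for the chi-square approximation
expected <- suppressWarnings(chisq.test(tab_pts))$expected
expected

# Wilcoxon rank-sum test comparing the points distributions of the two sexes
# (R warns that an exact p-value cannot be computed because of ties)
wilcox.test(points ~ sex, data = data)
```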
To sum up, I want to do 3 tests:

- Are `sex` and `alzheimer` independent?
- Are `sex` and `asthma` independent?
- Are `sex` and `points` independent?
Additionally, my actual data has many columns, so I need to know how to run many tests at once and write the results to a CSV file. The CSV file should include the test statistics and p-values.
Answer:
We could write a function `stat_test` that applies `chisq.test()` to binary columns and `wilcox.test()` to the other columns (assuming they are all ordinal). The function outputs three things:

- the name of the test
- the value of the test statistic (`stats`)
- the p-value
Then we can use `dplyr::across()` to apply this test to every column except `sex`, which is passed as the grouping input `g`. Afterwards we add a `label` column naming the three rows.

```r
library(dplyr)

data <- data.frame(alzheimer = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0),
                   asthma    = c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
                   points    = c(0, 1, 3, 5, 3, 2, 1, 2, 1, 5),
                   sex       = c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0))

# Test variable x against the binary grouping variable g:
# chi-square test if x is binary, Wilcoxon rank-sum test otherwise
stat_test <- function(x, g) {
  if (length(unique(na.omit(x))) <= 2) {
    res <- chisq.test(x = x, y = g)   # builds the 2 x 2 table of x and g
    label <- "chi_square"
  } else {
    res <- wilcox.test(x ~ g)         # compares x between the two levels of g
    label <- "wilcox"
  }
  c(
    test  = label,
    stats = unname(res$statistic),
    p_val = res$p.value
  )
}

data %>%
  as_tibble() %>%
  # reframe() allows multi-row results; use summarise() on dplyr < 1.1.0
  reframe(across(-sex, ~ stat_test(.x, sex))) %>%
  mutate(label = c("test", "stats", "pvalue"), .before = 1L)
```

This returns a 3-row tibble with a `label` column plus one character column per tested variable: `alzheimer` and `asthma` get `chi_square`, and `points` gets `wilcox`. With only 10 observations, R also warns that the chi-squared approximation may be incorrect and that an exact Wilcoxon p-value cannot be computed with ties.
Created on 2022-09-27 by the reprex package (v2.0.1)
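The question also asks for a CSV file, which the code above does not write. A minimal base-R sketch, assuming one row per tested variable is a convenient layout and that every non-`sex` column should be tested against `sex`; the helper `test_one` and the file name `sex_tests.csv` are hypothetical, so substitute your own.

```r
# Example data from the question
data <- data.frame(alzheimer = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0),
                   asthma    = c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
                   points    = c(0, 1, 3, 5, 3, 2, 1, 2, 1, 5),
                   sex       = c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0))

# Hypothetical helper: test one variable x against the binary group g and
# return a single-row data frame (chi-square if x is binary, Wilcoxon otherwise)
test_one <- function(name, x, g) {
  if (length(unique(na.omit(x))) <= 2) {
    res <- chisq.test(x, g)     # 2 x 2 contingency table of x and g
  } else {
    res <- wilcox.test(x ~ g)   # rank-sum test of x between the two groups
  }
  data.frame(variable  = name,
             test      = res$method,
             statistic = unname(res$statistic),
             p_value   = res$p.value,
             stringsAsFactors = FALSE)
}

# Every column except the grouping variable
vars <- setdiff(names(data), "sex")
results <- do.call(rbind, lapply(vars, function(v) test_one(v, data[[v]], data$sex)))

# Hypothetical output path; replace with your own file name
out_file <- file.path(tempdir(), "sex_tests.csv")
write.csv(results, out_file, row.names = FALSE)
results
```

The long layout keeps `statistic` and `p_value` numeric, whereas the wide tibble shown above stores every cell as character, so the CSV is easier to sort or filter later.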