How do I run Fisher's test for each column in a dataframe?

I have a dataframe with multiple columns, and I need to run Fisher's exact test on each column. As I understand it, I first need to turn each column into a 2x2 contingency table, run Fisher's test on that table, and then loop over the whole dataframe (which has 1000 columns).

For instance, the dataframe will look like this, and I think I should convert it to this form in order to do Fisher's test.

Is there any way to do this? Is there a way to run Fisher's test without first converting each column into a contingency table? I'm stuck on how to run Fisher's test for this dataframe. Thank you very much!

CodePudding user response：

First provide the data in reproducible format with dput():

mydata <- structure(list(Sample1 = c(40, 2, 30, 0), Sample2 = c(50, 5, 
60, 0), Sample3 = c(60, 0, 65, 5), Sample4 = c(86, 0, 35, 3)), class = "data.frame", row.names = c("high-nonmutation", 
"high-mutation", "low-nonmutation", "low-mutation"))

Now re-arrange the rows so they match your table:

mydata <- mydata[c(2, 1, 4, 3), ]
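The row order matters because matrix() fills column by column. As a small sketch, here is the 2x2 table built from Sample1 after the reordering. The dimnames labels are added here for illustration only, inferred from the row names, and are not part of the original answer:

```r
# Data as given above
mydata <- data.frame(
  Sample1 = c(40, 2, 30, 0),
  Sample2 = c(50, 5, 60, 0),
  Sample3 = c(60, 0, 65, 5),
  Sample4 = c(86, 0, 35, 3),
  row.names = c("high-nonmutation", "high-mutation",
                "low-nonmutation", "low-mutation"))

# Reorder to: high-mutation, high-nonmutation, low-mutation, low-nonmutation
mydata <- mydata[c(2, 1, 4, 3), ]

# matrix() fills column-wise, so column 1 holds the "high" counts
# and column 2 the "low" counts (labels are illustrative assumptions)
tab <- matrix(mydata$Sample1, nrow = 2, ncol = 2,
              dimnames = list(Mutation = c("mutation", "nonmutation"),
                              Rank = c("high", "low")))
print(tab)
```

Each column of the table is one rank group and each row is one mutation status, which is the layout fisher.test() receives in the next step.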

Finally use apply() to run the Fisher's tests:

results <- apply(mydata, 2, function(x) fisher.test(matrix(x, 2, 2)))

The object results is a list with one element per column. Each element holds the result of one test:

results[[1]]   # or results[["Sample1"]]
# 
#   Fisher's Exact Test for Count Data
# 
# data:  matrix(x, 2, 2)
# p-value = 0.507
# alternative hypothesis: true odds ratio is not equal to 1
# 95 percent confidence interval:
#  0.1341153       Inf
# sample estimates:
# odds ratio 
#        Inf
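With 1000 columns, printing each test one by one is impractical, so it may help to pull the p-values into a single table. The following is a minimal sketch, not part of the original answer. The Benjamini-Hochberg adjustment via p.adjust() is an assumption about how you might want to handle multiple testing, and the name summary_df is hypothetical:

```r
# Data as given above, rows reordered as in the answer
mydata <- data.frame(
  Sample1 = c(40, 2, 30, 0),
  Sample2 = c(50, 5, 60, 0),
  Sample3 = c(60, 0, 65, 5),
  Sample4 = c(86, 0, 35, 3),
  row.names = c("high-nonmutation", "high-mutation",
                "low-nonmutation", "low-mutation"))
mydata <- mydata[c(2, 1, 4, 3), ]

# One Fisher's exact test per column, as in the answer
results <- apply(mydata, 2, function(x) fisher.test(matrix(x, 2, 2)))

# Extract the p-value from each htest object (named by column)
pvals <- sapply(results, function(r) r$p.value)

# Assumption: adjust for multiple testing with Benjamini-Hochberg
padj <- p.adjust(pvals, method = "BH")

summary_df <- data.frame(sample = names(pvals),
                         p.value = pvals,
                         p.adjusted = padj,
                         row.names = NULL)
print(summary_df)
```

Any other method accepted by p.adjust() (for example "bonferroni") can be swapped in. Which correction is appropriate depends on the analysis.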

CodePudding user response：

This answer uses the data as posted in dcarlson's answer.

First see how to run one test, for instance, for Sample1.

mydata <- structure(list(
  Sample1 = c(40, 2, 30, 0), 
  Sample2 = c(50, 5, 60, 0), 
  Sample3 = c(60, 0, 65, 5), 
  Sample4 = c(86, 0, 35, 3)), 
  class = "data.frame", 
  row.names = c("high-nonmutation", "high-mutation", 
                "low-nonmutation", "low-mutation"))

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

mydata %>%
  tibble::rownames_to_column("rownms") %>%
  separate(rownms, into = c("Rank", "Mutation")) %>%
  xtabs(Sample1 ~ Mutation + Rank, data = .) %>%
  fisher.test()
#> 
#>  Fisher's Exact Test for Count Data
#> 
#> data:  .
#> p-value = 0.507
#> alternative hypothesis: true odds ratio is not equal to 1
#> 95 percent confidence interval:
#>  0.1341153       Inf
#> sample estimates:
#> odds ratio 
#>        Inf

Created on 2022-04-23 by the reprex package (v2.0.1)

Now run all the tests, using the broom package's tidy() function to collect each result as one row of a table.

mydata <- structure(list(
  Sample1 = c(40, 2, 30, 0), 
  Sample2 = c(50, 5, 60, 0), 
  Sample3 = c(60, 0, 65, 5), 
  Sample4 = c(86, 0, 35, 3)), 
  class = "data.frame", 
  row.names = c("high-nonmutation", "high-mutation", 
                "low-nonmutation", "low-mutation"))

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(broom)
})

mydata %>%
  tibble::rownames_to_column("rownms") %>%
  separate(rownms, into = c("Rank", "Mutation")) %>% 
  pivot_longer(cols = starts_with("Sample"), names_to = "Sample") %>%
  group_by(Sample) %>%
  do(tidy(fisher.test(xtabs(value ~ Mutation + Rank, data = .))))
#> # A tibble: 4 x 7
#> # Groups:   Sample [4]
#>   Sample  estimate p.value conf.low conf.high method                 alternative
#>   <chr>      <dbl>   <dbl>    <dbl>     <dbl> <chr>                  <chr>      
#> 1 Sample1      Inf  0.507     0.134    Inf    Fisher's Exact Test f~ two.sided  
#> 2 Sample2      Inf  0.0227    1.04     Inf    Fisher's Exact Test f~ two.sided  
#> 3 Sample3        0  0.0614    0          1.24 Fisher's Exact Test f~ two.sided  
#> 4 Sample4        0  0.0272    0          1.04 Fisher's Exact Test f~ two.sided

Created on 2022-04-23 by the reprex package (v2.0.1)
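The dplyr documentation marks do() as superseded, so here is a sketch of the same per-sample pipeline written with group_modify() instead. This is an alternative, not the original answerer's code. It assumes dplyr (1.0 or later), tidyr, tibble and broom are installed, and the names long and res are hypothetical:

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(broom)
})

mydata <- data.frame(
  Sample1 = c(40, 2, 30, 0),
  Sample2 = c(50, 5, 60, 0),
  Sample3 = c(60, 0, 65, 5),
  Sample4 = c(86, 0, 35, 3),
  row.names = c("high-nonmutation", "high-mutation",
                "low-nonmutation", "low-mutation"))

# Long format: one row per (Rank, Mutation, Sample) with its count
long <- mydata %>%
  tibble::rownames_to_column("rownms") %>%
  separate(rownms, into = c("Rank", "Mutation")) %>%
  pivot_longer(cols = starts_with("Sample"), names_to = "Sample")

# .x is the data for one Sample; tidy() returns a one-row tibble
res <- long %>%
  group_by(Sample) %>%
  group_modify(~ tidy(fisher.test(
    xtabs(value ~ Mutation + Rank, data = .x)))) %>%
  ungroup()

print(res)
```

The result should have the same columns as the table above, one row per sample.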