Fisher's exact test in R from dataframe

Time:03-22

I have input data (df) from which I want to build a 2×2 contingency table for each row.

df <- data.frame(as = c("A", "B", "C", "D"), sum_m = c(47, 8, 93, 73), 
           length_m = c(150, 150, 150, 150), sum_w = c(66, 183, 44, 113), length_w = c(199, 199, 199, 199), 
           pooled_p = c(0.32378223495702, 0.547277936962751, 0.392550143266476, 0.532951289398281), 
           test1 = c(TRUE, TRUE, TRUE, TRUE), test2 = c(TRUE, TRUE, TRUE, TRUE), test3 = c(TRUE, TRUE, TRUE, TRUE), 
           test4 = c(TRUE, TRUE, TRUE, TRUE), final_test = c(TRUE, TRUE, TRUE, TRUE))
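
As an illustration of what "a 2×2 table for each row" means here, the following sketch builds the table for row A only, using the values from df above. The row and column labels (m, w, yes, no) are hypothetical and added purely for readability.

```r
# Values for row A, copied from df
sum_m <- 47; length_m <- 150
sum_w <- 66; length_w <- 199

# One row per group: count of "successes" and the remaining count
tab <- rbind(m = c(yes = sum_m, no = length_m - sum_m),
             w = c(yes = sum_w, no = length_w - sum_w))
tab
```

The second column is simply the group size minus the count, so each row of the table sums to length_m or length_w.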

I wrote a small script (given below) that calculates the p-value for a single row:

# Chi-square or Fisher's exact test
x    <- c(sum_m, sum_w)
n    <- c(length_m, length_w)
mash <- rbind(c(sum_m, length_m - sum_m),
              c(sum_w, length_w - sum_w))


if(final_test == TRUE){
  
  ## With Yates' continuity correction
  
  prop.test(x,n)
  #Exactly the same as:
  chisq.test(mash)
  
}else{
  
  # Fisher's exact test
  fisher.test(mash)
  
}

Hopefully this makes sense.

Suggestions on how to apply this to a large number of rows would be greatly appreciated! If possible, please add the p-value as the last column.

Thanks in advance :X)

CodePudding user response:

We could wrap the code in a function (f1, defined under "-function" at the end of this answer) and then apply it to each row with rowwise:

library(dplyr)
library(tidyr)
df %>%
   rowwise %>% 
   mutate(out = list(f1(sum_m, sum_w, length_m, length_w, final_test) %>% 
        broom::tidy(.)))  %>%
   ungroup %>%
   unnest(out)

-output

# A tibble: 4 × 15
  as    sum_m length_m sum_w length_w pooled_p test1 test2 test3 test4 final_test statistic  p.value parameter method                          
  <chr> <dbl>    <dbl> <dbl>    <dbl>    <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>          <dbl>    <dbl>     <int> <chr>                           
1 A        47      150    66      199    0.324 TRUE  TRUE  TRUE  TRUE  TRUE          0.0608 8.05e- 1         1 Pearson's Chi-squared test with…
2 B         8      150   183      199    0.547 TRUE  TRUE  TRUE  TRUE  TRUE        256.     1.59e-57         1 Pearson's Chi-squared test with…
3 C        93      150    44      199    0.393 TRUE  TRUE  TRUE  TRUE  TRUE         55.4    9.77e-14         1 Pearson's Chi-squared test with…
4 D        73      150   113      199    0.533 TRUE  TRUE  TRUE  TRUE  TRUE          1.95   1.63e- 1         1 Pearson's Chi-squared test with…

It may be faster with pmap instead of rowwise:

library(purrr)
df %>% 
   mutate(out = pmap(across(c(sum_m, sum_w, length_m, length_w, final_test)), 
     ~ f1(..1, ..2, ..3, ..4, ..5) %>% 
            broom::tidy(.))) %>%
   unnest(out)

-output

# A tibble: 4 × 15
  as    sum_m length_m sum_w length_w pooled_p test1 test2 test3 test4 final_test statistic  p.value parameter method                          
  <chr> <dbl>    <dbl> <dbl>    <dbl>    <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>          <dbl>    <dbl>     <int> <chr>                           
1 A        47      150    66      199    0.324 TRUE  TRUE  TRUE  TRUE  TRUE          0.0608 8.05e- 1         1 Pearson's Chi-squared test with…
2 B         8      150   183      199    0.547 TRUE  TRUE  TRUE  TRUE  TRUE        256.     1.59e-57         1 Pearson's Chi-squared test with…
3 C        93      150    44      199    0.393 TRUE  TRUE  TRUE  TRUE  TRUE         55.4    9.77e-14         1 Pearson's Chi-squared test with…
4 D        73      150   113      199    0.533 TRUE  TRUE  TRUE  TRUE  TRUE          1.95   1.63e- 1         1 Pearson's Chi-squared test with…

-function

f1 <- function(sum_m, sum_w, length_m, length_w, final_test) {

 # 2x2 table: count and remainder for each group
 mash <- rbind(c(sum_m, length_m - sum_m),
               c(sum_w, length_w - sum_w))

 if (final_test) {
   # Chi-squared test with Yates' continuity correction.
   # For a 2x2 table this gives the same result as
   # prop.test(c(sum_m, sum_w), c(length_m, length_w)),
   # so only one of the two calls is needed.
   chisq.test(mash)
 } else {
   # Fisher's exact test
   fisher.test(mash)
 }
}

CodePudding user response:

I recommend wrapping the logic in a function whenever you want to apply it row-wise or column-wise.

If you don't want to use dplyr, here is a base R solution:

test = function(x1,x2,y1,y2,test){
  
  mash = rbind(c(x1, x2 - x1),
               c(y1, y2 - y1))
  
  if(test){
    res = chisq.test(mash)
  }
  else{
    res = fisher.test(mash)
  }
  
  return(res$p.value)
}

mapply(FUN = test,
       df$sum_m, df$length_m, df$sum_w, df$length_w, df$final_test)

Result:

mapply(FUN = test,df$sum_m,df$length_m,df$sum_w,df$length_w,df$final_test)
[1] 8.051833e-01 1.590633e-57 9.772551e-14 1.626199e-01
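
The question asks for the p-value as the last column, so the vector returned by mapply can be assigned straight back to the data frame. The sketch below is a minimal, self-contained version of that idea: the helper name row_p_value and the column name p_value are hypothetical, and df is trimmed to the columns the test actually uses.

```r
# Trimmed copy of the question's data (only the columns the test needs)
df <- data.frame(as = c("A", "B", "C", "D"),
                 sum_m = c(47, 8, 93, 73), length_m = 150,
                 sum_w = c(66, 183, 44, 113), length_w = 199,
                 final_test = TRUE)

# Same logic as test() above, under a hypothetical name
row_p_value <- function(x1, x2, y1, y2, use_chisq) {
  mash <- rbind(c(x1, x2 - x1),
                c(y1, y2 - y1))
  if (use_chisq) chisq.test(mash)$p.value else fisher.test(mash)$p.value
}

# Assigning to a new name appends the column at the end of df
df$p_value <- mapply(row_p_value,
                     df$sum_m, df$length_m, df$sum_w, df$length_w,
                     df$final_test)
df
```

Because mapply returns one value per row in row order, the new column lines up with the original rows without any join or reordering.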