Performing fisher exact test on multiple rows of data frame in R

Time:03-25

I have a data frame like this:

df.t<-data.frame(ID=c(1,1,2,2,3,3,4,4,5,5,6,6),type=rep(c("A","B"),12),Count.1= rep(c(20,80),12),Count.2=rep(c(70,30),12))

I want to perform a Fisher exact test on each row. For example, for ID 1, which has the two types A and B, I want to test whether the counts differ between Count.1 and Count.2. I tried several code snippets found online, but none of them worked. Is there an easy function that can run the test for all rows, grouped by position and type? Any help is appreciated, thanks!

Answer:

For easier calculation, I would first transform your data from "long" to "wide" format. Then set the row names of the data frame to the ID and use apply() to run fisher.test() on each row. The output is a named list whose names come from the row names of the data frame (ID in this case).

The first two entries of the output are shown below for demonstration.

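library(tidyverse)

# rep(..., 12) produces 24 values, so ID is recycled to 24 rows;
# distinct() drops the duplicated rows, leaving one row per ID and type.
df.t <- data.frame(ID=c(1,1,2,2,3,3,4,4,5,5,6,6),
                 type=rep(c("A","B"),12),
                 Count.1= rep(c(20,80),12),
                 Count.2=rep(c(70,30),12)) %>%
  distinct() %>%
  # Long to wide: one row per ID with columns Count.1_A, Count.1_B, Count.2_A, Count.2_B.
  pivot_wider(id_cols = ID, names_from = type, values_from = c(Count.1, Count.2)) %>%
  as.data.frame()

# Use ID as row names so the results are named by ID, then drop the ID column.
rownames(df.t) <- df.t[, 1]

df.t <- df.t[, -1]

# For each row, build a 2x2 matrix (rows: Count.1, Count.2; columns: A, B) and test it.
apply(df.t, 1, function(x) fisher.test(matrix(as.numeric(x[1:4]), ncol=2, byrow=T)))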

List output

$`1`

    Fisher's Exact Test for Count Data

data:  matrix(as.numeric(x[1:4]), ncol = 2, byrow = T)
p-value = 1.05e-12
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.05288657 0.21489688
sample estimates:
odds ratio 
 0.1086015 


$`2`

    Fisher's Exact Test for Count Data

data:  matrix(as.numeric(x[1:4]), ncol = 2, byrow = T)
p-value = 1.05e-12
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.05288657 0.21489688
sample estimates:
odds ratio 
 0.1086015 
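
To make clear what each row's test operates on, here is a minimal sketch of the 2x2 table for a single ID, built by hand from the counts in the question. The object names `tab` and `res` and the dimnames labels are only illustrative; the layout (rows Count.1 and Count.2, columns A and B) is what `matrix(..., ncol = 2, byrow = TRUE)` produces from the wide row.

```r
# Worked example for one ID, using the counts from the question.
# Rows are Count.1 and Count.2; columns are types A and B.
tab <- matrix(c(20, 80,
                70, 30),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("Count.1", "Count.2"), c("A", "B")))

res <- fisher.test(tab)

res$p.value    # p-value of the two-sided test
res$estimate   # conditional maximum likelihood estimate of the odds ratio
res$conf.int   # 95 percent confidence interval for the odds ratio
```

Since every ID in the sample data has the same counts, each entry of the list output is identical to this single result.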

p-value output

If you only need the p-values, extract the p.value element from each test:

apply(df.t, 1, function(x) fisher.test(matrix(as.numeric(x[1:4]), ncol=2, byrow=T))$p.value)

           1            2            3            4            5 
1.050355e-12 1.050355e-12 1.050355e-12 1.050355e-12 1.050355e-12 
           6 
1.050355e-12 
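
If a data frame of results is more convenient than a named vector, the same per-ID test can be sketched in base R with split() and lapply(), without reshaping to wide format. This is one possible alternative, not the approach used above; `long`, `test_one_id` and `results` are hypothetical names, and the data are rebuilt here with six IDs and two types each, as the question describes.

```r
# Base R sketch: one Fisher test per ID, results collected in a data frame.
long <- data.frame(ID      = rep(1:6, each = 2),
                   type    = rep(c("A", "B"), 6),
                   Count.1 = rep(c(20, 80), 6),
                   Count.2 = rep(c(70, 30), 6))

# test_one_id() is a hypothetical helper, not part of any package.
test_one_id <- function(d) {
  d <- d[order(d$type), ]             # make sure type A comes before type B
  tab <- rbind(d$Count.1, d$Count.2)  # rows: Count.1, Count.2; columns: A, B
  ft <- fisher.test(tab)
  data.frame(ID         = d$ID[1],
             p.value    = ft$p.value,
             odds.ratio = unname(ft$estimate))
}

# Split the long data by ID, test each piece, and bind the rows together.
results <- do.call(rbind, lapply(split(long, long$ID), test_one_id))
results
```

Each row of `results` holds the ID, the p-value and the estimated odds ratio, so further columns (for example the confidence interval bounds from `ft$conf.int`) can be added in the helper if needed.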