Perform multiple fisher.test for groups in data frame

Time:11-15

I would like to run fisher.test() once for each tax and each count column (ABCB1 and ABL1 in the data frame below). The contingency tables should be built from the rows as shown below. Note that the second column of each contingency table needs to be calculated by subtracting the tested column from the Total column.

Contingency table example (tax1, ABCB1):

               ABCB1   NotABCB1 (Total - ABCB1)
tax1Present        1         42
tax1NotPresent     3         30
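As a quick check of the layout, the example table can be typed in directly and passed to fisher.test(), which treats a 2 x 2 matrix as a contingency table. This is a minimal sketch; the object names `tab` and `res` are illustrative only.

```r
# The tax1 / ABCB1 example table, filled column by column:
# column 1 = ABCB1 counts, column 2 = Total - ABCB1
tab <- matrix(c(1, 3, 42, 30), nrow = 2,
              dimnames = list(c("tax1Present", "tax1NotPresent"),
                              c("ABCB1", "NotABCB1")))

# fisher.test() accepts a matrix argument as the contingency table
res <- fisher.test(tab)
res$p.value   # two-sided p-value
res$estimate  # conditional maximum-likelihood estimate of the odds ratio
```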

data:

df <- structure(list(group = c("tax1Present", "tax1NotPresent", "tax2Present", 
"tax2NotPresent", "tax3Present", "tax3NotPresent", "tax4Present", 
"tax4NotPresent", "tax5Present", "tax5NotPresent"), ABCB1 = c(1L, 
3L, 4L, 5L, 3L, 6L, 6L, 12L, 13L, 6L), ABL1 = c(24L, 24L, 12L, 
53L, 1L, 5L, 0L, 0L, 242L, 0L), Total = c(43L, 33L, 23L, 70L, 
9L, 15L, 7L, 19L, 300L, 10L), tax = c("tax1", "tax1", "tax2", 
"tax2", "tax3", "tax3", "tax4", "tax4", "tax5", "tax5")), row.names = c(NA, 
10L), class = "data.frame")


> df
            group ABCB1 ABL1 Total  tax
1     tax1Present     1   24    43 tax1
2  tax1NotPresent     3   24    33 tax1
3     tax2Present     4   12    23 tax2
4  tax2NotPresent     5   53    70 tax2
5     tax3Present     3    1     9 tax3
6  tax3NotPresent     6    5    15 tax3
7     tax4Present     6    0     7 tax4
8  tax4NotPresent    12    0    19 tax4
9     tax5Present    13  242   300 tax5
10 tax5NotPresent     6    0    10 tax5
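To make the row-extraction rule concrete, here is a minimal sketch of a helper that builds the 2 x 2 table for one tax and one column. The function name `contingency_for` is hypothetical (it is not part of the question or the answer), and it assumes, as in this data, that each tax has exactly two rows ordered Present then NotPresent.

```r
# Same data as the dput() above, written out with data.frame()
df <- data.frame(
  group = paste0(rep(paste0("tax", 1:5), each = 2), c("Present", "NotPresent")),
  ABCB1 = c(1L, 3L, 4L, 5L, 3L, 6L, 6L, 12L, 13L, 6L),
  ABL1  = c(24L, 24L, 12L, 53L, 1L, 5L, 0L, 0L, 242L, 0L),
  Total = c(43L, 33L, 23L, 70L, 9L, 15L, 7L, 19L, 300L, 10L),
  tax   = rep(paste0("tax", 1:5), each = 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows are the tax's rows in data order (assumed to be
# Present, then NotPresent); columns are the counts in `col` and Total minus
# those counts
contingency_for <- function(data, tax_id, col) {
  rows   <- data[data$tax == tax_id, ]
  counts <- rows[[col]]
  tab    <- cbind(counts, rows$Total - counts)
  dimnames(tab) <- list(rows$group, c(col, paste0("Not", col)))
  tab
}

tab1 <- contingency_for(df, "tax1", "ABCB1")
tab1
fisher.test(tab1)$p.value
```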

Answer:

Try using sapply() over the columns and lapply() over the taxa:

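# set the columns to test
columns <- c("ABCB1", "ABL1")

# outer loop over the column names (not their positions, so the results stay
# aligned with `columns` whatever order it is given in), inner loop over each
# tax; each 2x2 table is [counts in colx, Total - counts in colx] for that
# tax's Present/NotPresent rows
dat_test <- sapply( columns, 
  function(colx) lapply( unique( df$tax ), function(x) 
    fisher.test( data.frame(df[ df$tax %in% x,colx], 
      Total_diff=df[ df$tax %in% x, ]$Total - df[ df$tax %in% x, ][colx] )
  ) ) )

# set names (rows = tax, columns = tested column)
rownames(dat_test) <- unique( df$tax )
colnames(dat_test) <- columns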

dat_test
     ABCB1  ABL1  
tax1 List,7 List,7
tax2 List,7 List,7
tax3 List,7 List,7
tax4 List,7 List,7
tax5 List,7 List,7
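Each List,7 cell above is an "htest" object, so the p-values and odds ratios can be pulled out into ordinary numeric matrices. This is a minimal sketch: it rebuilds `dat_test` with a compact equivalent of the answer's code (cbind() instead of data.frame() for the table), and `pvals` and `odds` are illustrative names.

```r
df <- data.frame(
  group = paste0(rep(paste0("tax", 1:5), each = 2), c("Present", "NotPresent")),
  ABCB1 = c(1L, 3L, 4L, 5L, 3L, 6L, 6L, 12L, 13L, 6L),
  ABL1  = c(24L, 24L, 12L, 53L, 1L, 5L, 0L, 0L, 242L, 0L),
  Total = c(43L, 33L, 23L, 70L, 9L, 15L, 7L, 19L, 300L, 10L),
  tax   = rep(paste0("tax", 1:5), each = 2),
  stringsAsFactors = FALSE
)

columns <- c("ABCB1", "ABL1")
taxa <- unique(df$tax)

# Same shape as the answer's dat_test: a 5 x 2 matrix of htest objects
dat_test <- sapply(columns, function(colx) lapply(taxa, function(x) {
  rows <- df[df$tax == x, ]
  fisher.test(cbind(rows[[colx]], rows$Total - rows[[colx]]))
}))
rownames(dat_test) <- taxa

# sapply() walks the list-matrix in column-major order, and matrix() refills
# in the same order, so every value lands back in its original cell
pvals <- matrix(sapply(dat_test, function(h) h$p.value),
                nrow = nrow(dat_test), dimnames = dimnames(dat_test))
odds  <- matrix(sapply(dat_test, function(h) unname(h$estimate)),
                nrow = nrow(dat_test), dimnames = dimnames(dat_test))
round(pvals, 4)
```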

Access individual results with, e.g.:

dat_test[,"ABCB1"]
$tax1

    Fisher's Exact Test for Count Data

data:  data.frame(df[df$tax %in% x, colx], Total_diff = df[df$tax %in% x, ]$Total - df[df$tax %in% x, ][colx])
p-value = 0.3109
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.00443791 3.18701284
sample estimates:
odds ratio 
 0.2424665 

...etc
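If a flat table is easier to work with than a matrix of lists, a variant is to loop over every (tax, column) pair and collect one row per test. This is a sketch of an alternative layout, not the answer's method; `combos` and `results` are illustrative names.

```r
df <- data.frame(
  group = paste0(rep(paste0("tax", 1:5), each = 2), c("Present", "NotPresent")),
  ABCB1 = c(1L, 3L, 4L, 5L, 3L, 6L, 6L, 12L, 13L, 6L),
  ABL1  = c(24L, 24L, 12L, 53L, 1L, 5L, 0L, 0L, 242L, 0L),
  Total = c(43L, 33L, 23L, 70L, 9L, 15L, 7L, 19L, 300L, 10L),
  tax   = rep(paste0("tax", 1:5), each = 2),
  stringsAsFactors = FALSE
)
columns <- c("ABCB1", "ABL1")

# one row per (tax, column) combination; tax varies fastest
combos <- expand.grid(tax = unique(df$tax), column = columns,
                      stringsAsFactors = FALSE)

# run each test and keep the main fields as ordinary columns
results <- do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
  rows   <- df[df$tax == combos$tax[i], ]
  counts <- rows[[combos$column[i]]]
  ft     <- fisher.test(cbind(counts, rows$Total - counts))
  data.frame(tax = combos$tax[i], column = combos$column[i],
             p.value = ft$p.value, odds.ratio = unname(ft$estimate),
             conf.low = ft$conf.int[1], conf.high = ft$conf.int[2],
             stringsAsFactors = FALSE)
}))
results
```

A long data frame like this can be sorted, filtered, or merged with other tables directly, whereas the list-matrix needs an extra extraction step for each field.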