I have input data (df) for making 2*2 contingency table for each row.
df <- data.frame(as = c("A", "B", "C", "D"), sum_m = c(47, 8, 93, 73),
length_m = c(150, 150, 150, 150), sum_w = c(66, 183, 44, 113), length_w = c(199, 199, 199, 199),
pooled_p = c(0.32378223495702, 0.547277936962751, 0.392550143266476, 0.532951289398281),
test1 = c(TRUE, TRUE, TRUE, TRUE), test2 = c(TRUE, TRUE, TRUE, TRUE), test3 = c(TRUE, TRUE, TRUE, TRUE),
test4 = c(TRUE, TRUE, TRUE, TRUE), final_test = c(TRUE, TRUE, TRUE, TRUE))
I wrote a small script (given below) for calculating p value for a single row:
# Chi-square or Fisher's exact test
x <- c(sum_m, sum_w)
n <- c(length_m, length_w)
mash <- rbind(c(sum_m, length_m - sum_m),
c(sum_w, length_w - sum_w))
if(final_test == TRUE){
## With Yate's continuity correction
prop.test(x,n)
#Exactly the same as:
chisq.test(mash)
}else{
# Fisher's exact test
fisher.test(mash)
}
hopefully, this makes sense to you.
Suggestions on how to apply this to a large number of rows would be greatly appreciated! If possible then please paste the p-value at the last column.
Thanks in advance :X)
CodePudding user response:
We could wrap the code into a function and then use rowwise
and apply the function
library(dplyr)
library(tidyr)
df %>%
rowwise %>%
mutate(out = list(f1(sum_m, sum_w, length_m, length_w, final_test) %>%
broom::tidy(.))) %>%
ungroup %>%
unnest(out)
-output
# A tibble: 4 × 15
as sum_m length_m sum_w length_w pooled_p test1 test2 test3 test4 final_test statistic p.value parameter method
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl> <dbl> <int> <chr>
1 A 47 150 66 199 0.324 TRUE TRUE TRUE TRUE TRUE 0.0608 8.05e- 1 1 Pearson's Chi-squared test with…
2 B 8 150 183 199 0.547 TRUE TRUE TRUE TRUE TRUE 256. 1.59e-57 1 Pearson's Chi-squared test with…
3 C 93 150 44 199 0.393 TRUE TRUE TRUE TRUE TRUE 55.4 9.77e-14 1 Pearson's Chi-squared test with…
4 D 73 150 113 199 0.533 TRUE TRUE TRUE TRUE TRUE 1.95 1.63e- 1 1 Pearson's Chi-squared test with…
It may be faster with pmap
instead of rowwise
library(purrr)
df %>%
mutate(out = pmap(across(c(sum_m, sum_w, length_m, length_w, final_test)),
~ f1(..1, ..2, ..3, ..4, ..5) %>%
broom::tidy(.))) %>%
unnest(out)
-output
# A tibble: 4 × 15
as sum_m length_m sum_w length_w pooled_p test1 test2 test3 test4 final_test statistic p.value parameter method
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl> <dbl> <int> <chr>
1 A 47 150 66 199 0.324 TRUE TRUE TRUE TRUE TRUE 0.0608 8.05e- 1 1 Pearson's Chi-squared test with…
2 B 8 150 183 199 0.547 TRUE TRUE TRUE TRUE TRUE 256. 1.59e-57 1 Pearson's Chi-squared test with…
3 C 93 150 44 199 0.393 TRUE TRUE TRUE TRUE TRUE 55.4 9.77e-14 1 Pearson's Chi-squared test with…
4 D 73 150 113 199 0.533 TRUE TRUE TRUE TRUE TRUE 1.95 1.63e- 1 1 Pearson's Chi-squared test with…
-function
f1 <- function(sum_m, sum_w, length_m, length_w, final_test) {
x <- c(sum_m, sum_w)
n <- c(length_m, length_w)
mash <- rbind(c(sum_m, length_m - sum_m),
c(sum_w, length_w - sum_w))
if(final_test == TRUE){
## With Yate's continuity correction
prop.test(x,n)
#Exactly the same as:
chisq.test(mash)
}else{
# Fisher's exact test
fisher.test(mash)
}
}
CodePudding user response:
I recommend you to work with functions when do you want to apply a function rowwise or columnwise.
If you don't want to work with dyplr, R base solution:
test = function(x1,x2,y1,y2,test){
mash = rbind(c(x1, x2 - x1),
c(y1, y2 - y1))
if(test){
res = chisq.test(mash)
}
else{
res = fisher.test(mash)
}
return(res$p.value)
}
mapply(FUN = test,
df$sum_m, df$length_m, df$sum_w, df$length_w, df$final_test)
Result:
mapply(FUN = test,df$sum_m,df$length_m,df$sum_w,df$length_w,df$final_test)
[1] 8.051833e-01 1.590633e-57 9.772551e-14 1.626199e-01