How to parse pairwiseCI test over multiple rows in a data frame of results

Time:10-29

I'm trying to calculate confidence intervals for the difference in some results using pairwiseCI.

The dataframe looks like this:

Category Male_Success Female_Success Male_UnSuccessful Female_UnSuccessful
A 100 150 90 60
B 70 40 30 80
C 20 30 50 50

To calculate the confidence interval for the difference in proportion successful for Category A I would apply the following code:

library(pairwiseCI)

success <- c(100, 150)
failure <- c(90, 60)
page <- c(2,1)
dataframe <- data.frame(cbind(success,failure,page))
pairwiseCI(cbind(success,failure)~page, data=dataframe, method="Prop.diff", CImethod="CC")

which gives the following output:

95 %-confidence intervals 
Method:  Continuity corrected interval for the difference of proportions 
  
estimate   lower   upper
2-1   -0.188 -0.2867 -0.0893

I would like to produce this for all 3 categories without typing them individually (I've used the 'apply' function before for chi-sq tests over a dataframe but cannot figure out how to use it in this setting). Ideally, I would like the estimate, lower and upper results printed in columns next to the original dataframe so it looks like this:

Category Male_Success Female_Success Male_UnSuccessful Female_UnSuccessful estimate lower upper

Thank you very much for your help in advance!

Answer:

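You can write a helper function and apply it to each row. In this example I use stats::prop.test() rather than the specialised pairwiseCI package; prop.test() applies a continuity correction by default, which is why the results match the CImethod="CC" output in the question.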

  1. A helper function that takes the four success/failure counts and returns a list with the estimate and the lower and upper confidence limits:
f <- function(s1,s2,f1,f2) {
  k <- prop.test(matrix(c(s1,s2,f1,f2),nrow=2,ncol=2))
  setNames(as.list(c(-1*diff(k$estimate),k$conf.int)),c("estimate", "lower","upper"))
}
  2. Apply the function to each row (grouping by Category):
library(data.table)
setDT(df)[, (c("estimate", "lower", "upper")):= f(Male_Success, Female_Success, Male_UnSuccessful, Female_UnSuccessful), Category]
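As a quick sanity check, the helper can be called on the Category A counts alone. This is only a sketch that repeats the helper so it runs on its own; the name res_a is an illustrative choice, not part of the original answer. The rounded values should agree with the -0.188, -0.2867 and -0.0893 reported by pairwiseCI in the question.

```r
# Helper from the answer, repeated so this snippet is self-contained
f <- function(s1, s2, f1, f2) {
  k <- prop.test(matrix(c(s1, s2, f1, f2), nrow = 2, ncol = 2))
  setNames(as.list(c(-1 * diff(k$estimate), k$conf.int)),
           c("estimate", "lower", "upper"))
}

# Category A: 100 of 190 males and 150 of 210 females were successful
res_a <- f(100, 150, 90, 60)   # res_a is a hypothetical name for illustration

# Flatten the list to a named numeric vector and round for display
print(round(unlist(res_a), 4))
```

The estimate is the male proportion minus the female proportion, which corresponds to the "2-1" comparison in the question, where page 2 held the male counts.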

Note: above I use data.table, but you could also use dplyr and tidyr, like this:

library(dplyr)
library(tidyr)

df %>% 
  group_by(Category) %>%
  mutate(r = list(f(Male_Success,Female_Success, Male_UnSuccessful, Female_UnSuccessful))) %>% 
  ungroup() %>% 
  unnest_wider(r)

Output (as printed by the data.table version):

   Category Male_Success Female_Success Male_UnSuccessful Female_UnSuccessful    estimate      lower       upper
     <char>        <int>          <int>             <int>               <int>       <num>      <num>       <num>
1:        A          100            150                90                  60 -0.18796992 -0.2866507 -0.08928912
2:        B           70             40                30                  80  0.36666667  0.2342893  0.49904403
3:        C           20             30                50                  50 -0.08928571 -0.2525247  0.07395327

Input:

df = structure(list(Category = c("A", "B", "C"), Male_Success = c(100L, 
70L, 20L), Female_Success = c(150L, 40L, 30L), Male_UnSuccessful = c(90L, 
30L, 50L), Female_UnSuccessful = c(60L, 80L, 50L)), row.names = c(NA, 
-3L), class = "data.frame")
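
Since the question mentions the apply family, the same row-wise idea can also be sketched in base R without data.table or dplyr. This is one possible variant, not the method used above: the helper name ci_row and the objects ci and result are hypothetical, and prop.test() is called with x (successes) and n (totals) instead of a 2x2 matrix, which should give the same intervals.

```r
# Same input as above, rebuilt so the snippet runs on its own
df <- data.frame(
  Category            = c("A", "B", "C"),
  Male_Success        = c(100L, 70L, 20L),
  Female_Success      = c(150L, 40L, 30L),
  Male_UnSuccessful   = c(90L, 30L, 50L),
  Female_UnSuccessful = c(60L, 80L, 50L)
)

# Hypothetical helper: returns a named numeric vector for one row of counts
ci_row <- function(s1, s2, f1, f2) {
  # x = successes, n = group totals; continuity correction is on by default
  k <- prop.test(x = c(s1, s2), n = c(s1 + f1, s2 + f2))
  c(estimate = unname(k$estimate[1] - k$estimate[2]),  # male minus female
    lower    = k$conf.int[1],
    upper    = k$conf.int[2])
}

# mapply() walks the four columns in parallel and returns a 3 x n matrix;
# t() turns it into one row per category
ci <- t(mapply(ci_row,
               df$Male_Success, df$Female_Success,
               df$Male_UnSuccessful, df$Female_UnSuccessful))

# Bind the three new columns next to the original data frame
result <- cbind(df, ci)
print(result)
```

Returning a named numeric vector rather than a list lets mapply() simplify the results to a matrix, so a single cbind() is enough to attach the estimate, lower and upper columns.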