I'm trying to calculate confidence intervals for the difference in some results using pairwiseCI.
The dataframe looks like this:
Category | Male_Success | Female_Success | Male_UnSuccessful | Female_UnSuccessful |
---|---|---|---|---|
A | 100 | 150 | 90 | 60 |
B | 70 | 40 | 30 | 80 |
C | 20 | 30 | 50 | 50 |
To calculate the confidence interval for the difference in proportion successful for Category A I would apply the following code:
library(pairwiseCI)
success <- c(100, 150)
failure <- c(90, 60)
page <- c(2,1)
dataframe <- data.frame(cbind(success,failure,page))
pairwiseCI(cbind(success,failure)~page, data=dataframe, method="Prop.diff", CImethod="CC")
which gives the following output:
95 %-confidence intervals
Method: Continuity corrected interval for the difference of proportions
estimate lower upper
2-1 -0.188 -0.2867 -0.0893
I would like to produce this for all 3 categories without typing them individually (I've used the 'apply' function before for chi-sq tests over a dataframe but cannot figure out how to use it in this setting). Ideally, I would like the estimate, lower and upper results printed in columns next to the original dataframe so it looks like this:
Category | Male_Success | Female_Success | Male_UnSuccessful | Female_UnSuccessful | estimate | lower | upper |
---|---|---|---|---|---|---|---|
Thank you very much for your help in advance!
CodePudding user response:
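You can create a helper function and apply it to each row. In my example, I use stats::prop.test() instead of a specialty package (pairwiseCI). prop.test() applies a continuity correction by default, so its interval for Category A matches the CImethod="CC" result in the question.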
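- Helper function that takes the four success/failure counts and returns a list with the estimate and the confidence interval bounds:
f <- function(s1, s2, f1, f2) {
  # 2 x 2 matrix: one row per group (male, female), columns are successes and failures
  k <- prop.test(matrix(c(s1, s2, f1, f2), nrow = 2, ncol = 2))
  # -1 * diff() gives p1 - p2, the same direction as the interval in k$conf.int
  setNames(as.list(c(-1 * diff(k$estimate), k$conf.int)), c("estimate", "lower", "upper"))
}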
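As a quick sanity check, the helper can be run on the Category A counts from the question. This is only a sketch; the helper is repeated so the snippet runs on its own, and res_a is a name introduced here for illustration. The result should agree with the pairwiseCI output in the question (estimate -0.188, interval from -0.2867 to -0.0893).

```r
# Helper repeated from above so this snippet is self-contained
f <- function(s1, s2, f1, f2) {
  k <- prop.test(matrix(c(s1, s2, f1, f2), nrow = 2, ncol = 2))
  setNames(as.list(c(-1 * diff(k$estimate), k$conf.int)), c("estimate", "lower", "upper"))
}

# Category A: male 100 successes / 90 failures, female 150 successes / 60 failures
res_a <- f(100, 150, 90, 60)

# Round for easier comparison with the pairwiseCI printout
print(round(unlist(res_a), 4))
```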
- Apply the function to each row
library(data.table)
# Grouping by Category calls f() once per row (this assumes one row per Category)
setDT(df)[, (c("estimate", "lower", "upper")) := f(Male_Success, Female_Success, Male_UnSuccessful, Female_UnSuccessful), Category]
Note: above I use data.table, but you could also use dplyr and tidyr, like this:
library(dplyr)
library(tidyr)
df %>%
group_by(Category) %>%
mutate(r = list(f(Male_Success,Female_Success, Male_UnSuccessful, Female_UnSuccessful))) %>%
ungroup() %>%
unnest_wider(r)
Output:
Category Male_Success Female_Success Male_UnSuccessful Female_UnSuccessful estimate lower
<char> <int> <int> <int> <int> <num> <num>
1: A 100 150 90 60 -0.18796992 -0.2866507
2: B 70 40 30 80 0.36666667 0.2342893
3: C 20 30 50 50 -0.08928571 -0.2525247
upper
<num>
1: -0.08928912
2: 0.49904403
3: 0.07395327
Input:
df = structure(list(Category = c("A", "B", "C"), Male_Success = c(100L,
70L, 20L), Female_Success = c(150L, 40L, 30L), Male_UnSuccessful = c(90L,
30L, 50L), Female_UnSuccessful = c(60L, 80L, 50L)), row.names = c(NA,
-3L), class = "data.frame")
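Since the question mentions the apply family, here is a base R sketch of the same idea that needs no extra packages. It assumes the same helper f and the same input data as above (both repeated so the snippet runs on its own); rows, ci and result are names introduced here for illustration.

```r
# Input data, same values as the structure() call above
df <- data.frame(
  Category = c("A", "B", "C"),
  Male_Success = c(100L, 70L, 20L),
  Female_Success = c(150L, 40L, 30L),
  Male_UnSuccessful = c(90L, 30L, 50L),
  Female_UnSuccessful = c(60L, 80L, 50L)
)

# Helper repeated from the answer above
f <- function(s1, s2, f1, f2) {
  k <- prop.test(matrix(c(s1, s2, f1, f2), nrow = 2, ncol = 2))
  setNames(as.list(c(-1 * diff(k$estimate), k$conf.int)), c("estimate", "lower", "upper"))
}

# Map() calls f() once per row, walking the four count columns in parallel
rows <- Map(f, df$Male_Success, df$Female_Success,
            df$Male_UnSuccessful, df$Female_UnSuccessful)

# Turn each named list into a one-row data frame, stack them,
# and bind the three new columns next to the original ones
ci <- do.call(rbind, lapply(rows, as.data.frame))
result <- cbind(df, ci)
print(result)
```

Map() is used here rather than apply() because apply() coerces the data frame to a character matrix when a text column such as Category is present, so the counts would have to be converted back to numbers first.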