I have a dataframe like this:
df1 <- data.frame(
  id_val = c('row1', 'row2', 'row3'),
  value_florida = c(10, 27, 16),
  value_illinois = c(17, 14, 22),
  value_vermont = c(11, 4, 29),
  base_florida = c(12, 44, 32),
  base_illinois = c(18, 29, 23),
  base_vermont = c(15, 8, 33)
)
I want to run prop.test on every combination of value and base columns, so that I know the significance of each row's pairs against one another. Manually, I would do this for the first pair, Row 1 Florida versus Row 1 Illinois:
tmp <- prop.test(x = c(10, 17), n = c(12, 18))
tmp$p.value
But then I'd like to move to the next pair, Row 1 Florida versus Row 1 Vermont. And then Row 1 Illinois versus Row 1 Vermont. Then on to the Row 2 pairs. And on and on.
Is there a way to achieve this in R, ideally using the purrr library?
CodePudding user response:
This can be done by reshaping to 'long' format with pivot_longer, grouping by 'id_val', and then using combn with m = 2 to run the pairwise tests within each group:
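library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
  # one row per id_val/state, with 'value' and 'base' as columns
  pivot_longer(cols = -id_val, names_to = c(".value", "state"),
               names_sep = "_") %>%
  group_by(id_val) %>%
  # Note: dplyr >= 1.1.0 warns when summarise() returns several rows per
  # group; reframe() is the suggested replacement there.
  summarise(state_pair = combn(state, 2, str_c, collapse = "_"),
            pval = combn(state, 2, FUN = function(x) {
              # select the two states in the current pair
              i1 <- state %in% x
              prop.test(x = value[i1], n = base[i1])$p.value
            }), .groups = 'drop')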
Output:
# A tibble: 9 × 3
  id_val state_pair           pval
  <chr>  <chr>               <dbl>
1 row1   florida_illinois 0.709
2 row1   florida_vermont  0.877
3 row1   illinois_vermont 0.231
4 row2   florida_illinois 0.389
5 row2   florida_vermont  0.833
6 row2   illinois_vermont 1
7 row3   florida_illinois 0.000907
8 row3   florida_vermont  0.00237
9 row3   illinois_vermont 0.598
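To make the reshaping step easier to follow, here is a minimal sketch that runs only the pivot_longer call on the question's data. The name `long` is just an illustrative variable, not something from the answer above. The ".value" entry in names_to tells tidyr to use the part of each column name before the "_" as the output column name, so the value_* and base_* columns end up side by side, one row per id_val and state.

```r
library(tidyr)

# Data from the question
df1 <- data.frame(
  id_val = c("row1", "row2", "row3"),
  value_florida = c(10, 27, 16),
  value_illinois = c(17, 14, 22),
  value_vermont = c(11, 4, 29),
  base_florida = c(12, 44, 32),
  base_illinois = c(18, 29, 23),
  base_vermont = c(15, 8, 33)
)

# 'long' is an illustrative name for the intermediate result:
# one row per id_val/state, with columns id_val, state, value, base
long <- pivot_longer(df1, cols = -id_val,
                     names_to = c(".value", "state"),
                     names_sep = "_")
long
```

With three rows and three states this gives nine rows, which is what the grouped combn step then works on.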
CodePudding user response:
Create a function f that does the estimation, and then apply that function to each id_val group of a long-formatted version of the data.
Here is the simple function, which takes vectors of values (v), bases (b) and states (s):
f <- function(v, b, s) {
  # loop over every pair of positions in s
  lapply(combn(seq_along(s), 2, simplify = FALSE), function(k) {
    list(contrast = paste0(s[k], collapse = "_"),
         p_value = prop.test(x = v[k], n = b[k])$p.value)
  })
}
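As a quick check of what f returns, it can be called directly on the Row 1 numbers from the question. This is only a sketch: f is repeated here so the snippet runs on its own, and `out` is an illustrative name. Note that prop.test may warn that the chi-squared approximation may be incorrect when counts are this small.

```r
# Same function as above, repeated so this snippet is self-contained
f <- function(v, b, s) {
  lapply(combn(seq_along(s), 2, simplify = FALSE), function(k) {
    list(contrast = paste0(s[k], collapse = "_"),
         p_value = prop.test(x = v[k], n = b[k])$p.value)
  })
}

# Row 1 of the question's data: values, bases and state names
out <- f(v = c(10, 17, 11),
         b = c(12, 18, 15),
         s = c("florida", "illinois", "vermont"))

# One list element per pair, each holding a contrast label and a p-value
str(out)
```

The result is a list of three elements (one per pair of states), which is why the next step needs unnest_wider to spread the contrast and p_value fields into columns.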
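Now pivot longer and apply the function within each id_val group.
library(dplyr)
library(tidyr)
pivot_longer(df1, -id_val, names_to = c(".value", "state"), names_sep = "_") %>%
  group_by(id_val) %>%
  summarize(k = f(value, base, state)) %>%
  # spread the list column into 'contrast' and 'p_value' columns
  unnest_wider(k)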
Output:
  id_val contrast          p_value
  <chr>  <chr>               <dbl>
1 row1   florida_illinois 0.709
2 row1   florida_vermont  0.877
3 row1   illinois_vermont 0.231
4 row2   florida_illinois 0.389
5 row2   florida_vermont  0.833
6 row2   illinois_vermont 1
7 row3   florida_illinois 0.000907
8 row3   florida_vermont  0.00237
9 row3   illinois_vermont 0.598
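Since the question asks about purrr, the same idea could also be sketched with purrr::map on the original wide data, without reshaping. This is one possible approach under stated assumptions, not the method of either answer: it assumes every state has both a value_<state> and a base_<state> column, and the names `states`, `state_pairs`, `pair_pval` and `res` are hypothetical helpers introduced only for this example.

```r
library(purrr)

# Data from the question
df1 <- data.frame(
  id_val = c("row1", "row2", "row3"),
  value_florida = c(10, 27, 16),
  value_illinois = c(17, 14, 22),
  value_vermont = c(11, 4, 29),
  base_florida = c(12, 44, 32),
  base_illinois = c(18, 29, 23),
  base_vermont = c(15, 8, 33)
)

# Assumption: state names can be recovered from the value_* column names
states <- sub("^value_", "", grep("^value_", names(df1), value = TRUE))
state_pairs <- combn(states, 2, simplify = FALSE)

# Hypothetical helper: one-row data frame with the p-value for
# row index i and one pair of states
pair_pval <- function(i, pair) {
  x <- unlist(df1[i, paste0("value_", pair)], use.names = FALSE)
  n <- unlist(df1[i, paste0("base_", pair)], use.names = FALSE)
  data.frame(
    id_val   = df1$id_val[i],
    contrast = paste(pair, collapse = "_"),
    p_value  = prop.test(x = x, n = n)$p.value
  )
}

# Outer map over rows, inner map over state pairs, then stack the pieces
res <- map(seq_len(nrow(df1)), function(i) {
  do.call(rbind, map(state_pairs, function(pair) pair_pval(i, pair)))
})
res <- do.call(rbind, res)
res
```

This yields one row per id_val and state pair, in the same order as the outputs above. The results are stacked with base rbind here to keep the dependencies to purrr alone.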