How best to iterate prop.test across a row, ideally using purrr?

Time:07-14

I have a dataframe like this:

df1 <- data.frame(
  id_val = c('row1', 'row2', 'row3'),
  value_florida = c(10, 27, 16),
  value_illinois = c(17, 14, 22),
  value_vermont = c(11, 4, 29),
  base_florida = c(12, 44, 32),
  base_illinois = c(18, 29, 23),
  base_vermont = c(15, 8, 33)
)

I want to run prop.test on every pair of states within each row, comparing the value/base proportions, so that I know whether each pair differs significantly. Done manually for the first pair, Row 1 Florida versus Row 1 Illinois, that would be:

tmp <- prop.test(x = c(10, 17), n = c(12, 18))
tmp$p.value

But then I'd like to move on to the next pair, Row 1 Florida versus Row 1 Vermont, then Row 1 Illinois versus Row 1 Vermont, then the Row 2 pairs, and so on.

Is there a way to do this in R, ideally using the purrr package?

Answer:

This can be done by reshaping to long format with pivot_longer, grouping by id_val, and then using combn with m = 2 to run prop.test on each pair of states:

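library(dplyr)
library(tidyr)
library(stringr)

df1 %>%
  # one row per id_val/state, with value and base as columns
  pivot_longer(cols = -id_val, names_to = c(".value", "state"),
               names_sep = "_") %>%
  group_by(id_val) %>%
  # summarise() returning several rows per group is deprecated in dplyr >= 1.1.0,
  # where reframe() is the preferred verb; it still works but may warn
  summarise(
    state_pair = combn(state, 2, str_c, collapse = "_"),
    pval = combn(state, 2, FUN = function(x) {
      i1 <- state %in% x  # select the two states in this pair
      prop.test(x = value[i1], n = base[i1])$p.value
    }),
    .groups = "drop"
  )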

Output:

# A tibble: 9 × 3
  id_val state_pair           pval
  <chr>  <chr>               <dbl>
1 row1   florida_illinois 0.709   
2 row1   florida_vermont  0.877   
3 row1   illinois_vermont 0.231   
4 row2   florida_illinois 0.389   
5 row2   florida_vermont  0.833   
6 row2   illinois_vermont 1       
7 row3   florida_illinois 0.000907
8 row3   florida_vermont  0.00237 
9 row3   illinois_vermont 0.598   
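To see what the two combn() calls do, here is a minimal sketch for row1 alone, with its long-format columns typed in by hand (states, value, base, pair and keep are illustrative names, and paste() stands in for str_c() so no extra package is needed). prop.test() will likely warn that the chi-squared approximation may be incorrect, because some expected counts here are below 5; the same warning can appear in the full pipeline.

```r
# row1 in long form, typed in by hand (illustrative names)
states <- c("florida", "illinois", "vermont")
value  <- c(10, 17, 11)
base   <- c(12, 18, 15)

# combn() with a FUN applies it to each pair of states;
# extra arguments (here collapse) are passed on to FUN
pair_labels <- combn(states, 2, paste, collapse = "_")
# returns c("florida_illinois", "florida_vermont", "illinois_vermont")

# for one pair, %in% builds a logical index that picks out those two states
pair <- c("florida", "illinois")
keep <- states %in% pair  # c(TRUE, TRUE, FALSE)
p <- prop.test(x = value[keep], n = base[keep])$p.value
p  # same test as prop.test(x = c(10, 17), n = c(12, 18)) in the question
```

The pipeline above does exactly this inside each id_val group, once per pair.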

Answer:

Create a function f that runs the tests for one group, then apply it to each id_val group of the long-format data.

The function takes vectors of values (v), bases (b) and state names (s) and returns one contrast label and p-value per pair of states:

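# v: values, b: bases, s: state names for one id_val group.
# Returns one list(contrast, p_value) per pair of states.
f <- function(v, b, s) {
  lapply(combn(seq_along(s), 2, simplify = FALSE), function(k) {
    list(contrast = paste0(s[k], collapse = "_"),
         p_value  = prop.test(x = v[k], n = b[k])$p.value)
  })
}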
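Before reshaping, it can help to call f on a single group to see the shape it returns. This is a minimal sketch using row1's numbers typed in by hand; f is repeated from above so the snippet runs on its own, and res is an illustrative name.

```r
# f() repeated from above so this snippet runs standalone
f <- function(v, b, s) {
  lapply(combn(seq_along(s), 2, simplify = FALSE), function(k) {
    list(contrast = paste0(s[k], collapse = "_"),
         p_value  = prop.test(x = v[k], n = b[k])$p.value)
  })
}

# row1 of df1, one element per state
res <- f(v = c(10, 17, 11),
         b = c(12, 18, 15),
         s = c("florida", "illinois", "vermont"))

length(res)        # 3: one element per pair of states
res[[1]]$contrast  # "florida_illinois"
str(res[[1]])      # a list with $contrast (chr) and $p_value (num)
```

Each element is a named list, which is what lets unnest_wider() turn contrast and p_value into columns.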

Now pivot to long format and apply the function to each id_val group:

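library(dplyr)
library(tidyr)

pivot_longer(df1, -id_val, names_to = c(".value", "state"), names_sep = "_") %>%
  group_by(id_val) %>%
  summarize(k = f(value, base, state)) %>%  # list-column, one element per pair
  unnest_wider(k)                           # spread contrast and p_value into columns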

Output:

  id_val contrast          p_value
  <chr>  <chr>               <dbl>
1 row1   florida_illinois 0.709   
2 row1   florida_vermont  0.877   
3 row1   illinois_vermont 0.231   
4 row2   florida_illinois 0.389   
5 row2   florida_vermont  0.833   
6 row2   illinois_vermont 1       
7 row3   florida_illinois 0.000907
8 row3   florida_vermont  0.00237 
9 row3   illinois_vermont 0.598  
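Since the question asks about purrr, here is a sketch of the same computation written with purrr's mapping functions instead of summarise(). It assumes purrr is installed (imap(), map_chr() and map_dbl() come from purrr) and uses base R's split() to iterate over the id_val groups; results, long, d, id and pairs are illustrative names. It should give the same pairs and p-values as the two answers above, along with the same small-sample warnings from prop.test().

```r
library(dplyr)
library(tidyr)
library(purrr)

df1 <- data.frame(
  id_val = c('row1', 'row2', 'row3'),
  value_florida = c(10, 27, 16),
  value_illinois = c(17, 14, 22),
  value_vermont = c(11, 4, 29),
  base_florida = c(12, 44, 32),
  base_illinois = c(18, 29, 23),
  base_vermont = c(15, 8, 33)
)

# long format: one row per id_val/state, with value and base columns
long <- pivot_longer(df1, -id_val, names_to = c(".value", "state"),
                     names_sep = "_")

results <- split(long, long$id_val) %>%
  # d is one id_val's rows; id is its name (e.g. "row1")
  imap(function(d, id) {
    pairs <- combn(nrow(d), 2, simplify = FALSE)  # index pairs within the group
    tibble(
      id_val     = id,
      state_pair = map_chr(pairs, ~ paste(d$state[.x], collapse = "_")),
      pval       = map_dbl(pairs, ~ prop.test(x = d$value[.x], n = d$base[.x])$p.value)
    )
  }) %>%
  bind_rows()  # stack the per-group tibbles

results
```

Splitting first keeps each group's pairing logic in an ordinary function, which avoids relying on summarise() returning several rows per group.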