I would like to bootstrap the p-value and standard errors from weighted Mann-Whitney U test.
I can run the test as: weighted_mannwhitney(c12hour ~ c161sex weight, efc) which works fine, but am not entirely sure how I can run a bootstrapped version of the same to obtain a bootstrapped p-value for instance.
library(sjstats) # weighted Mann-Whitney
library(tidyverse) # main workflow, which has purrr and forcats (IIRC)
# library(broom) # for tidying model output, but not directly loaded
library(modelr) # for bootstrap
data(efc)
efc$weight <- abs(rnorm(nrow(efc), 1, .3))
# weighted Mann-Whitney-U-test ----
weighted_mannwhitney(c12hour ~ c161sex weight, efc)
# Bootstrapping
set.seed(1000) # for reproducibility
boot_efc <- efc %>% bootstrap(1000)
# Throws error!
boot_efc %>%
dplyr::mutate(c12hour = map(strap, ~weighted_mannwhitney(c12hour ~ c161sex weight, data = .)),
tidy = map(c12hour, broom::tidy)) -> boot_efc_out
SIDE NOTE: The package for the weighted Mann-Whitney test has its own bootstrap function which can be used as shown below to obtain bootstrapped standard error and bootstrapped p-value, but this is running a different function (mean), I could not adapt that for the weighted Mann-Whitney. Not sure if this helps
# or as tidyverse-approach
if (require("dplyr") && require("purrr")) {
bs <- efc %>%
bootstrap(100) %>%
mutate(
c12hour = map_dbl(strap, ~mean(as.data.frame(.x)$c12hour, na.rm = TRUE))
)
# bootstrapped standard error
boot_se(bs, c12hour)
# bootstrapped p-value
boot_p(bs, c12hour)
}
CodePudding user response:
This should do the trick:
library(sjstats) # weighted Mann-Whitney
library(tidyverse) # main workflow, which has purrr and forcats (IIRC)
# library(broom) # for tidying model output, but not directly loaded
library(modelr) # for bootstrap
#>
#> Attaching package: 'modelr'
#> The following objects are masked from 'package:sjstats':
#>
#> bootstrap, mse, rmse
data(efc)
efc$weight <- abs(rnorm(nrow(efc), 1, .3))
# weighted Mann-Whitney-U-test ----
mw_full <- weighted_mannwhitney(c12hour ~ c161sex weight, efc)
# Bootstrapping
set.seed(1000) # for reproducibility
boot_efc <- efc %>% bootstrap(1000)
tmp <- boot_efc %>%
dplyr::mutate(c12hour = map_dbl(strap,
~sjstats:::weighted_mannwhitney.formula(c12hour ~ c161sex weight,
data = .$data[.$idx, ])$estimate))
boot_p(tmp$c12hour)
# term p.value
# 1 x 0.0316659
Created on 2022-09-07 by the reprex package (v2.0.1)
Note that one of the problems comes from the way weighted_mannwhitney()
works inside the map()
function. When it's called, it calls the default
method rather than the formula
method and then generates an error. You can call the formula
method for the statistic as I did in the code. The other problem is that each strap
element of boot_efc
isn't a single data frame, it has two elements data
and idx
where the idx
element are the bootstrapped observation numbers. So, you need to use .$data[.$idx, ]
as the bootstrapped data. The rest follows along from the example you posted.