I am trying to do boxplots and add p-values from rstatix::wilcox_test()
. But I want to do this in a loop. so the column names for the formula for wilcox_test
is stored in a variable. When I use the variable name in the wilcox_test, it throws error
Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `rs_id` doesn't exist.
Can someone please help me how to use variables in the formula for wilcox_test()
here is my code
rs_ids <- c('rs2160669', 'rs964184')
phenos <- c('HBA1C', 'TG')
for (rs_id in rs_ids){
for(phen in phenos){
bxp <- ggboxplot(d.k, x = rs_id, y = phen)
stat.test <- d.k %>% wilcox_test(.data[[`phen`]] ~ .data[[`rs_id`]]) %>% add_significance()
stat.test <- stat.test %>% add_xy_position(x = .data[[rs_id]]) %>% mutate(myformatted.p = ifelse(p < 0.001, 'p<0.001', paste0('p=',signif(p, digits = 2))))
bxp <- bxp geom_jitter(position=position_jitter(0.2), alpha = 0.5) theme_bw() theme(legend.position="none", axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold")) xlab(rs_id) ylab(phen)
plot <- bxp stat_pvalue_manual(stat.test, label = "{myformatted.p}", tip.length = 0.01, step.increase = 0.1)
}
}
sample data is below
structure(list(HBA1C = c(4.5, 5.9, 8.02, 5.5, 5.4, 6, 7.1, 9.9,
5.58, 5, 7.6, 8, 5.5, 6.4, 6.8, 9, 6.2, 4.9, 5.7, 6.2, 6.3, 5.7,
5.9, 6.9, 7.3, 5.7, 7.2, 10.4, 5.4, 5.3, 4.8, 5.5, 7.1, 6.2,
7, 7.4, 11.4, 5.4, 5.5, 5.3, 5.5, 5.9, 5.5, 5.79, 4.8, 6.16,
5.63, 7.41, 7.68, 5.82), TG = c(0.83, 0.95, 2.48, 0.78, 0.48,
1.16, 1.45, 1.32, 1.03, 3.77, 3.87, 2.34, 1.98, 2.59, 2.92, 2.71,
6.88, 1.25, 3.68, 1.15, 2.33, 3.37, 0.61, 1.02, 1.63, 1.32, 1.21,
1.35, 0.85, 3.97, 4.04, 3.63, 2.62, 1.46, 2.33, 2.46, 1.09, 1.46,
2.77, 3.1, 3.13, 2.55, 1.91, 0.97, 0.87, 1.46, 1.45, 1.15, 2.61,
2.15), rs2160669 = c(3, 2, 3, 3, 3, 3, 3, 3, 3, 2, 3, 2, 3, 2,
1, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3,
2, 3, 3, 3, 2, 3, NA, 3, 3, 3, 3, 3, 3, 2, 2), rs964184 = c(1,
2, 1, 2, 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 3, 2, 1, 1, 1, 1, 2, 2,
1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, NA,
2, 1, 3, 1, 1, 2, 2, 2)), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
A few issues here:
Backticks are wrong, even in dplyr verbs that recognize
.data
you'd need to use.data[[phen]]
and.data[[rs_id]]
. (C.f.,d.k %>% count(.data[[phen]])
)..data
can only be used in tidyverse-aware functions (e.g., dplyr), it won't work inwilcox_test
.
I suggest you make the formula as a string and then as.formula
it. This will work in the %>%
-pipe since the first argument to the test is data=
, so the formula (as we form it) will be the second arg, formula=
.
rs_id <- "rs2160669"
phen <- "HBA1C"
as.formula(paste(phen, rs_id, sep = "~"))
# HBA1C ~ rs2160669
library(dplyr) # just for %>% which is magrittr, but I'm inferring your use of dplyr
library(rstatix)
d.k %>%
wilcox_test(as.formula(paste(phen, rs_id, sep = "~")))
# # A tibble: 3 x 9
# .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
# * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
# 1 HBA1C 1 2 1 12 7 0.923 1 ns
# 2 HBA1C 1 3 1 36 25 0.542 1 ns
# 3 HBA1C 2 3 12 36 280 0.13 0.39 ns