I want to create a reusable function for a repeated t-test in which the column names are passed in as strings and used in a formula. However, I can't find a way to make it work. Here is the idea:
library(dplyr)
library(rstatix)
do.function <- function(table, column, category) {
  column = sym(column)
  category = sym(category)
  stat.test <- table %>%
    group_by(subset) %>%
    t_test(column ~ category)
  return(stat.test)
}
tmp = data.frame(id=seq(1:100), value = rnorm(100), subset = rep(c("Set1", "Set2"),each=50,2),categorical_value= rep(c("A", "B"),each=25,4))
do.function(table= tmp, column = "value", category = "categorical_value")
The error I currently get is:
Error: Can't extract columns that don't exist.
x Column `category` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
Does anyone know how to solve this?
Answer:
Just build a formula from the strings instead of wrapping them in sym():
library(dplyr)
library(rstatix)
do.function <- function(table, column, category) {
  formula <- paste0(column, '~', category) %>%
    as.formula()
  table %>%
    group_by(subset) %>%
    t_test(formula)
}
tmp = data.frame(id=seq(1:100), value = rnorm(100), subset = rep(c("Set1", "Set2"),each=50,2),categorical_value= rep(c("A", "B"),each=25,4))
do.function(table= tmp, column = "value", category = "categorical_value")
# A tibble: 2 x 9
subset .y. group1 group2 n1 n2 statistic df p
* <chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 Set1 value A B 50 50 0.484 94.3 0.63
2 Set2 value A B 50 50 -2.15 97.1 0.034
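Why the sym() version in the question fails: a formula is not evaluated, so column ~ category always refers to variables literally named column and category, whatever strings the arguments hold. The minimal base-R sketch below illustrates this and also shows how the asker's symbol idea can work by splicing symbols into the formula with bquote(). Here as.name() stands in for rlang::sym(), and the helper make_formula is hypothetical:

```r
# A formula is not evaluated: `column ~ category` records the symbols
# `column` and `category` themselves, not the strings they hold.
column <- "value"
category <- "categorical_value"
literal <- column ~ category
all.vars(literal)   # c("column", "category"): t_test() looks for these columns

# Hypothetical helper: convert the strings to symbols (as.name() is the
# base-R counterpart of rlang::sym()) and splice them in with bquote().
make_formula <- function(column, category) {
  lhs <- as.name(column)
  rhs <- as.name(category)
  # bquote() replaces .(lhs) and .(rhs) with the symbols; eval() then
  # creates the formula, attached to the caller's environment.
  eval(bquote(.(lhs) ~ .(rhs)), envir = parent.frame())
}

spliced <- make_formula("value", "categorical_value")
all.vars(spliced)   # c("value", "categorical_value")

# Same variables as the paste0()/as.formula() approach in this answer
stopifnot(identical(all.vars(spliced),
                    all.vars(as.formula(paste0("value", "~", "categorical_value")))))
```

Either way, the result is an ordinary formula object that can be passed straight to t_test().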
Answer:
Since we are passing strings, we can simply use reformulate() to build the formula:
do.function <- function(table, column, category) {
  stat.test <- table %>%
    group_by(subset) %>%
    t_test(reformulate(category, response = column))
  return(stat.test)
}
Testing:
> do.function(table= tmp, column = "value", category = "categorical_value")
# A tibble: 2 × 9
subset .y. group1 group2 n1 n2 statistic df p
* <chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 Set1 value A B 50 50 1.66 97.5 0.0993
2 Set2 value A B 50 50 0.448 92.0 0.655
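reformulate() (from the stats package, which is attached by default) takes the right-hand-side term labels as a character vector and the left-hand side through its response argument, so no string pasting is needed. A small sketch of what it returns, using the column names from the question (the two-term call is only for illustration):

```r
# reformulate(termlabels, response) builds `response ~ term1 + term2 + ...`
f <- reformulate("categorical_value", response = "value")
class(f)        # "formula"
all.vars(f)     # c("value", "categorical_value")

# Several term labels are joined with `+`
g <- reformulate(c("categorical_value", "subset"), response = "value")
deparse(g)      # "value ~ categorical_value + subset"

# Equivalent to building the string by hand and parsing it
h <- as.formula(paste0("value", "~", "categorical_value"))
stopifnot(identical(all.vars(f), all.vars(h)))
```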
Answer:
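rstatix::t_test() already takes a formula, so we just need to get the variables by their names. Note that, unlike the question, this version drops group_by(subset), so it runs a single test on the pooled data (hence n1 = n2 = 100 in the output):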
do.function <- function(table, column, category) {
  stat.test <- table %>%
    mutate(column = get(column),
           category = get(category)) %>%
    rstatix::t_test(column ~ category)
  return(stat.test)
}
do.function(table=tmp, column="value", category="categorical_value")
# # A tibble: 1 × 8
# .y. group1 group2 n1 n2 statistic df p
# * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
# 1 column A B 100 100 0.996 197. 0.32
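If the rstatix dependency is not wanted, the same per-subset comparison can be sketched in base R with split(), lapply() and stats::t.test(), again building the formula with reformulate(). This is a swapped-in technique, not what any answer above uses. t.test() defaults to Welch's test (var.equal = FALSE), which the non-integer df values in the outputs above suggest rstatix::t_test() also ran, but the output columns differ. The function name do_t_tests_base and its group argument are hypothetical:

```r
# Hypothetical base-R helper: one Welch t-test per level of `group`,
# with the formula built from the column-name strings.
do_t_tests_base <- function(table, column, category, group = "subset") {
  fml <- reformulate(category, response = column)
  pieces <- split(table, table[[group]])          # one data frame per subset
  rows <- lapply(names(pieces), function(level) {
    res <- t.test(fml, data = pieces[[level]])     # Welch test by default
    data.frame(group = level,
               statistic = unname(res$statistic),
               df = unname(res$parameter),
               p = res$p.value)
  })
  do.call(rbind, rows)                             # stack into one table
}

# Same example data as in the question (seed added for reproducibility)
set.seed(1)
tmp <- data.frame(id = seq(1:100), value = rnorm(100),
                  subset = rep(c("Set1", "Set2"), each = 50, 2),
                  categorical_value = rep(c("A", "B"), each = 25, 4))
print(do_t_tests_base(tmp, column = "value", category = "categorical_value"))
```

Splitting first and testing each piece mirrors group_by(subset) in the rstatix versions, at the cost of assembling the result table by hand.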