In my toy data below, I'm repeating `group_by()` and `filter()` for the variables `sample`, `group`, and `outcome` (but not `time`).
I wonder whether there is a functional solution that lets us supply the names of any number of variables to run through `group_by()` and `filter()` in a loop-wise fashion, inside a function like `foo()` shown below.
library(tidyverse)
data <- expand_grid(study=1:3,sample=1:2,group=1:3,outcome=c("A","B"),time=0:2)
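get_rows <- function(x) { # Helper function used in `filter()`
  u <- unique(x)
  keep_all <- if (is.character(x)) 0 else min(u) - 1 # placeholder meaning "keep every row"
  n <- sample(c(keep_all, u), 1)
  if (n == keep_all) TRUE else x == n # the original test `n == n[1]` was always TRUE
}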
DF <- data %>%
  group_by(study) %>%
  filter(get_rows(sample)) %>% # for sample
  ungroup()
DF2 <- DF %>%
  group_by(study) %>%
  filter(get_rows(group)) %>% # for group
  ungroup()
DF3 <- DF2 %>%
  group_by(study) %>%
  filter(get_rows(outcome)) %>% # for outcome
  ungroup()
#============================================ HOW TO LOOP ABOVE IN `foo()` BELOW?
foo <- function(data, ..., exclude_vars = c("time")){
  ## SOLUTION
}
CodePudding user response:
You can loop over variable names stored as strings if you use the dplyr `.data` pronoun. For example:
foo <- function(data, exclude_vars = c("time", "study")){
  # Every column that is not excluded gets its own group_by/filter pass
  vars <- setdiff(names(data), exclude_vars)
  for (var in vars) {
    data <- data %>%
      group_by(study) %>%
      filter(get_rows(.data[[var]])) %>%
      ungroup()
  }
  data
}
foo(data)
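The question's `foo()` skeleton takes `...`, so here is a rough sketch of the same loop with the variable names passed in by the caller instead of derived with `setdiff()`. The names `foo_vars`, `keep_min`, and the `by` argument are made up for illustration and are not part of the original code; `keep_min()` is a deterministic stand-in for `get_rows()` so that the result is reproducible.

```r
library(dplyr)
library(tidyr)

data <- expand_grid(study = 1:3, sample = 1:2, group = 1:3,
                    outcome = c("A", "B"), time = 0:2)

# Hypothetical stand-in for get_rows(): keep the rows at the smallest level
keep_min <- function(x) x == min(x)

# Hypothetical variant of foo(): variable names arrive as strings via `...`,
# and the grouping column is also given as a string
foo_vars <- function(data, ..., by = "study") {
  vars <- c(...)
  for (var in vars) {
    data <- data %>%
      group_by(across(all_of(by))) %>%
      filter(keep_min(.data[[var]])) %>%
      ungroup()
  }
  data
}

res <- foo_vars(data, "sample", "group", "outcome")
```

Swapping `keep_min` back to `get_rows` should give the randomised behaviour from the question; `time` is left alone simply because it is never passed in.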
If you prefer, you could use `purrr::reduce()` rather than the loop:
foo <- function(data, exclude_vars = c("time", "study")){
  vars <- setdiff(names(data), exclude_vars)
  # One filtering step: takes the data so far and the next variable name
  cleanFn <- function(data, var) data %>%
    group_by(study) %>%
    filter(get_rows(.data[[var]])) %>%
    ungroup()
  reduce(vars, cleanFn, .init = data)
}
foo(data)
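In case it is unclear why `reduce()` can replace the loop, here is a tiny illustration of how it threads the accumulator through each element, starting from `.init`. The `step()` function is a made-up placeholder that only pastes strings, so the order of calls is visible.

```r
library(purrr)

# Hypothetical step function: records which variable was applied to what
step <- function(acc, var) paste0(acc, " -> ", var)

via_reduce  <- reduce(c("sample", "group", "outcome"), step, .init = "data")
via_nesting <- step(step(step("data", "sample"), "group"), "outcome")

via_reduce
# "data -> sample -> group -> outcome"
identical(via_reduce, via_nesting)
# TRUE
```

In `foo()` the accumulator is the data frame and each step is one `group_by()`/`filter()` pass, so the result of filtering on one variable feeds into the next.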