Creating a loop for filter() and group_by() from dplyr

Time:10-28

With the toy data below, I repeat the same group_by() and filter() steps for three variables: sample, group, and outcome (but not time).

Is there a functional solution that accepts the names of any number of variables and applies group_by() and filter() to each of them in turn, inside a function like foo() shown below?

library(tidyverse)

data <- expand_grid(study=1:3,sample=1:2,group=1:3,outcome=c("A","B"),time=0:2)

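get_rows <- function(x) {  # Helper function used in `filter()`
  u <- unique(x)
  keep_all <- if (is.character(x)) 0 else min(u) - 1  # sentinel meaning "keep every row"
  n <- sample(c(keep_all, u), 1)                      # pick the sentinel or one observed value
  if (n == keep_all) TRUE else x == n                 # the original `n == n[1]` was always TRUE
}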


DF <- data %>%
  group_by(study) %>%
  filter(get_rows(sample)) %>% # for sample
  ungroup()

DF2 <- DF %>%
  group_by(study) %>%
  filter(get_rows(group)) %>% # for group
  ungroup()

DF3 <- DF2 %>%
  group_by(study) %>%
  filter(get_rows(outcome)) %>% # for outcome
  ungroup()
#============================================ HOW TO LOOP ABOVE IN `foo()` BELOW?
foo <- function(data, ..., exclude_vars = c("time")){
  
  ## SOLUTION
}

CodePudding user response:

You can loop over variable names stored as strings if you use the dplyr .data pronoun: inside filter(), .data[[var]] looks up the column whose name is held in var. For example:

foo <- function(data, exclude_vars = c("time", "study")){
  vars <- setdiff(names(data), exclude_vars)
  for (var in vars) {
    data <- data %>% 
      group_by(study) %>% 
      filter(get_rows(.data[[var]])) %>%
      ungroup()
  }
  data
}
foo(data)
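
If you would rather pass the variable names in than exclude some, the same .data[[var]] idea works with an explicit character vector. The following is only a sketch: filter_each(), its vars and by arguments, and the keep_first() helper are hypothetical names, and keep_first() is a deterministic stand-in for get_rows() (which picks values at random) so that the result is easy to check.

```r
library(dplyr)

# Hypothetical stand-in for get_rows(): keep rows equal to the group's first value.
keep_first <- function(x) x == x[1]

# Hypothetical variant of foo(): `vars` and `by` are character vectors of column names.
filter_each <- function(data, vars, by = "study") {
  for (var in vars) {
    data <- data %>%
      group_by(across(all_of(by))) %>%      # group by the column(s) named in `by`
      filter(keep_first(.data[[var]])) %>%  # look up the column named in `var`
      ungroup()
  }
  data
}

toy <- tidyr::expand_grid(study = 1:2, sample = 1:2, group = 1:3, time = 0:1)
out <- filter_each(toy, vars = c("sample", "group"))
```

Here across(all_of(by)) is used for the grouping column so that it can also be supplied as a string; swapping keep_first() back for get_rows() gives the random behaviour from the question.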

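If you prefer, you could use purrr::reduce rather than the loop. reduce() starts from .init and calls cleanFn once per variable name, feeding each result into the next call: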

foo <- function(data, exclude_vars = c("time", "study")){
  vars <- setdiff(names(data), exclude_vars)
  cleanFn <- function(data, var) data %>% 
    group_by(study) %>% 
    filter(get_rows(.data[[var]])) %>% 
    ungroup()
  reduce(vars, cleanFn, .init=data)
}
foo(data)
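
If you also want the intermediate results (the equivalents of DF, DF2, and DF3 in the question), purrr::accumulate() takes the same arguments as reduce() but returns every step. This is a sketch under assumptions: filter_step() and keep_first() are hypothetical names, and keep_first() is a deterministic stand-in for get_rows().

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for get_rows(): keep rows equal to the group's first value.
keep_first <- function(x) x == x[1]

# One filtering step for the column named in `var`.
filter_step <- function(data, var) {
  data %>%
    group_by(study) %>%
    filter(keep_first(.data[[var]])) %>%
    ungroup()
}

toy <- tidyr::expand_grid(study = 1:2, sample = 1:2, group = 1:3, time = 0:1)
vars <- c("sample", "group")

# A list holding the input followed by the result after each variable.
steps <- accumulate(vars, filter_step, .init = toy)
names(steps) <- c("input", vars)
```

The last element of steps is what reduce() would have returned on its own.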