Split a data frame by multiple factor columns in a function call

I'd like to write a function that splits a data frame by one of several factor variables (one at a time) and then runs another function on the resulting list. However, I can't find the right way to pass the factor column to base::split.

Here's what I've tried so far:

library(tidyverse)

fun_res <- function(x, y) {
  list_temp <- base::split(x, x$y, drop = FALSE)

  lapply(list_temp, another_fun) # another_fun runs another analysis and returns its results in a list
}

Then I'd like to run fun_res to split the data frame by each of several factor columns:

fun_res(df, factor_col1)
fun_res(df, factor_col2)

However, x$y leads to the following error:

Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
  group length is 0 but data length > 0

What would be a proper way to do it?
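To see where this error comes from, here is a minimal sketch (the helper lookup_dollar is a hypothetical name used only for illustration). Inside a function, x$y never evaluates y: it looks for a column literally named "y". No such column exists, so it returns NULL, and split() then fails because the grouping vector has length 0. Because y is never evaluated, R doesn't even complain that factor1 isn't a defined variable here.

```r
# Minimal data frame with one factor column (values taken from the reprex)
df <- data.frame(
  data1   = c(1, 2, 3, 4, 1, 2, 3, 4),
  factor1 = factor(c(rep(1, 4), rep(2, 4)))
)

# Hypothetical helper that mimics the x$y lookup inside fun_res
lookup_dollar <- function(x, y) {
  # `$` does not evaluate y: it searches for a column literally named "y"
  x$y
}

res <- lookup_dollar(df, factor1)
print(is.null(res))  # TRUE: there is no column called "y"

# Passing that NULL to split() reproduces the error from the question
msg <- tryCatch(split(df, res), error = function(e) conditionMessage(e))
print(msg)
```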

Here's a short reprex:

library (tidyverse) 
data1 <- c(1,2,3,4,1,2,3,4)
data2 <- c(4,3,2,1,4,3,2,1)
factor1 <- c(rep(1,4), rep(2,4)) %>% as.factor ()
factor2 <- c(rep(1,5), rep(2,3)) %>% as.factor ()

df <- data.frame (data1, data2, factor1, factor2)

fun_res  <- function (x,y) {
  list_temp <- base::split (x, x$y, drop = FALSE) 
  
  lapply (list_temp, function (z){ # just a random function
    as.list(z) %>%
      return ()
  }) 
}

fun_res(df, factor1)
fun_res(df, factor2)

I want to call fun_res once per factor because, with my real data, the function inside lapply returns a list of statistical test results, and I want to print each factor's list of results separately.

Answer:

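In base R, x$y looks for a column literally named "y" rather than the column you passed in y, so it returns NULL and split() fails. If you pass the column name as an unquoted argument, capture it with substitute(), convert it to a character string with deparse(), and then subset the column with [[: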

fun_res <- function(x, y) {
  y <- deparse(substitute(y))  # unquoted factor1 becomes the string "factor1"
  list_temp <- base::split(x, x[[y]], drop = FALSE)  # [[ looks up the column by that name

  list_temp
}

Testing:

> fun_res(df, factor1)
$`1`
  data1 data2 factor1 factor2
1     1     4       1       1
2     2     3       1       1
3     3     2       1       1
4     4     1       1       1

$`2`
  data1 data2 factor1 factor2
5     1     4       2       1
6     2     3       2       2
7     3     2       2       2
8     4     1       2       2

> fun_res(df, factor2)
$`1`
  data1 data2 factor1 factor2
1     1     4       1       1
2     2     3       1       1
3     3     2       1       1
4     4     1       1       1
5     1     4       2       1

$`2`
  data1 data2 factor1 factor2
6     2     3       2       2
7     3     2       2       2
8     4     1       2       2
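The answer's function returns the split list itself. A sketch that restores the lapply step from the question might look like the following; summarise_group, fun_res_apply and fun_res_chr are hypothetical names, and summarise_group only stands in for the real statistical tests. One caveat worth knowing: deparse(substitute(y)) returns whatever expression was typed at the call site, so calling the unquoted version inside lapply(..., function(col) fun_res_apply(df, col)) would look for a column named "col". For looping over several factor columns, a variant that takes the column name as a string is simpler.

```r
# Same data as the reprex in the question
df <- data.frame(
  data1   = c(1, 2, 3, 4, 1, 2, 3, 4),
  data2   = c(4, 3, 2, 1, 4, 3, 2, 1),
  factor1 = factor(c(rep(1, 4), rep(2, 4))),
  factor2 = factor(c(rep(1, 5), rep(2, 3)))
)

# Hypothetical stand-in for another_fun: summarise each group
summarise_group <- function(z) {
  list(n = nrow(z), mean_data1 = mean(z$data1))
}

# Interactive version: unquoted column name, as in the answer
fun_res_apply <- function(x, y, f = summarise_group) {
  col <- deparse(substitute(y))                  # factor1 -> "factor1"
  lapply(split(x, x[[col]], drop = FALSE), f)    # apply f to each group
}

# Programmatic version: column name as a string, safe to call in a loop
fun_res_chr <- function(x, col, f = summarise_group) {
  lapply(split(x, x[[col]], drop = FALSE), f)
}

res1 <- fun_res_apply(df, factor1)

# Run once per factor column; setNames keeps the column names on the result
factor_cols <- c("factor1", "factor2")
all_res <- lapply(setNames(factor_cols, factor_cols),
                  function(col) fun_res_chr(df, col))

# Print each factor's results separately
for (nm in names(all_res)) {
  cat("Results split by", nm, "\n")
  print(all_res[[nm]])
}
```

Keeping the results in a named list means each factor's results can also be accessed individually later, e.g. all_res$factor2.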