Split a data frame by multiple factor columns in a function call

Time:02-22

I'd like to write a function that splits a data frame by one of several factor variables (one at a time) and then runs another function on the resulting list. However, I cannot find a proper way to refer to the factor column in base::split.

Here's what I've tried so far:

library (tidyverse)
fun_res  <- function (x,y) {
  list_temp <- base::split (x, x$y, drop = FALSE)

  lapply (list_temp, another_fun) # does another function and returns results in a list
}

Then I'd like to run fun_res to split the df by various factor columns:

fun_res(df, factor_col1)
fun_res(df, factor_col2)

However, x$y leads to the following error:

Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : group length is 0 but data length > 0

What would be a proper way to do it?

Here's a short reprex:

library (tidyverse) 
data1 <- c(1,2,3,4,1,2,3,4)
data2 <- c(4,3,2,1,4,3,2,1)
factor1 <- c(rep(1,4), rep(2,4)) %>% as.factor ()
factor2 <- c(rep(1,5), rep(2,3)) %>% as.factor ()

df <- data.frame (data1, data2, factor1, factor2)

fun_res  <- function (x,y) {
  list_temp <- base::split (x, x$y, drop = FALSE) 
  
  lapply (list_temp, function (z){ # just a random function
    as.list(z) %>%
      return ()
  }) 
}

fun_res(df, factor1)
fun_res(df, factor2)

The reason why I'd like to call fun_res for each factor sequentially is that for my real data, the function in lapply returns a list of statistical test results that I want to print by referring to each list of results separately.

Answer:

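Inside the function, x$y does not evaluate y. It looks for a column literally named "y", which does not exist, so it returns NULL and split fails with the "group length is 0" error. In base R, if the argument is passed unquoted, capture it with substitute, convert it to a character string with deparse, and then extract the column with [[: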

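fun_res  <- function (x,y) {
  # turn the unquoted argument (e.g. factor1) into the string "factor1"
  y <- deparse(substitute(y))
  # [[ looks the column up by the string stored in y
  list_temp <- base::split (x, x[[y]], drop = FALSE)

  list_temp
}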
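
To make the difference between $ and [[ concrete, here is a minimal sketch. The names demo_df, y and col_name are made up for this illustration and are not part of the question's code.

```r
# Small throwaway data frame (hypothetical, for illustration only)
demo_df <- data.frame(data1 = 1:4, factor1 = factor(c(1, 1, 2, 2)))

y <- "factor1"

# `$` takes the name literally: it looks for a column called "y"
dollar_result <- demo_df$y

# `[[` evaluates y first and then looks up the column "factor1"
bracket_result <- demo_df[[y]]

# deparse(substitute(...)) turns an unquoted argument into a string
col_name <- function(y) deparse(substitute(y))
captured <- col_name(factor1)

is.null(dollar_result)                       # TRUE
identical(bracket_result, demo_df$factor1)   # TRUE
captured                                     # "factor1"
```

Because demo_df$y is NULL, split receives a grouping vector of length 0, which is what the error message in the question reports.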

Testing:

> fun_res(df, factor1)
$`1`
  data1 data2 factor1 factor2
1     1     4       1       1
2     2     3       1       1
3     3     2       1       1
4     4     1       1       1

$`2`
  data1 data2 factor1 factor2
5     1     4       2       1
6     2     3       2       2
7     3     2       2       2
8     4     1       2       2

> fun_res(df, factor2)
$`1`
  data1 data2 factor1 factor2
1     1     4       1       1
2     2     3       1       1
3     3     2       1       1
4     4     1       1       1
5     1     4       2       1

$`2`
  data1 data2 factor1 factor2
6     2     3       2       2
7     3     2       2       2
8     4     1       2       2
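
The function above returns only the split list. If the lapply step from the question should stay in, and the function should be called for several factor columns in turn, one possible variant is to pass the column name as a string, which avoids substitute altogether. This is only a sketch: split_apply and its fun argument are hypothetical names, and nrow stands in for the real statistical test function.

```r
# Hypothetical helper: split `x` by the column named in `col` (a string),
# then apply `fun` to each piece and return the results as a list.
split_apply <- function(x, col, fun) {
  stopifnot(is.character(col), length(col) == 1, col %in% names(x))
  pieces <- split(x, x[[col]], drop = FALSE)
  lapply(pieces, fun)
}

# Same data as the reprex in the question
df <- data.frame(
  data1   = c(1, 2, 3, 4, 1, 2, 3, 4),
  data2   = c(4, 3, 2, 1, 4, 3, 2, 1),
  factor1 = factor(c(rep(1, 4), rep(2, 4))),
  factor2 = factor(c(rep(1, 5), rep(2, 3)))
)

# One result list per factor column, named after the column
factor_cols <- c("factor1", "factor2")
results <- lapply(
  setNames(factor_cols, factor_cols),
  function(col) split_apply(df, col, nrow)  # nrow is a placeholder function
)

results$factor1  # group sizes: `1` = 4, `2` = 4
results$factor2  # group sizes: `1` = 5, `2` = 3
```

Each element of results can then be printed separately, e.g. results$factor1, which matches the goal of referring to each list of test results on its own.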