How to put splitting by multiple columns inside a function?

Time:05-28

Suppose I want to write a function that splits a data frame by some columns and then calls lapply(select, ...) on the resulting list. I know I could select before splitting; in reality I want to apply a custom function, and select is only there for the sake of the example. I figured out how to do it with a single grouping variable:

f <- function(df, group, ...) {
  split(df, getElement(df, substitute(group))) |>
    lapply(dplyr::select, ...)
}

I can call f(iris, Species, Sepal.Length, Sepal.Width) to get a list of sepal lengths and widths by species, but I don't know how to make splitting by multiple columns possible. Outside a function I'd use e.g. split(dataframe, list(dataframe$group1, dataframe$group2)), but I can't find a way to put that into a function. I tried using with(), without success. When I pass a list as the function's group argument, I get the following error:

#> Error in 'unique.default(x, nmax = nmax)': unique() applies only to vectors

The question is – how do I make a function out of this:

split(dataframe, list(dataframe$group1, dataframe$group2)) |>
  lapply(select, col1, col2, col3)

CodePudding user response:

Here's one approach. There's probably a way to do it with group as a list, but I found a way to do it with group as a character vector, which does require quoting the column names.

This selects the two grouping columns and passes them to split to use as the basis for splitting the data frame.

library(tidyverse)

f <- function(df, group, ...) {
  # df[group] is a data frame of the grouping columns; split treats it as a list of factors
  split(df, df[group]) %>%
    lapply(dplyr::select, ...)
}

f(mtcars, group = c("cyl", "carb"), wt, vs)
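One thing to keep in mind when splitting on several columns: split builds one list element per combination of grouping values, so combinations that never occur in the data show up as empty data frames unless drop = TRUE is passed. The following is a minimal base-R sketch of that behaviour. The helper name split_by and its cols argument are hypothetical, and columns are picked with plain indexing instead of dplyr::select to keep the example self-contained.

```r
# Hypothetical helper (not from the answer above): split by quoted column
# names and keep only the requested columns in each piece.
split_by <- function(df, group, cols, drop = TRUE) {
  # drop = TRUE discards combinations of grouping values that have no rows
  pieces <- split(df, df[group], drop = drop)
  lapply(pieces, function(piece) piece[cols])
}

kept <- split_by(mtcars, c("cyl", "carb"), c("wt", "vs"))
all_combos <- split_by(mtcars, c("cyl", "carb"), c("wt", "vs"), drop = FALSE)

names(kept)         # names follow the "cyl.carb" pattern, e.g. "4.1"
length(all_combos)  # one element per cyl x carb combination, empty ones included
```

Whether empty groups should be kept depends on what the custom function applied afterwards expects; with drop = FALSE it has to cope with zero-row data frames.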

CodePudding user response:

Thanks to a hint from @Roger-123 I managed to create a function that doesn't require quoting the column names. It doesn't look elegant, but it works.

f <- function(df, group, ...) {
  # turn the unquoted column name(s) into a character vector
  group <- vapply(substitute(group), deparse, character(1))
  # drop the leading "c" if more than one column name was provided
  if (length(group) > 1) group <- group[-1]
  split(df, df[group]) |>
    lapply(dplyr::select, ...)
}
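A possibly tidier way to capture unquoted names is all.vars(), which returns the variable names found in an expression and ignores function names such as c. The sketch below is a base-R variant under that assumption. The name f_unquoted is hypothetical, and columns are selected by plain indexing in place of dplyr::select.

```r
# Hypothetical variant: unquoted grouping and selected columns, base R only.
f_unquoted <- function(df, group, ...) {
  # c(cyl, carb) -> c("cyl", "carb"); a single bare name works as well
  group_names <- all.vars(substitute(group))
  # wt, vs -> c("wt", "vs")
  col_names <- all.vars(substitute(list(...)))
  lapply(split(df, df[group_names]), function(piece) piece[col_names])
}

by_two <- f_unquoted(mtcars, c(cyl, carb), wt, vs)
by_one <- f_unquoted(iris, Species, Sepal.Length, Sepal.Width)

names(by_one)  # one element per species
```

Note that this only works when the column names are typed directly in the call. Passing a variable that holds the names would make all.vars() return the variable's own name, so the quoted version is the safer choice for programmatic use.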