Let's assume I want to make a function that splits a data frame by some columns and then uses lapply(select, ...)
on the output list. I know I can select before splitting; in reality I want to use some custom function, and select
is just for the sake of the example. I figured out how to use a single grouping variable:
f <- function(df, group, ...) {
split(df, getElement(df, substitute(group))) |>
lapply(dplyr::select, ...)
}
I can invoke f(iris, Species, Sepal.Length, Sepal.Width) to get a list of sepal lengths and widths by species. Yet I don't know how to make splitting by multiple columns possible. Outside a function I'd use e.g. split(dataframe, list(dataframe$group1, dataframe$group2)), but I can't find a way to put it into a function. I tried to use with, but with no success. When I try to put a list in the function's argument I end up with the following error:
#> Error in 'unique.default(x, nmax = nmax)': unique() applies only to vectors
The question is – how do I make a function out of this:
split(dataframe, list(dataframe$group1, dataframe$group2)) |>
lapply(select, col1, col2, col3)
CodePudding user response:
Here's one approach. There's probably a way to do it with group as a list, but I found a way to do it with group as a character vector; it does require quoting the column names.
This selects the grouping columns from the data frame and passes them to split, which uses them as the basis for splitting.
library(tidyverse)
f <- function(df, group, ...) {
split(df, df[group]) %>%
lapply(dplyr::select, ...)
}
f(mtcars, group = c("cyl", "carb"), wt, vs)
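One thing to keep in mind when splitting on several columns: by default split builds a group for every combination of levels, including combinations that never occur in the data, so some list elements may be empty data frames. A minimal base R sketch of the difference, assuming the built-in mtcars data and the same two grouping columns (the names by_default and by_observed are only for illustration):

```r
# Columns to split on, given as a character vector as in the function above
group <- c("cyl", "carb")

# Default behaviour: one list element per combination of levels,
# even for combinations with no rows
by_default <- split(mtcars, mtcars[group])

# drop = TRUE keeps only the combinations that actually occur
by_observed <- split(mtcars, mtcars[group], drop = TRUE)

length(by_default)
length(by_observed)

# Element names are the level values joined with "."
head(names(by_observed))
```

If the empty pieces would trip up your custom function, passing drop = TRUE to split inside f is probably the simplest fix.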
CodePudding user response:
Thanks to a hint from @Roger-123 I managed to create a function that doesn't require quoting the column names. It doesn't look elegant, but it works.
f <- function(df, group, ...) {
group <- vapply(substitute(group), deparse, "vector") # turns vector of colnames into character vector
if (length(group) > 1) group <- group[2:length(group)] # deletes "c" if more than one colname was provided
split(df, df[group]) |>
lapply(dplyr::select, ...)
}
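A possibly tidier variant of the same idea, offered only as a sketch: base R's all.vars() extracts the variable names from an unevaluated expression, so it returns the column names for both a bare name and c(name1, name2) without having to strip the "c" by hand. The function name split_apply and its fun argument are hypothetical, introduced here so that any custom function can be plugged in instead of select:

```r
# Hypothetical helper: split df by unquoted column names, then apply
# an arbitrary function to each piece.
split_apply <- function(df, group, fun, ...) {
  # all.vars() returns "cyl" for `cyl` and c("cyl", "carb") for `c(cyl, carb)`
  group_cols <- all.vars(substitute(group))
  # drop = TRUE discards combinations of levels that never occur in the data
  pieces <- split(df, df[group_cols], drop = TRUE)
  lapply(pieces, fun, ...)
}

# Usage: keep two columns in every piece (a stand-in for the custom function)
res <- split_apply(mtcars, c(cyl, carb), function(d) d[c("wt", "vs")])
names(res)

# A single grouping column works the same way
res_single <- split_apply(mtcars, cyl, nrow)
```

Note that all.vars() picks up every symbol in the expression, so this assumes group contains nothing but column names.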