I'm trying to build a function that accepts two filter terms, either numeric or character, converts them or leaves them as they are depending on their class, and then filters a data frame by those terms.
library(tidyverse)
fun1 = function(df,filt_col,filt_term_1,filt_term_2){
# changing the filt_col to a symbol, which is needed to correctly parse things
filt_col = sym(filt_col)
# if statement that checks whether the filtering term is numeric or not
# if it is numeric it leaves as is, whilst if not it deparse(substitutes) (i.e. makes into quoted text)
if (!is.numeric(filt_term_1)) {filt_term_1 = deparse(substitute(filt_term_1))}
if (!is.numeric(filt_term_2)) {filt_term_2 = deparse(substitute(filt_term_2))}
# doing one of two things depending on filtering terms that have been provided as arguments
# if numeric, then filter < and > than numbers provided
# if character, then filter == to argument provided
if(is.numeric(filt_term_1) & is.numeric(filt_term_2)) {
group1 = df %>% filter(!!filt_col < filt_term_1)
group2 = df %>% filter(!!filt_col > filt_term_2)
} else {
group1 = df %>% filter(!!filt_col == filt_term_1)
group2 = df %>% filter(!!filt_col == filt_term_2)
}
# put two groups in a list
grouped_list = list(group1,group2)
return(grouped_list)
}
# trying function which runs well with numeric args
fun1(iris,"Sepal.Length",4.9,4.9)
# but does not run with character args
fun1(iris,"Species",versicolor,virginica)
Firstly, I'm not sure what the error is about. Secondly, how can I make this more efficient? Ideally I would want to enter all arguments as non-quoted text.
Thank you.
CodePudding user response:
Normally character values are not passed using NSE, only column names.
Pass versicolor and virginica as "versicolor" and "virginica" and use S3 to handle the difference between numeric and character/factors. Note how much simpler it is now. (If for some reason you don't like S3 you could use an if statement but S3 will give more modular code.)
fun2 <- function(df, filt_col, filt_term_1, filt_term_2, ...) {
UseMethod("fun2", df[[filt_col]])
}
fun2.default <- function(df, filt_col, filt_term_1, filt_term_2, ...) {
group1 <- df %>% filter(.data[[filt_col]] < filt_term_1)
group2 <- df %>% filter(.data[[filt_col]] > filt_term_2)
list(group1, group2)
}
fun2.factor <-
fun2.character <- function(df, filt_col, filt_term_1, filt_term_2, ...) {
group1 <- df %>% filter(.data[[filt_col]] == filt_term_1)
group2 <- df %>% filter(.data[[filt_col]] == filt_term_2)
list(group1, group2)
}
fun2(iris,"Sepal.Length", 4.9, 4.9)
fun2(iris, "Species", "versicolor", "virginica")
Update
As pointed out in the comments, I had missed that you want an equality comparison for character and factor columns and inequality comparisons for numeric ones. This is now fixed.
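To see why the S3 version picks the right branch, here is a minimal base-R sketch of the same dispatch mechanism. The generic `describe_col` and its methods are hypothetical names used only for illustration; they return a label instead of filtering, and the only assumption is the built-in `iris` dataset.

```r
# Hypothetical generic, for illustration only.
# The second argument of UseMethod() makes dispatch depend on the column,
# not on the data frame itself.
describe_col <- function(df, col, ...) {
  UseMethod("describe_col", df[[col]])
}

# Fallback for anything without a more specific method (e.g. numeric columns).
describe_col.default <- function(df, col, ...) "inequality filter (< and >)"

# Chained assignment: factor and character columns share one method.
describe_col.factor <-
  describe_col.character <- function(df, col, ...) "equality filter (==)"

num_result <- describe_col(iris, "Sepal.Length")  # numeric column
fct_result <- describe_col(iris, "Species")       # factor column
```

Because the choice is made from the class of `df[[col]]`, the filter terms themselves never need to be inspected or captured unevaluated.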
CodePudding user response:
The problem lies in the following three conditions, which are evaluated when unquoted expressions are passed to filt_term_1 and filt_term_2:
if (!is.numeric(filt_term_1))
if (!is.numeric(filt_term_2))
if(is.numeric(filt_term_1) & is.numeric(filt_term_2))
If filt_term_* is a numeric or character value, these expressions can be evaluated, because the value is represented as an atomic vector. When an object is passed instead, like the unquoted versicolor, evaluation fails: this object does not exist and cannot be evaluated outside a context that defines it.
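The failure can be reproduced in isolation with base R. This is a small sketch, not part of the original function: `check_term` and `capture_term` are hypothetical helpers, and it assumes no object named `versicolor` exists in the calling environment.

```r
# Hypothetical helper: mimics the first check in fun1.
check_term <- function(term) {
  # is.numeric() needs the value of `term`, so the argument is evaluated here
  is.numeric(term)
}

# Hypothetical helper: captures the argument's expression without evaluating it.
capture_term <- function(term) {
  deparse(substitute(term))
}

check_term(4.9)           # a numeric constant evaluates without trouble
check_term("versicolor")  # so does a character constant

capture_term(versicolor)  # works: the name is never looked up

# An unquoted name is looked up as a variable once it is evaluated.
# The error message is caught and stored instead of stopping the script.
result <- tryCatch(
  check_term(versicolor),
  error = function(e) conditionMessage(e)
)
result
```

In the original function the `is.numeric()` check runs before `deparse(substitute())`, so the unquoted name is evaluated, and fails, before it can be captured as text.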
A possible fix of your code:
We could think of various workarounds, but to avoid an XY problem, I'd propose letting the type of the variable in the dataset, not the type of the input, determine how the inputs should be treated.
library(tidyverse)
fun1 = function(df, filt_col, filt_term_1, filt_term_2){
# changing the filt_col to a symbol, which is needed to correctly parse things
filt_col = sym(filt_col)
# if statement that checks whether the filtering term is numeric or not
# if it is numeric it leaves as is, whilst if not it deparse(substitutes) (i.e. makes into quoted text)
if (!is.numeric(pull(df, {{filt_col}}))) {filt_term_1 = deparse(substitute(filt_term_1))}
if (!is.numeric(pull(df, {{filt_col}}))) {filt_term_2 = deparse(substitute(filt_term_2))}
# doing one of two things depending on filtering terms that have been provided as arguments
# if numeric, then filter < and > than numbers provided
# if character, then filter == to argument provided
if(is.numeric(pull(df, {{filt_col}}))) {
group1 = df %>% filter(!!filt_col < filt_term_1)
group2 = df %>% filter(!!filt_col > filt_term_2)
} else {
group1 = df %>% filter(!!filt_col == filt_term_1)
group2 = df %>% filter(!!filt_col == filt_term_2)
}
# put two groups in a list
grouped_list = list(group1,group2)
return(grouped_list)
}
A simpler solution in your spirit:
You might want to explore the {{ }} syntax that I used above to simplify your code even more. The chunk below will take inputs like fun1(iris, "Species", versicolor, virginica) and fun1(iris, Species, versicolor, virginica). However, you'd want to think carefully about which inputs to accept and why.
library(tidyverse)
fun1 = function(df, filt_col, filt_term_1, filt_term_2){
if(is.numeric(pull(df, {{filt_col}}))) {
group1 = df %>% filter({{filt_col}} < filt_term_1)
group2 = df %>% filter({{filt_col}} > filt_term_2)
} else {
filt_term_1 <- deparse(substitute(filt_term_1))
filt_term_2 <- deparse(substitute(filt_term_2))
# We need the if_any (or similar hack) to accept both quoted and unquoted column names.
group1 = df %>% filter(if_any({{filt_col}}, ~ . == filt_term_1))
group2 = df %>% filter(if_any({{filt_col}}, ~ . == filt_term_2))
}
# put two groups in a list
grouped_list = list(group1,group2)
return(grouped_list)
}
A tidyverse-spirit solution:
However, as pointed out by @Limey, it would probably be more in line with the spirit of the tidyverse to take input columns as objects and values as character/numeric constants: (*)
fun1(iris, Species, "versicolor", "virginica")
fun1 <- function(df, filt_col, filt_term_1, filt_term_2) {
if (is.numeric(pull(df, {{filt_col}}))) {
group1 <- filter(df, {{filt_col}} < filt_term_1)
group2 <- filter(df, {{filt_col}} > filt_term_2)
} else {
group1 <- filter(df, {{filt_col}} == filt_term_1)
group2 <- filter(df, {{filt_col}} == filt_term_2)
}
list(group1, group2)
}
(*) As G. Grothendieck also pointed out, normally character values are not passed using NSE, only column names.