Here is my dataset:
mydata<-data.frame(
id=1:20,
sex=sample(c(rep("M",6),rep("F",14))),
Age=round(rnorm(20, 30,2)),
Weight=round(rnorm(20, 65,5),2)
)
I want my function to let me specify both the variable to filter on and the criterion, i.e. the operator (== or > or <=...) and the value (M or 65...).
This is the function I am trying to create. I know in advance that it won't work; it is only meant to give an idea of what I want.
If I don't supply the variable, operator and value for the selection, my function must return the original data; otherwise it must return the filtered data.
my_func<-function(select_var, select_crit){
mydata<-mydata<-if(is.null(select_var)&is.null(select_crit)){mydata}else{
mydata[ which(mydata[select_var]select_crit), ]
}
return(mydata)
}
For example, I want to be able to select all the males with my function like this:
my_func(select_var="sex",select_crit="M")
And all the individuals older than 30 like this:
my_func(select_var="Age",select_crit=">30")
or select with the operator %in%:
my_func(select_var="Age",select_crit=%in%c(30:40))
CodePudding user response:
You have to add a data argument to your function and use a combination of eval, parse and paste0 to build your filter (row selection) criterion. This approach will help:
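my_func <- function(data, select_var=NULL, select_crit=NULL){
  # Return the data unchanged unless both the variable and the criterion are supplied
  if(is.null(select_var) || is.null(select_crit)){
    output <- data
  } else {
    # Build the filter as text, e.g. "data$Age>30", then parse and evaluate it; only the select_var column is kept
    output <- data[eval(parse(text=paste0("data", "$",select_var, select_crit))), select_var, drop=FALSE]
  }
  return(output)
}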
Examples:
> my_func(mydata, select_var="Age", select_crit=">30")
Age
1 32
5 32
7 33
8 31
9 33
13 31
16 33
18 32
19 32
> my_func(mydata, select_var="Age",select_crit="%in%c(30:40)")
Age
1 32
2 30
5 32
7 33
8 31
9 33
11 30
13 31
14 30
16 33
17 30
18 32
19 32
Calling my_func(data) with select_var and select_crit left at their default NULL will return your original dataset.
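To make the eval/parse/paste0 combination less opaque, here is a minimal sketch of what each step produces. The data frame df and the intermediate names expr_text, expr and keep are hypothetical and used only for this illustration; they are not part of the answer's function.

```r
# Small hypothetical data frame, used only for this illustration
df <- data.frame(id = 1:5, Age = c(28, 31, 30, 35, 29))

select_var  <- "Age"
select_crit <- ">30"

# Step 1: paste0() glues the pieces into one string of R code
expr_text <- paste0("df", "$", select_var, select_crit)
expr_text
# "df$Age>30"

# Step 2: parse() turns the string into an unevaluated expression
expr <- parse(text = expr_text)

# Step 3: eval() runs the expression and yields one logical value per row
keep <- eval(expr)
keep
# FALSE TRUE FALSE TRUE FALSE

# Step 4: the logical vector selects the matching rows
df[keep, ]
```

Because the criterion is evaluated as code, whatever text is passed in select_crit gets executed, so this pattern is best kept to input you control.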
CodePudding user response:
Three suggestions:
1. Make the data an argument of the function instead of accessing it through a scope breach. This helps with reproducibility, troubleshooting, maintenance, etc., and as a side effect allows your function to operate in %>% and |> pipes (if so desired).
2. Use &&, and "never" use single & in if conditionals unless it is wrapped in an aggregating function such as any or all. The differences between & and && are more than just vectorized versus non-vectorized; see "Boolean operators && and ||". Further, I think you mean to use "OR" here instead of "AND", since if either one of them is NULL then you should not be attempting to use the operator.
3. Change from two arguments to three, separating the operator from the second operand.
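As a rough illustration of the second suggestion, the sketch below shows why "OR" is the safer test when only one piece is supplied, and one way in which && differs from &. The variable names both_missing, any_missing, short and vectorised are made up for this example.

```r
select_var  <- "Age"
select_crit <- NULL   # only one of the two pieces is supplied

# AND is FALSE here, so a function relying on it would go on to filter
# with a missing criterion
both_missing <- is.null(select_var) && is.null(select_crit)

# OR is TRUE here, so the function can return the data untouched
any_missing <- is.null(select_var) || is.null(select_crit)

# && short-circuits: the right-hand side is never evaluated when the left is FALSE
short <- FALSE && stop("not reached")

# & evaluates both operands, so the same expression raises the error
vectorised <- tryCatch(FALSE & stop("reached"),
                       error = function(e) "error raised")

both_missing  # FALSE
any_missing   # TRUE
short         # FALSE
vectorised    # "error raised"
```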
Try this:
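fun <- function(mydata, sel_var = NULL, sel_op = NULL, sel_val = NULL) {
  # Nothing to filter on: return the data unchanged
  if (is.null(sel_var) || is.null(sel_op)) return(mydata)
  # Accept the operator either as a function or as its name, e.g. "<" or "%in%"
  if (is.character(sel_op)) sel_op <- match.fun(sel_op)
  # sel_val is appended only when supplied, so unary operators such as "!" also work
  mydata[do.call(sel_op, c(list(mydata[[sel_var]]), if (!is.null(sel_val)) list(sel_val))),]
}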
fun(mtcars, "cyl", "<", 5)
fun(mtcars, "cyl", "%in%", c(4, 8))
fun(mtcars, "vs", "!")
Notes:
sel_op can be a function or a string representing one. This gives a lot more flexibility, such as the ability to do
fun(mtcars, "vs", Negate("!"))
fun(mtcars, "vs", function(z) !!z)
The c(list(..), if (!is.null(sel_val)) list(..)) construct is meant to allow sel_val to be empty/NULL for unary functions.
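Since the data is the first argument, the function can also be used in a pipe, as the first suggestion mentions. The following is a minimal sketch, assuming R 4.1 or later for the native |> pipe. The function is restated (with NULL defaults for the selection arguments, which is an assumption of this sketch) so that the block runs on its own; four_cyl, manual_four_cyl and same are hypothetical names.

```r
# Restated here so the block runs on its own
fun <- function(mydata, sel_var = NULL, sel_op = NULL, sel_val = NULL) {
  if (is.null(sel_var) || is.null(sel_op)) return(mydata)
  if (is.character(sel_op)) sel_op <- match.fun(sel_op)
  args <- c(list(mydata[[sel_var]]), if (!is.null(sel_val)) list(sel_val))
  mydata[do.call(sel_op, args), ]
}

# Native pipe: the data flows in as the first argument
four_cyl <- mtcars |> fun("cyl", "==", 4)

# Filters can be chained because each call returns a data frame
manual_four_cyl <- mtcars |>
  fun("cyl", "==", 4) |>
  fun("am", "==", 1)

# With no selection arguments the data comes back unchanged
same <- mtcars |> fun()
```

The same calls should work with the magrittr %>% pipe if that package is loaded.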