How to specify the operator in my function

Time:10-28

Here is my dataset

mydata<-data.frame(
  id=1:20,
  sex=sample(c(rep("M",6),rep("F",14))),
  Age=round(rnorm(20, 30,2)),
  Weight=round(rnorm(20, 65,5),2)
)

I want my function to let me specify which variable to filter on, as well as the criterion, i.e. the operator (==, >, <=, ...) and the value ("M", 65, ...).

This is the function I am trying to create. I know in advance that it won't work, it's to give an idea of what I want to create.

If I don't supply the variable, operator and value, the function must return the original data frame; otherwise it must return the filtered one.

    my_func<-function(select_var, select_crit){
      
      mydata<-mydata<-if(is.null(select_var)&is.null(select_crit)){mydata}else{
        mydata[ which(mydata[select_var]select_crit), ]
      }
return(mydata)
    }

For example, I want to be able to select all the males with my function like this:

my_func(select_var="sex",select_crit="M")

And all the individuals older than 30 like this:

my_func(select_var="Age",select_crit=">30")

Or to select with the %in% operator:

my_func(select_var="Age",select_crit=%in%c(30:40))

CodePudding user response:

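Add a data argument to your function and build the row-selection criterion with a combination of eval, parse and paste0. Note that select_crit must then contain both the operator and the value as R code (for a string comparison, something like "=='M'"), and that the function below returns only the select_var column. This approach should help: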

my_func <- function(data, select_var=NULL, select_crit=NULL){
  
  # Return the data unfiltered if either piece of the filter is missing
  if (is.null(select_var) || is.null(select_crit)) {
    output <- data
  } else {
    output <- data[eval(parse(text=paste0("data", "$",select_var, select_crit))), select_var, drop=FALSE]
  }
  
  return(output)
}

Examples:

> my_func(mydata, select_var="Age", select_crit=">30")
   Age
1   32
5   32
7   33
8   31
9   33
13  31
16  33
18  32
19  32
> my_func(mydata, select_var="Age",select_crit="%in%c(30:40)")
   Age
1   32
2   30
5   32
7   33
8   31
9   33
11  30
13  31
14  30
16  33
17  30
18  32
19  32

Calling my_func(mydata) with select_var and select_crit left at their default of NULL returns the original dataset.
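To make the string-building step more concrete, here is a minimal sketch of what paste0 produces and how it is evaluated. The data frame df and the helper build_filter are hypothetical names introduced only for this illustration, and unlike the function above this sketch keeps all columns.

```r
# Small fixed data frame so the results are reproducible (hypothetical values).
df <- data.frame(
  sex = c("M", "F", "F", "M"),
  Age = c(28, 31, 35, 30)
)

# Hypothetical helper: assemble the filter as text, as the answer does with paste0.
build_filter <- function(select_var, select_crit) {
  paste0("data", "$", select_var, select_crit)
}

numeric_filter <- build_filter("Age", ">30")     # "data$Age>30"
string_filter  <- build_filter("sex", "=='M'")   # "data$sex=='M'"

# The text refers to an object called `data`, so bind that name first.
data <- df

# Parse the text into an expression, evaluate it, and use the result as a row index.
older <- df[eval(parse(text = numeric_filter)), , drop = FALSE]
males <- df[eval(parse(text = string_filter)), , drop = FALSE]
```

Passing just "M" as select_crit would build data$sexM, which is not a comparison, so the operator and the quoted value both have to be part of the string.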

CodePudding user response:

Three suggestions:

  1. Make the data an argument of the function rather than reaching into the enclosing scope for it. This helps with reproducibility, troubleshooting, maintenance, etc., and as a side effect lets your function work in %>% and |> pipes (if so desired).

  2. Use &&; "never" use a single & in if-conditionals unless it is wrapped in an aggregating function such as any or all. The differences between & and && go beyond vectorized vs. non-vectorized; see "Boolean operators && and ||". Further, I think you mean "OR" here rather than "AND": if either one of them is NULL, you should not attempt to apply the operator.

  3. Change from 2-args to 3-args, separating the operator from the second operand.
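As a rough illustration of the second point, the sketch below contrasts the short-circuiting && and || with the elementwise &. The variable names are made up for this example.

```r
# `&&` short-circuits: the right-hand side is skipped once the result is known.
short_circuit <- FALSE && stop("never evaluated")   # no error is raised

# `&` evaluates both sides, so the same expression hits the stop() call.
vectorized <- tryCatch(
  FALSE & stop("evaluated"),
  error = function(e) "error"
)

# `&` works elementwise and returns one value per element,
# which is why it needs any() or all() before being used in `if`.
elementwise <- c(TRUE, FALSE) & c(TRUE, TRUE)

# Guarding a possibly-NULL argument: `||` stops at the first TRUE,
# so nchar() is never called on NULL here.
sel_var <- NULL
skip_filter <- is.null(sel_var) || nchar(sel_var) == 0
```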

Try this:

# sel_var and sel_op default to NULL so that fun(mydata) returns the data unfiltered
fun <- function(mydata, sel_var = NULL, sel_op = NULL, sel_val = NULL) {
  if (is.null(sel_var) || is.null(sel_op)) return(mydata)
  if (is.character(sel_op)) sel_op <- match.fun(sel_op)
  mydata[do.call(sel_op, c(list(mydata[[sel_var]]), if (!is.null(sel_val)) list(sel_val))),]
}

fun(mtcars, "cyl", "<", 5)
fun(mtcars, "cyl", "%in%", c(4, 8))
fun(mtcars, "vs", "!")
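Applied to the data from the question, the three requested filters might look like the sketch below. filter_rows is a hypothetical copy of the function above with NULL defaults for every filter argument, and the seed is an arbitrary choice made only so the sample is reproducible.

```r
# Hypothetical copy of the answer's function, with NULL defaults for the filter.
filter_rows <- function(data, sel_var = NULL, sel_op = NULL, sel_val = NULL) {
  if (is.null(sel_var) || is.null(sel_op)) return(data)
  if (is.character(sel_op)) sel_op <- match.fun(sel_op)
  # Unary operators get one argument, binary operators get two.
  args <- c(list(data[[sel_var]]), if (!is.null(sel_val)) list(sel_val))
  data[do.call(sel_op, args), ]
}

set.seed(1)  # arbitrary seed, not part of the original question
mydata <- data.frame(
  id = 1:20,
  sex = sample(c(rep("M", 6), rep("F", 14))),
  Age = round(rnorm(20, 30, 2)),
  Weight = round(rnorm(20, 65, 5), 2)
)

males   <- filter_rows(mydata, "sex", "==", "M")      # all males
over_30 <- filter_rows(mydata, "Age", ">", 30)        # Age above 30
in_30s  <- filter_rows(mydata, "Age", "%in%", 30:40)  # Age within 30..40
same    <- filter_rows(mydata)                        # no filter given
```

Because the data is the first argument, a call such as mydata |> filter_rows("Age", ">", 30) should also work on R 4.1 or later.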

Notes:

  • sel_op can be a function or a string representing one. This gives a lot more flexibility, such as the ability to do

    fun(mtcars, "vs", Negate("!"))
    fun(mtcars, "vs", function(z) !!z)
    
  • the c(list(..), if (!is.null(sel_val)) list(sel_val)) construct lets sel_val be left empty/NULL for unary functions: when the condition is false the if yields NULL, which c() drops from the argument list.
