R function returns data when certain input is FALSE, returns nothing when TRUE

Time:12-20

I've created a function to retrieve rows from a data.table that meet a criterion. The function takes a TRUE or FALSE input to select either only the rows with a certain value, or all rows without that value.

In this reproducible example, I want to retrieve flowers with a given sepal length. Sometimes I only want virginica flowers, sometimes I want every other flower.

# Load data.table and create a data.table from iris
library(data.table)
iris_dt <- as.data.table(iris)

# Create function that lets you select sepal length and
# whether you want only virginica or anything but virginica 
select_virginica <- function (sep_l,virg){
  data_flowers <- iris_dt[which(Sepal.Length==sep_l)] 
  
  if (virg==TRUE){
    data_flowers[which (Species=="virginica")]}
  
  if (virg==FALSE){
    data_flowers[which (Species!="virginica")]}
}

# Check all non-virginica with sepal length = 5.8
select_virginica(5.8,virg=F)
# This correctly returns information on 4 flowers

# Check all virginica with sepal length = 5.8
select_virginica(5.8,virg=T)
# This incorrectly returns nothing. How do I fix this?

When it returns correctly (because I passed FALSE), I can easily select further data from the result.

# I can work with the results in an interesting way, and easily select data from it
select_virginica(5.8,virg=F)[which (Petal.Length == 1.2)]
select_virginica(5.8,virg=F)[,3]

Some diagnostics and fix attempts:

It's not a problem with the dataset: there are matching rows in all three species.

# There are flowers of Sepal.Length==5.8 in all three species
subset(iris_dt,Sepal.Length==5.8)
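A quick way to confirm this per species is to tabulate the same subset. This is an illustrative check using base R's table() on the plain iris data frame (not part of the original question), so it runs without data.table:

```r
# Keep only rows with Sepal.Length == 5.8, then count them by species
rows_58 <- subset(iris, Sepal.Length == 5.8)
counts <- table(rows_58$Species)
print(counts)
```

Every species should show a non-zero count, which rules out the data as the cause.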

The function indexes the data.table with which(), but the issue is identical if I use subset() instead.

If I add %>% print() after "virginica")], the rows are printed, but the function still doesn't return them, so I can't use it in that interesting way.

Why is this happening? How can I get the function to return my desired rows, and to be able to select rows in that interesting way? Thanks.

Answer:

Instead of two separate if statements, use a single if/else:

select_virginica <- function (sep_l,virg){
  data_flowers <- iris_dt[which(Sepal.Length==sep_l)] 
  
  if (virg==TRUE) {
    data_flowers[which (Species=="virginica")]
  } else {
    data_flowers[which (Species!="virginica")]
  }
}

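If your function has no explicit return() statement, R returns the value of the last expression evaluated in the function body. In your version, when virg is TRUE, the first if computes the virginica rows, but that value is discarded because the second if (virg==FALSE) is evaluated afterwards. Its condition is FALSE and there is no else, so it evaluates to NULL (invisibly, which is why nothing is printed), and that NULL becomes the function's return value. With if/else, the whole if/else is the last expression, so the function returns whichever branch ran.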
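A minimal, data-free sketch of this behaviour (the functions `two_ifs` and `one_if_else` are hypothetical, made up for illustration). With two separate if statements, the second one is always the last expression evaluated; with if/else, whichever branch runs supplies the return value:

```r
# Two independent if statements: the value of the first one is discarded
two_ifs <- function(flag) {
  if (flag) "first branch"
  if (!flag) "second branch"
}

# A single if/else: the branch that runs is the last expression evaluated
one_if_else <- function(flag) {
  if (flag) "first branch" else "second branch"
}

two_ifs(FALSE)        # "second branch"
print(two_ifs(TRUE))  # NULL: the second if had a FALSE condition and no else
one_if_else(TRUE)     # "first branch"

# The NULL from an if without else is invisible, so the console shows nothing
withVisible(two_ifs(TRUE))$visible
```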

Answer:

If you don't call return() explicitly, your function returns the value of the last evaluated expression. With the structure you've used, you need return(), like this:

select_virginica <- function (sep_l,virg){
  data_flowers <- iris_dt[which(Sepal.Length==sep_l)] 
  
  if (virg==TRUE){
    return(data_flowers[which (Species=="virginica")])
  }
  
  if (virg==FALSE){
    return(data_flowers[which (Species!="virginica")])
  }
}

Or you could rewrite it so that the last line is always the appropriate value to return:

select_virginica <- function (sep_l,virg){
  data_flowers <- iris_dt[which(Sepal.Length==sep_l)] 
  
  if (virg==TRUE){
    data_flowers <- data_flowers[which (Species=="virginica")]}
  
  if (virg==FALSE){
    data_flowers <- data_flowers[which (Species!="virginica")]}

  data_flowers
}
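A further variation, not taken from either answer above: since the result only depends on whether Species == "virginica" agrees with virg, both branches can be folded into one filter expression, which is then trivially the last (and only) expression in the body. This sketch assumes the data.table package is installed and that virg is a single TRUE or FALSE; the stopifnot() guard is an added assumption, not part of the original code.

```r
library(data.table)

iris_dt <- as.data.table(iris)

select_virginica <- function(sep_l, virg) {
  # Reject NA, non-logical, or length != 1 inputs up front
  stopifnot(isTRUE(virg) || isFALSE(virg))
  # (Species == "virginica") == virg keeps virginica rows when virg is TRUE
  # and every other species when virg is FALSE. This single expression is
  # the last one evaluated, so its value is what the function returns.
  iris_dt[Sepal.Length == sep_l & (Species == "virginica") == virg]
}

select_virginica(5.8, virg = TRUE)
select_virginica(5.8, virg = FALSE)[Petal.Length == 1.2]
```

Because the function returns a data.table in both cases, further [ ] selections can be chained onto the call as in the question.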

As a side note, ==TRUE and ==FALSE aren't the best ways to check whether a value is TRUE or FALSE. Stylistically, I'd prefer if(virg) and if(!virg), but if you want to be more explicit, use isTRUE() and isFALSE(); they are more robust to NA values, inputs whose length isn't one, and non-logical classes.
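A small base-R illustration of the difference (the variable `flag` is a stand-in for virg). Comparing NA with == propagates the NA, which if() rejects with an error, while isTRUE() and isFALSE() always return a single TRUE or FALSE. Note that isFALSE() needs R 3.5.0 or later.

```r
flag <- NA

# NA == TRUE is NA, and if() cannot branch on NA, so this raises an error
res <- tryCatch(if (flag == TRUE) "yes" else "no",
                error = function(e) conditionMessage(e))
print(res)

# isTRUE()/isFALSE() only accept a length-one, non-NA logical
isTRUE(flag)           # FALSE
isFALSE(flag)          # FALSE
isTRUE(c(TRUE, TRUE))  # FALSE: a vector of length two does not count
1 == TRUE              # TRUE: == coerces the number
isTRUE(1)              # FALSE: isTRUE() requires a logical
```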
