I've created a function to retrieve rows from a data.table that meet a criterion. The function takes a TRUE or FALSE input to return either only the rows with a certain value, or all rows without that value.
In this reproducible example, I want to retrieve flowers with a given sepal length. Sometimes I only want virginica flowers, sometimes I want every other flower.
# Load data.table and create a data.table from iris
library(data.table)
iris_dt <- as.data.table(iris)
# Create function that lets you select sepal length and
# whether you want only virginica or anything but virginica
select_virginica <- function (sep_l,virg){
data_flowers <- iris_dt[which(Sepal.Length==sep_l)]
if (virg==TRUE){
data_flowers[which (Species=="virginica")]}
if (virg==FALSE){
data_flowers[which (Species!="virginica")]}
}
# Check all non-virginica with sepal length = 5.8
select_virginica(5.8,virg=F)
# This correctly returns information on 4 flowers
# Check all virginica with sepal length = 5.8
select_virginica(5.8,virg=T)
# This incorrectly returns nothing. How do I fix this?
When it returns correctly (because I inputted FALSE), I can further select data from the results easily.
# I can work with the results in an interesting way, and easily select data from it
select_virginica(5.8,virg=F)[which (Petal.Length == 1.2)]
select_virginica(5.8,virg=F)[,3]
Some diagnostics and fix attempts:
It's not a problem in the dataset, because there are suitable rows with all three species
# There are flowers of Sepal.Length==5.8 in all three species
subset(iris_dt,Sepal.Length==5.8)
The function uses which() after the dataset, but the issue is identical even if I use subset() instead.
If I add %>% print() after "virginica")] I can get it to print, but then I can't use the function in that interesting way.
Why is this happening? How can I get the function to return my desired rows, and to be able to select rows in that interesting way? Thanks.
CodePudding user response:
Instead of using two separate if statements, use a single if/else:
select_virginica <- function (sep_l,virg){
data_flowers <- iris_dt[which(Sepal.Length==sep_l)]
if (virg==TRUE) {
data_flowers[which (Species=="virginica")]
} else {
data_flowers[which (Species!="virginica")]
}
}
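If your function has no return() statement, R returns the value of the last expression evaluated in the function body. In your version, when virg is TRUE the last expression evaluated is if (virg==FALSE). Its condition is false and it has no else, so that if evaluates to NULL (invisibly), and this NULL is what the function returns; the result of the first if is discarded. With if/else, the last expression always evaluates one of the two branches, so the function returns one of the two cases.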
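A minimal base R sketch of that behavior, with made-up function names (two_ifs and if_else are hypothetical, not part of the question's code). It only illustrates that a trailing if with a false condition and no else makes the function return NULL:

```r
# Hypothetical helper names, for illustration only.
two_ifs <- function(flag) {
  if (flag == TRUE) {
    "yes"   # evaluated when flag is TRUE, but its value is then discarded
  }
  if (flag == FALSE) {
    "no"    # this is the last expression, so only it can be returned
  }
}

if_else <- function(flag) {
  if (flag == TRUE) {
    "yes"
  } else {
    "no"
  }
}

is.null(two_ifs(TRUE))  # TRUE: the last if had a false condition and no else
two_ifs(FALSE)          # "no"
if_else(TRUE)           # "yes"
```

Because the NULL is returned invisibly, nothing is printed at the console, which matches the "returns nothing" symptom in the question.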
CodePudding user response:
If you don't call return() explicitly, your function returns the value of the last expression it evaluates. The way you've written your function, you need to use return(), like this:
select_virginica <- function (sep_l,virg){
data_flowers <- iris_dt[which(Sepal.Length==sep_l)]
if (virg==TRUE){
return(data_flowers[which (Species=="virginica")])
}
if (virg==FALSE){
return(data_flowers[which (Species!="virginica")])
}
}
Or you could rewrite it so that the last line is always the appropriate value to return:
select_virginica <- function (sep_l,virg){
data_flowers <- iris_dt[which(Sepal.Length==sep_l)]
if (virg==TRUE){
data_flowers <- data_flowers[which (Species=="virginica")]}
if (virg==FALSE){
data_flowers <- data_flowers[which (Species!="virginica")]}
data_flowers
}
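As a quick check, here is a self-contained sketch of the "value on the last line" approach, followed by the kind of chained selection the question asks for. It assumes the data.table package is installed. It swaps the two separate ifs for a single if/else and drops which(), which is a stylistic choice and not something the approach requires:

```r
library(data.table)

iris_dt <- as.data.table(iris)

select_virginica <- function(sep_l, virg) {
  data_flowers <- iris_dt[Sepal.Length == sep_l]
  if (virg) {
    data_flowers <- data_flowers[Species == "virginica"]
  } else {
    data_flowers <- data_flowers[Species != "virginica"]
  }
  data_flowers  # last expression, so this is always the return value
}

# The virginica case now returns rows instead of NULL
virg_rows <- select_virginica(5.8, virg = TRUE)
nrow(virg_rows) > 0

# Further selection on the result works for both cases
select_virginica(5.8, virg = TRUE)[, 3]
select_virginica(5.8, virg = FALSE)[Petal.Length == 1.2]
```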
As a side note, ==TRUE and ==FALSE aren't the best ways to check whether a value is TRUE or FALSE. Stylistically, I'd prefer if(virg) and if(!virg), but if you want to be more explicit, use isTRUE() and isFALSE(); they are more robust to NA values, different classes, and so on.
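A small base R sketch of the difference, assuming the flag might arrive as NA or as a non-logical value (the variable names are made up for illustration; isFALSE() needs R 3.5.0 or later):

```r
flag <- NA

# NA == TRUE evaluates to NA, and if() stops with an error on an NA condition.
res <- tryCatch(
  if (flag == TRUE) "yes" else "no",
  error = function(e) "error"
)
res                    # "error"

# isTRUE() and isFALSE() always return a single TRUE or FALSE.
isTRUE(flag)           # FALSE
isFALSE(flag)          # FALSE
isTRUE("TRUE")         # FALSE: a character string is not a logical TRUE
isTRUE(c(TRUE, TRUE))  # FALSE: the value must have length 1
```

Note that for an NA flag neither isTRUE() nor isFALSE() holds, so a function built from two separate checks still needs to decide what to return in that case.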