Conditional filtering with data.table with multiple statements

Time:12-03

I would like to know if there is an elegant and concise way to do conditional filtering with data.table.

My aim is the following: if condition 1 is met, filter based on condition 2.

For instance, in the case of the iris dataset, how can I drop the observations among Species=="setosa" where Sepal.Length<5.5, while keeping all observations with Sepal.Length<5.5 for other species?

I know how to do this in steps, but I wonder whether there is a better way to do it in a one-liner.

# This is how I would do it in steps.

library(data.table)
library(dplyr)  # provides full_join()

data("iris")

# first, select only the setosa observations I am interested in keeping
iris1 <- setDT(iris)[Sepal.Length >= 5.5 & Species == "setosa"]

# second, select all observations that are not setosa
iris2 <- setDT(iris)[Species != "setosa"]

# join the two subsets
iris_final <- full_join(iris1, iris2)

head(iris_final)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1:          5.8         4.0          1.2         0.2     setosa
2:          5.7         4.4          1.5         0.4     setosa
3:          5.7         3.8          1.7         0.3     setosa
4:          5.5         4.2          1.4         0.2     setosa
5:          5.5         3.5          1.3         0.2     setosa # only keeping setosa with Sepal.Length>=5.5. Note that for other species, Sepal.Length can be <5.5
6:          7.0         3.2          4.7         1.4 versicolor

Is there a more concise and elegant way of doing this?

CodePudding user response:

Is something like the following what you are looking for? It is not very clear what you want.

library(data.table)

dt <- data.table(iris)
# & binds tighter than |, so the parentheses only make the grouping explicit
dt[(Sepal.Length >= 5.5 & Species == "setosa") | Species != "setosa"]

#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#>   1:          5.8         4.0          1.2         0.2    setosa
#>   2:          5.7         4.4          1.5         0.4    setosa
#>   3:          5.7         3.8          1.7         0.3    setosa
#>   4:          5.5         4.2          1.4         0.2    setosa
#>   5:          5.5         3.5          1.3         0.2    setosa
#>  ---                                                            
#> 101:          6.7         3.0          5.2         2.3 virginica
#> 102:          6.3         2.5          5.0         1.9 virginica
#> 103:          6.5         3.0          5.2         2.0 virginica
#> 104:          6.2         3.4          5.4         2.3 virginica
#> 105:          5.9         3.0          5.1         1.8 virginica
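
The rule "if condition 1 is met, filter on condition 2" is a logical implication, which can be written as `!cond1 | cond2`. A minimal sketch of that shorter form, assuming the same built-in `iris` data; the name `dt_impl` is only illustrative and does not appear in the question:

```r
library(data.table)

dt <- as.data.table(iris)

# "if the species is setosa, then require Sepal.Length >= 5.5"
# written as !cond1 | cond2
dt_impl <- dt[Species != "setosa" | Sepal.Length >= 5.5]

nrow(dt_impl)
#> [1] 105
```

Rows of the other species pass through the first clause untouched, so only the setosa rows are ever tested against the length threshold.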

CodePudding user response:

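You can negate the combined condition instead of building the result from two pieces.

This removes the rows where Species=="setosa" & Sepal.Length<5.5 and keeps all others: setosa rows with Sepal.Length>=5.5 and every row of the other species. An additional | Sepal.Length>5.5 clause is not needed, because those rows already pass the negated condition.

iris1 <- as.data.table(iris)  # the full data set, not the setosa-only subset from the question
iris1[!(Species=="setosa" & Sepal.Length<5.5)]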
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.8         4.0          1.2         0.2    setosa
  2:          5.7         4.4          1.5         0.4    setosa
  3:          5.7         3.8          1.7         0.3    setosa
  4:          5.5         4.2          1.4         0.2    setosa
  5:          5.5         3.5          1.3         0.2    setosa
 ---                                                            
101:          6.7         3.0          5.2         2.3 virginica
102:          6.3         2.5          5.0         1.9 virginica
103:          6.5         3.0          5.2         2.0 virginica
104:          6.2         3.4          5.4         2.3 virginica
105:          5.9         3.0          5.1         1.8 virginica
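
One way to check that a single-expression filter returns the same rows as the step-wise approach from the question is to build both and compare them. This is a sketch under assumptions: `rbind()` stands in for the question's `full_join()`, and the names `stepwise` and `one_liner` are made up for illustration.

```r
library(data.table)

dt <- as.data.table(iris)

# step-wise version: keep the long setosa rows, then append all other species
stepwise <- rbind(
  dt[Species == "setosa" & Sepal.Length >= 5.5],
  dt[Species != "setosa"]
)

# single-expression version: drop only the short setosa rows
one_liner <- dt[!(Species == "setosa" & Sepal.Length < 5.5)]

# fsetequal() compares the two tables as sets of rows, ignoring row order
same_rows <- fsetequal(stepwise, one_liner)
same_rows
```

If `same_rows` comes back `TRUE`, the negated filter can replace the three-step version without changing the result.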