Home > Blockchain >  How can I subset or filter data using an arbitrary set of criteria?
How can I subset or filter data using an arbitrary set of criteria?

Time:09-25

I am writing a shiny app, but I can't wrap my head around this one and I am hoping you can help.

Say the user imports in some factors and some data:

data<-read.table(text="Var1 Var2 Var3 Var4 Data
             1 1 1 1 25
             1 1 1 1 15
             1 1 1 2 10
             1 1 1 2 11
             1 1 2 1 30
             1 1 2 1 32
             1 1 2 2 120
             1 1 2 2 123
             1 2 1 1 50
             1 2 1 1 52
             1 2 1 2 100
             1 2 1 2 101
             1 2 2 1 150
             1 2 2 1 152
             1 2 2 2 160
             1 2 2 2 162
             2 1 1 1 5
             2 1 1 1 4
             2 1 1 2 60
             2 1 1 2 62
             2 1 2 1 40
             2 1 2 1 42
             2 1 2 2 130
             2 1 2 2 132
             2 2 1 1 70
             2 2 1 1 72
             2 2 1 2 80
             2 2 1 2 82
             2 2 2 1 90
             2 2 2 1 92
             2 2 2 2 110
             2 2 2 2 111",
             header=T)

This data could be anything, any length or number of variables, some number of which might be factors.

I know how to subset out the data for analytics if I want, say, Var1=1 and Var2=2:

data[,5][which(data[,1]==1 & data[,2]==2)]

or filter

filter(data,data[,1]==1 & data[,2]==2)[,5]

Now say I want to change it to the data for Var1=1, Var2=2, and Var3=2. If I were doing this in script, I could just type another criterion in the filter or the subset. But how would I set it up in the app to be able to filter or subset on an arbitrary number of factors? (Ideally, with base or dplyr.) In the app I can pass in the column numbers for the factors and the data, I just don't know how to set it up to do that regardless of the number of columns the user has selected as factors.

Thanks!

edit....

Answer!

@akrun's excellent answer below, generalized for people like myself! :

factors=c(1,2,3,4)
levels_selected=c(1,1,2,1)


data[rowSums(levels_selected[col(data[factors])]==data[factors])==length(levels_selected),]

CodePudding user response:

If we have multiple columns to filter with same value, use if_all

library(dplyr)
data %>%
    filter(Var1 == 1 & if_all(c(Var2, Var3), `==`, 2))

-output

   Var1 Var2 Var3 Var4 Data
1    1    2    2    1  150
2    1    2    2    1  152
3    1    2    2    2  160
4    1    2    2    2  162

Also, if these are values specific to certain columns, subset the columns, create a logical vector with rowSums after replicating the input values

data[rowSums(c(1, 2, 2)[col(data[1:3])] == data[1:3]) == 3,]
   Var1 Var2 Var3 Var4 Data
13    1    2    2    1  150
14    1    2    2    1  152
15    1    2    2    2  160
16    1    2    2    2  162
  •  Tags:  
  • r
  • Related