Home > Enterprise >  How to Filter Data Table Rows with condition on column of Type list() in R
How to Filter Data Table Rows with condition on column of Type list() in R

Time:07-05

I have a data.table which looks something like this (showing just few columns out of many) -

Id Period Product
1000797366 2018-Q1 UG10000-WISD
1000797366 2018-Q1 NX11100, UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG12210
1000797366 2018-Q1 NX11100
1000797366 2018-Q1 NX11100

Here the column "Product" is of type list() as I've to keep it this way for some later usage.

But I am facing a problem while filtering the rows based on a condition on Product column.

What I want is to filter all rows where value of product can be any of this vector c("UG12210","UG10000-WISD") along-with other filer such as Period in c("2018-Q1").

So my output should look something like this -

Id Period Product
1000797366 2018-Q1 UG10000-WISD
1000797366 2018-Q1 NX11100, UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG12210

But somehow this is not happening, I tried following conditions but none worked.

data_test[Period %in% c("2018-Q1") & is.element("UG12210",Product),]

data_test[Period %in% c("2018-Q1") & Product %in% c("UG12210"),]

Any leads on how it can be achieved will be of great help. Thanks!

Below is the data using dput() for the datatable

structure(
  list(
    Id = c("1000797366", "1000797366", "1000797366", "1000797366", "1000797366", "1000797366", "1000797366"),
    Period = c("2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1"),
    Product = list("UG10000-WISD", c("NX11100", "UG10000-WISD", "UG12210"), c("UG10000-WISD", "UG12210"),
      c("UG10000-WISD", "UG12210"), "UG12210", "NX11100", "NX11100")
  ),
  row.names = c(NA,-7L),
  class = c("data.table", "data.frame"),
  .internal.selfref = < pointer:0x562f66275020 >
)

CodePudding user response:

You can use sapply function to check if any of the values in vals is in Product for each row:

vals = c("UG12210","UG10000-WISD")

dt[Period %chin% "2018-Q1" & sapply(Product, function(v) any(vals %chin% v))]

#            Id  Period                      Product
# 1: 1000797366 2018-Q1                 UG10000-WISD
# 2: 1000797366 2018-Q1 NX11100,UG10000-WISD,UG12210
# 3: 1000797366 2018-Q1         UG10000-WISD,UG12210
# 4: 1000797366 2018-Q1         UG10000-WISD,UG12210
# 5: 1000797366 2018-Q1                      UG12210

CodePudding user response:

We can loop over the list with lapply, check if there are any values from the vector ('v1') in each of the list elements and subset

library(data.table)
v1 <- c("UG12210","UG10000-WISD")
dt1[Period %chin% c("2018-Q1") & 
      unlist(lapply(Product, function(x) any(v1 %chin% x)))]

-output

          Id  Period                      Product
       <char>  <char>                       <list>
1: 1000797366 2018-Q1                 UG10000-WISD
2: 1000797366 2018-Q1 NX11100,UG10000-WISD,UG12210
3: 1000797366 2018-Q1         UG10000-WISD,UG12210
4: 1000797366 2018-Q1         UG10000-WISD,UG12210
5: 1000797366 2018-Q1                      UG12210
  • Related