I have a data.table
which looks something like this (showing just few columns out of many) -
Id | Period | Product |
---|---|---|
1000797366 | 2018-Q1 | UG10000-WISD |
1000797366 | 2018-Q1 | NX11100, UG10000-WISD, UG12210 |
1000797366 | 2018-Q1 | UG10000-WISD, UG12210 |
1000797366 | 2018-Q1 | UG10000-WISD, UG12210 |
1000797366 | 2018-Q1 | UG12210 |
1000797366 | 2018-Q1 | NX11100 |
1000797366 | 2018-Q1 | NX11100 |
Here the column "Product" is of type list()
as I've to keep it this way for some later usage.
But I am facing a problem while filtering the rows based on a condition on Product column.
What I want is to filter all rows where value of product can be any of this vector c("UG12210","UG10000-WISD") along-with other filer such as Period in c("2018-Q1").
So my output should look something like this -
Id | Period | Product |
---|---|---|
1000797366 | 2018-Q1 | UG10000-WISD |
1000797366 | 2018-Q1 | NX11100, UG10000-WISD, UG12210 |
1000797366 | 2018-Q1 | UG10000-WISD, UG12210 |
1000797366 | 2018-Q1 | UG10000-WISD, UG12210 |
1000797366 | 2018-Q1 | UG12210 |
But somehow this is not happening, I tried following conditions but none worked.
data_test[Period %in% c("2018-Q1") & is.element("UG12210",Product),]
data_test[Period %in% c("2018-Q1") & Product %in% c("UG12210"),]
Any leads on how it can be achieved will be of great help. Thanks!
Below is the data using dput() for the datatable
structure(
list(
Id = c("1000797366", "1000797366", "1000797366", "1000797366", "1000797366", "1000797366", "1000797366"),
Period = c("2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1"),
Product = list("UG10000-WISD", c("NX11100", "UG10000-WISD", "UG12210"), c("UG10000-WISD", "UG12210"),
c("UG10000-WISD", "UG12210"), "UG12210", "NX11100", "NX11100")
),
row.names = c(NA,-7L),
class = c("data.table", "data.frame"),
.internal.selfref = < pointer:0x562f66275020 >
)
CodePudding user response:
You can use sapply
function to check if any of the values in vals
is in Product
for each row:
vals = c("UG12210","UG10000-WISD")
dt[Period %chin% "2018-Q1" & sapply(Product, function(v) any(vals %chin% v))]
# Id Period Product
# 1: 1000797366 2018-Q1 UG10000-WISD
# 2: 1000797366 2018-Q1 NX11100,UG10000-WISD,UG12210
# 3: 1000797366 2018-Q1 UG10000-WISD,UG12210
# 4: 1000797366 2018-Q1 UG10000-WISD,UG12210
# 5: 1000797366 2018-Q1 UG12210
CodePudding user response:
We can loop over the list
with lapply
, check if there are any
values from the vector
('v1') in each of the list elements and subset
library(data.table)
v1 <- c("UG12210","UG10000-WISD")
dt1[Period %chin% c("2018-Q1") &
unlist(lapply(Product, function(x) any(v1 %chin% x)))]
-output
Id Period Product
<char> <char> <list>
1: 1000797366 2018-Q1 UG10000-WISD
2: 1000797366 2018-Q1 NX11100,UG10000-WISD,UG12210
3: 1000797366 2018-Q1 UG10000-WISD,UG12210
4: 1000797366 2018-Q1 UG10000-WISD,UG12210
5: 1000797366 2018-Q1 UG12210