library(data.table)
tab1 <- data.table(id1 = c(1, 2, 100, 100),
class = c(2, 2, 1, 5),
number = c(96, 100, 55, 55),
code1 = c("123", "125", "999", "999"))
tab2 <- data.table(id2 = c(100, 200, 200, 205, 205, 205),
class = c(2, 2, 1, 5, 2, 2),
max_number = c(100, 80, 80, 95, 95, 95),
min_number = c(0, 50, 50, 10, 10, 10),
code2 = c("123,999", "100,105", "100,105", "555,999", "234,999", "123,999,555"))
> tab1
id1 class number code1
1: 1 2 96 123
2: 2 2 100 125
3: 100 1 55 999
4: 100 5 55 999
> tab2
id2 class max_number min_number code2
1: 100 2 100 0 123,999
2: 200 2 80 50 100,105
3: 200 1 80 50 100,105
4: 205 5 95 10 555,999
5: 205 2 95 10 234,999
6: 205 2 95 10 123,999,555
I have two tables. For each id1
, I would like to match on class
, min_number <= number <= max_number
and see if code1
matches anything (partially) in code2
. That is, "123" would be a match for "123, 999, 555".
My attempt is below. The column V1
contains the id2
s that matched.
> tab2[tab1, on = c("class", "max_number >= number", "min_number <= number"), .(list(id2)), by = .EACHI]
class max_number min_number V1
1: 2 96 96 100
2: 2 100 100 100
3: 1 55 55 200
4: 5 55 55 205
Just matching on class
and min_number <= number <= max_number
works fine. But trying to do the partial string match with grepl
gives the following error:
> tab2[tab1, on = c("class", "max_number >= number", "min_number <= number", grepl("code2", "code1")), .(list(id2)), by = .EACHI]
Error in colnamesInt(x, names(on), check_dups = FALSE) :
argument specifying columns specify non existing column(s): cols[4]='FALSE'
The desired output is something like this (where V1
contains the id2
s that matches each row in tab1
):
id1 class max_number min_number V1
1: 1 2 96 96 100
2: 2 2 100 100 <NA>
3: 100 1 55 55 <NA>
4: 100 5 55 55 205,205
CodePudding user response:
You can't use grepl
in the join condition, but you could calculate it using mapply
:
tab2[tab1, on = c("class", "max_number >= number", "min_number <= number"), .(code1,code2,id1,id2,select = mapply(grepl,code1,code2)), by = .EACHI][
,.(list(fifelse(select,id2,NA))),by=.(id1,class,max_number,min_number)]
id1 class max_number min_number V1
<num> <num> <num> <num> <list>
1: 1 2 96 96 100
2: 2 2 100 100 NA
3: 100 1 55 55 NA
4: 100 5 55 55 205
CodePudding user response:
Probably I have to use strsplit
instead of grepl
here
tab2[,
.(code2 = unlist(strsplit(code2, ","))),
by = id2:min_number
][
tab1,
.(id1, class, min_number, max_number, id2),
on = c("class", "min_number <= number", "max_number >= number", "code2 == code1")
]
which gives
id1 class min_number max_number id2
1: 1 2 96 96 100
2: 2 2 100 100 NA
3: 100 1 55 55 NA
4: 100 5 55 55 205