library(data.table)
dat1 <- data.table(id1 = c(1, 1, 2),
pattern = c("apple", "applejack", "bananas, sweet"))
dat2 <- data.table(id2 = c(1174, 1231),
description = c("apple is sweet", "apple is a computer"),
memo = c("bananas, sweet yes", "bananas, sweetyes"))
> dat1
id1 pattern
1: 1 apple
2: 1 applejack
3: 2 bananas, sweet
> dat2
id2 description memo
1: 1174 apple is sweet bananas, sweet yes
2: 1231 apple is a computer bananas, sweetyes
I have two data.table
s, dat1
and dat2
. I want to search for each pattern
in dat2
against the description
and memo
columns in dat2
and store the corresponding id2
s.
The final output table should look something like this:
id1 pattern description_match memo_match
1: 1 apple 1174,1231 <NA>
2: 1 applejack <NA> <NA>
3: 2 bananas, sweet <NA> 1174
The regular expression I want to use is \\b[pattern]\\b
. Below is my attempt:
dat1[, description_match := dat2[grepl(paste0("\\b", dat1$pattern, "\\b"), dat2$description), .(id2 = paste(id2, collapse = ","))]]
dat1[, memo_match := dat2[grepl(paste0("\\b", dat1$pattern, "\\b"), dat2$memo), .(id2 = paste(id2, collapse = ","))]]
However, both give me the error that grepl
can only use the first pattern
.
CodePudding user response:
We group by row sequence, create the match columns from 'dat2', by paste
ing the 'id2' extracted from the logical output from grepl
library(data.table)
dat1[, c("description_match", "memo_match") := {
pat <- sprintf('\\b(%s)\\b', paste(pattern, collapse = "|"))
.(toString(dat2$id2[grepl(pat, dat2$description)]),
toString(dat2$id2[grepl(pat, dat2$memo)]))
}, seq_along(id1)]
dplyr::na_if(dat1, "")
id1 pattern description_match memo_match
<num> <char> <char> <char>
1: 1 apple 1174, 1231 <NA>
2: 1 applejack <NA> <NA>
3: 2 bananas, sweet <NA> 1174
According to ?sprintf
The string fmt contains normal characters, which are passed through to the output string, and also conversion specifications which operate on the arguments provided through .... The allowed conversion specifications start with a % and end with one of the letters in the set aAdifeEgGosxX%
s - Character string. Character NAs are converted to "NA".
Or use a for
loop
for(nm in names(dat2)[-1])
dat1[, paste0(nm, "_match") :=
toString(dat2$id2[grepl(paste0("\\b", pattern, "\\b"),
dat2[[nm]])]), seq_along(id1)][]