To illustrate with an example:
I have a few case-sensitive keywords:
kw <- c("American Express", "Inc said")
I have quite a few articles:
library(tm) # the acq corpus ships with the tm package
data("acq")
dv <- sapply(seq_along(acq), function(x) acq[[x]]$content) # flatten the corpus into a plain character vector of article texts
I want the following table as output:
temp <- sapply(seq_along(kw), function(x) stringr::str_detect(dv, kw[x]))
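For concreteness, temp is a logical matrix with one row per article and one column per keyword. A minimal check (assuming the code above has run; the column labels are just for readability):
colnames(temp) <- kw # label the columns with the keywords
head(temp)           # TRUE where an article contains the keyword
colSums(temp)        # number of articles mentioning each keyword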
The problem is that I have millions of records, and this approach is not efficient enough.
CodePudding user response:
What about parallelizing? This is an example based on your code:
library(parallel)
n_cores <- 2 # number of cores for parallel processing
cl <- makeCluster(n_cores)
clusterExport(cl, c("dv", "kw")) # workers start as fresh R sessions, so export the data and the keywords
temp <- parSapply(cl, seq_along(kw), FUN = function(i) stringr::str_detect(dv, kw[i]))
stopCluster(cl)
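One caveat: with only two keywords, parallelizing over kw caps the speedup at length(kw) workers. A variant sketch that may scale better for millions of records (untested here; it assumes dv and kw exist as above) splits the documents across workers instead, and uses stringr::fixed() so each keyword is matched as a literal, case-sensitive string rather than a regular expression, which is typically faster:
library(parallel)
n_cores <- 2
cl <- makeCluster(n_cores)
chunks <- split(dv, cut(seq_along(dv), n_cores, labels = FALSE)) # contiguous chunks, one per worker
res <- parLapply(cl, chunks, function(chunk, kw) {
  sapply(kw, function(k) stringr::str_detect(chunk, stringr::fixed(k)))
}, kw = kw)
stopCluster(cl)
temp <- do.call(rbind, res) # reassemble: one row per document, one column per keyword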