I have a table with descriptions of symptoms, like the one below:
DT <- data.table(no = c(1, 2, 3),
symptom = c("headache and numbness", "tachycardia, sometimes headahce", "breath difficulty with limb numbness"))
The keywords I'm focusing on look like this:
key.word <- list(
head = c("head", "headache"),
chest = c("breath", "tachycardia", "palpitaion")
)
I want to add two columns that indicate whether any keyword from each group is mentioned in the variable symptom. The desired result looks like this:
result <- data.table(no = c(1, 2, 3),
symptom = c("headache and numbness", "tachycardia, sometimes headahce", "breath difficulty with limb numbness"),
head = c(T, T, F),
chest = c(F, T, T))
I can do this with:
DT[, head := symptom %like% paste0(key.word$head, collapse = "|")]
DT[, chest := symptom %like% paste0(key.word$chest, collapse = "|")]
But I'm wondering if there is a way to do this with lapply and data.table, which seems more elegant. Thanks in advance!
CodePudding user response:
Here is a data.table option:
DT[ , lapply(
key.word, function(x) any(sapply(x, function(w) grepl(w, symptom)))),
by = list(no, symptom)]
# no symptom head chest
#1: 1 headache and numbness TRUE FALSE
#2: 2 tachycardia, sometimes headahce TRUE TRUE
#3: 3 breath difficulty with limb numbness FALSE TRUE
The internal sapply loop is necessary as pattern in grepl is not vectorised.
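If each keyword group is collapsed into a single alternation pattern first, the inner loop and the grouping can probably be dropped, because grepl is vectorised over the text it searches. A minimal sketch, assuming the keywords contain no regex metacharacters; the name patterns is introduced here for illustration only:

```r
library(data.table)

DT <- data.table(
  no = c(1, 2, 3),
  symptom = c("headache and numbness",
              "tachycardia, sometimes headahce",
              "breath difficulty with limb numbness")
)
key.word <- list(
  head  = c("head", "headache"),
  chest = c("breath", "tachycardia", "palpitaion")
)

# One regex per keyword group, e.g. "head|headache"
# (assumes the keywords are plain text, not regex syntax)
patterns <- lapply(key.word, paste, collapse = "|")

# grepl() checks every row of `symptom` against one pattern at a time;
# `:=` then adds one logical column per list element, by reference.
DT[, (names(key.word)) := lapply(patterns, function(p) grepl(p, symptom))]
```

This modifies DT in place and gives FALSE rather than NA for rows without a match. Adding another keyword group only requires a new element in key.word.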
CodePudding user response:
Not sure if you are looking only for a data.table or lapply option, but here is one using dplyr and stringr:
library(dplyr)
library(stringr)
DT %>% mutate(head = str_detect(symptom, str_c(key.word[[1]], collapse = '|')),
chest = str_detect(symptom, str_c(key.word[[2]], collapse = '|')))
no symptom head chest
1: 1 headache and numbness TRUE FALSE
2: 2 tachycardia, sometimes headahce TRUE TRUE
3: 3 breath difficulty with limb numbness FALSE TRUE
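The same idea can be generalised so that each keyword group does not need its own line. This is only a sketch: it assumes dplyr 1.0 or later, where mutate() splices the columns of an unnamed data frame into the result, and it uses a plain data.frame in place of the data.table to stay self-contained. The name result is illustrative.

```r
library(dplyr)
library(stringr)

DT <- data.frame(
  no = c(1, 2, 3),
  symptom = c("headache and numbness",
              "tachycardia, sometimes headahce",
              "breath difficulty with limb numbness")
)
key.word <- list(
  head  = c("head", "headache"),
  chest = c("breath", "tachycardia", "palpitaion")
)

# lapply() builds one logical vector per keyword group; wrapping the
# named list in as.data.frame() lets mutate() add them all as columns.
result <- DT %>%
  mutate(as.data.frame(lapply(
    key.word,
    function(k) str_detect(symptom, str_c(k, collapse = "|"))
  )))
```

The new column names come from names(key.word), so the list must be named.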