How to use multiple keywords in grepl

Time:11-11

Here is a character vector:

a<-c("Recherche impliquant la personne humaine (RIPH) Médicaments 3",
 "Recherche impliquant la personne humaine (RIPH) Hors Produits de santé 3",
 "Recherche impliquant la personne humaine (RIPH) dispositif médical 1")

I want to identify all elements containing certain keywords.

First, I identify all elements containing the word "Recherche":

grepl("recherche",a,ignore.case = TRUE)

[1] TRUE TRUE TRUE

Now I want to identify only elements containing all these keywords at the same time:

c("recherche", "impliquant", "personne", "humaine", "3")

The result must be

[1] TRUE TRUE FALSE

I tried this:

grepl(c("Recherche,impliquant , personne, humaine, 3"),a)

but it didn't work; the output is:

[1] FALSE FALSE FALSE

CodePudding user response:

You can do it using multiple lookaheads (?=...), where each lookahead asserts that one keyword is present anywhere in the string; (?i) makes the matching case-insensitive. Lookaheads require perl = TRUE:

grep("(?i)(?=.*recherche.*)(?=.*impliquant.*)(?=.*personne.*)(?=.*humaine.*)(?=.*3.*).*", 
 a,
 value = TRUE,
 perl = TRUE) 
[1] "Recherche impliquant la personne humaine (RIPH) Médicaments 3"           
[2] "Recherche impliquant la personne humaine (RIPH) Hors Produits de santé 3"
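The same lookahead pattern can also be assembled from the keyword vector rather than typed by hand. This is only a sketch: the names `keywords`, `pattern` and `matched` are illustrative, and it assumes the keywords contain no regex metacharacters (they would need escaping otherwise).

```r
a <- c("Recherche impliquant la personne humaine (RIPH) Médicaments 3",
       "Recherche impliquant la personne humaine (RIPH) Hors Produits de santé 3",
       "Recherche impliquant la personne humaine (RIPH) dispositif médical 1")

# Hypothetical keyword vector, taken from the question
keywords <- c("recherche", "impliquant", "personne", "humaine", "3")

# Wrap each keyword in a lookahead and concatenate them,
# prefixed with (?i) for case-insensitive matching
pattern <- paste0("(?i)", paste0("(?=.*", keywords, ")", collapse = ""))

# Lookaheads need the PCRE engine, hence perl = TRUE
matched <- grep(pattern, a, value = TRUE, perl = TRUE)
matched
```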

This method also works with grepl; just omit `value = TRUE`:

grepl("(?i)(?=.*recherche.*)(?=.*impliquant.*)(?=.*personne.*)(?=.*humaine.*)(?=.*3.*).*", 
     a,
     perl = TRUE) 
[1]  TRUE  TRUE FALSE
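If a single regex is not required, a possible alternative is to run grepl once per keyword and combine the results with a logical AND. This is a sketch of a different technique, not the lookahead approach above; `keywords`, `hits` and `contains_all` are illustrative names.

```r
a <- c("Recherche impliquant la personne humaine (RIPH) Médicaments 3",
       "Recherche impliquant la personne humaine (RIPH) Hors Produits de santé 3",
       "Recherche impliquant la personne humaine (RIPH) dispositif médical 1")

keywords <- c("recherche", "impliquant", "personne", "humaine", "3")

# One logical vector per keyword: does each element of `a` contain it?
hits <- lapply(keywords, grepl, x = a, ignore.case = TRUE)

# Element-wise AND across all keywords
contains_all <- Reduce(`&`, hits)
contains_all
```

This avoids the need for perl = TRUE and keeps each keyword as its own pattern, at the cost of one pass over the vector per keyword.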