I would like to capture all mentions of "pensions" (case-insensitive, including pensions and pensioners, but excluding unrelated words like "suspension"). However, I would like to exclude pensions when it is preceded by "Department of Work and ", and I can't manage to express the whole condition in one pattern. So far I have:
sentences <- c("department of work and pensions", "and pensioners", "pensioners", "Pensions", "suspension")
try <- grepl("(?<!department of work and )^pension*", ignore.case = T, perl = T, sentences)
try
Any advice?
CodePudding user response:
grep('(?<!department of work and )\\bpension', sentences,
value = TRUE, ignore.case = TRUE, perl = TRUE)
[1] "and pensioners" "pensioners" "Pensions"
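As a rough sketch of why the attempt in the question misses "and pensioners": the ^ anchors the match to the start of the string, while \\b only requires that pension start a word. The variable names anchored and bounded are made up for this illustration.

```r
sentences <- c("department of work and pensions", "and pensioners",
               "pensioners", "Pensions", "suspension")

# Attempt from the question: ^ only allows a match at the start of the string
anchored <- grepl("(?<!department of work and )^pension*", sentences,
                  ignore.case = TRUE, perl = TRUE)

# Word boundary instead of ^: pension may start any word in the string
bounded <- grepl("(?<!department of work and )\\bpension", sentences,
                 ignore.case = TRUE, perl = TRUE)

print(anchored)  # FALSE FALSE  TRUE  TRUE FALSE
print(bounded)   # FALSE  TRUE  TRUE  TRUE FALSE
```

Note that a lookbehind in PCRE has to be fixed-width, so this version expects exactly one space between the words of the excluded phrase.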
CodePudding user response:
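You can use a single pattern that skips the whole "department of work and ..." phrase and otherwise matches pension only at a word boundary (replace the literal spaces with \\s+ if any amount of whitespace between the words should be allowed):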
sentences <- c("department of work and pensions", "and pensioners", "pensioners", "Pensions", "suspension")
grepl("\\bdepartment of work and \\w+(*SKIP)(*F)|\\bpension", ignore.case = TRUE, perl = TRUE, sentences)
## => [1] FALSE TRUE TRUE TRUE FALSE
See the R demo and the regex demo.
Details:
- \bdepartment of work and \w+ - a word boundary (\b), the text "department of work and", a space, and one or more word chars
- (*SKIP)(*F) - omit all text matched so far and start the next match search from the failure position
- | - or
- \bpension - a word boundary (\b) and a "pension" substring.
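The same skip-and-fail idea can also be used to pull out the matching words themselves, which makes it easier to see what is skipped. This is only a sketch: the mixed test strings and the trailing \\w* (added so the whole word is returned) are assumptions for illustration, not part of the pattern above.

```r
# Hypothetical test strings mixing the excluded phrase with other mentions
mixed <- c("The Department of Work and Pensions wrote to pensioners",
           "Department of Work and Pensions",
           "suspension of pension payments")

# Left branch consumes the excluded phrase and discards it;
# right branch matches any other word starting with "pension"
pattern <- "\\bdepartment of work and \\w+(*SKIP)(*F)|\\bpension\\w*"

hits <- regmatches(mixed,
                   gregexpr(pattern, mixed, ignore.case = TRUE, perl = TRUE))

print(hits[[1]])  # "pensioners"
print(hits[[2]])  # character(0)
print(hits[[3]])  # "pension"
```

lengths(hits) > 0 then gives the same kind of logical vector as grepl.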
CodePudding user response:
We may use
grepl("\\bpension\\S*", sentences, ignore.case = TRUE) &
  !grepl("department of work .*\\bpension\\S*", sentences, ignore.case = TRUE)
## => [1] FALSE TRUE TRUE TRUE FALSE
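One design point worth checking against your data: the two-call filter rejects a string as soon as the department phrase appears anywhere in it, even if pension is also mentioned separately. The sketch below compares it with a skip-and-fail pattern on a made-up sentence; the names mixed, two_step and skip_fail are illustrative, and the filter is written here with \\S*.

```r
# Hypothetical sentence containing both the excluded phrase and another mention
mixed <- "The Department of Work and Pensions wrote to pensioners"

# Two-call filter: any occurrence of the phrase vetoes the whole string
two_step <- grepl("\\bpension\\S*", mixed, ignore.case = TRUE) &
  !grepl("department of work .*\\bpension\\S*", mixed, ignore.case = TRUE)

# Skip-and-fail: only the phrase itself is ignored, other mentions still count
skip_fail <- grepl("\\bdepartment of work and \\w+(*SKIP)(*F)|\\bpension",
                   mixed, ignore.case = TRUE, perl = TRUE)

print(two_step)   # FALSE
print(skip_fail)  # TRUE
```

Which behaviour is preferable depends on whether such mixed sentences should count as a mention.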