I am trying to flag which rows should and shouldn't be kept, based on whether the regex in the pattern column matches the value in the description column, in the data below.
data <- data.frame(id = c(1,2,2,3,3,4),
                   old_levels = c(0,1,1,1,1,2),
                   levels = c(1,2,3,2,3,4),
                   description = c("vegetable", "fruit", "fruit",
                                   "meat", "meat", "soda"),
                   pattern = c("vegetable",
                               "fruit",
                               "?!(vegetable|fruit)",
                               "fruit",
                               "?!(vegetable|fruit)",
                               NA))
Using dplyr, I figured that the example below should work:
data %>% rowwise() %>% mutate(matches = grepl(pattern, description))
However, this yields:
# A tibble: 6 x 6
# Rowwise:
     id old_levels levels description pattern             matches
  <dbl>      <dbl>  <dbl> <chr>       <chr>               <lgl>
1     1          0      1 vegetable   vegetable           TRUE
2     2          1      2 fruit       fruit               TRUE
3     2          1      3 fruit       ?!(vegetable|fruit) FALSE
4     3          1      2 meat        fruit               FALSE
5     3          1      3 meat        ?!(vegetable|fruit) FALSE
6     4          2      4 soda        NA                  NA
The NA is expected and works as intended. However, I'm struggling to get the negative lookahead to work: matches in row 5 should be TRUE.
Any help would be appreciated!
Answer:
The lookahead syntax is (?!...), not ?!(...). Besides, grepl with the default TRE regex library does not support lookarounds, so you need to pass perl=TRUE.
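To see why the lookahead also needs the ^ anchor and the leading .*, here is a minimal base R sketch. The vector x and the variable names are made up for illustration and are not part of the question's data.

```r
# Illustrative input (an assumption, not the question's data frame)
x <- c("vegetable", "fruit", "meat")

# Anchored at the start: the lookahead scans the whole string once,
# so the match fails whenever either word occurs anywhere in it.
anchored <- grepl("^(?!.*(?:vegetable|fruit))", x, perl = TRUE)

# Unanchored: the regex engine may try every position, including the
# very end of the string, where nothing follows and the negative
# lookahead trivially succeeds. Every string therefore "matches".
unanchored <- grepl("(?!.*(?:vegetable|fruit))", x, perl = TRUE)

print(anchored)    # FALSE FALSE TRUE
print(unanchored)  # TRUE TRUE TRUE
```

Without the anchor, the pattern would mark every row as a match, which is why the patterns in the data need ^ at the front.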
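You can use a lookahead anchored at the start of the string, which fails the match if either word occurs anywhere in the description: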
library(dplyr)  # provides %>%, rowwise() and mutate()

data <- data.frame(id = c(1,2,2,3,3,4),
                   old_levels = c(0,1,1,1,1,2),
                   levels = c(1,2,3,2,3,4),
                   description = c("vegetable", "fruit", "fruit",
                                   "meat", "meat", "soda"),
                   pattern = c("vegetable",
                               "fruit",
                               "^(?!.*(?:vegetable|fruit))",
                               "fruit",
                               "^(?!.*(?:vegetable|fruit))",
                               NA))
# perl=TRUE switches grepl to the PCRE engine, which supports lookarounds
data %>% rowwise() %>% mutate(matches = grepl(pattern, description, perl=TRUE))
Output:
> data %>% rowwise() %>% mutate(matches = grepl(pattern, description, perl=TRUE))
# A tibble: 6 x 6
# Rowwise:
     id old_levels levels description pattern                    matches
  <dbl>      <dbl>  <dbl> <chr>       <chr>                      <lgl>
1     1          0      1 vegetable   vegetable                  TRUE
2     2          1      2 fruit       fruit                      TRUE
3     2          1      3 fruit       ^(?!.*(?:vegetable|fruit)) FALSE
4     3          1      2 meat        fruit                      FALSE
5     3          1      3 meat        ^(?!.*(?:vegetable|fruit)) TRUE
6     4          2      4 soda        <NA>                       NA
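If you would rather not depend on rowwise(), the same pairwise matching can be sketched in base R with mapply. This is an alternative technique, not the approach used above, and the standalone vectors below simply mirror the two relevant columns.

```r
# Standalone copies of the two relevant columns (same values as above)
description <- c("vegetable", "fruit", "fruit", "meat", "meat", "soda")
pattern <- c("vegetable",
             "fruit",
             "^(?!.*(?:vegetable|fruit))",
             "fruit",
             "^(?!.*(?:vegetable|fruit))",
             NA)

# grepl is vectorised over the text but not over the pattern, so pair
# each pattern with its own description explicitly.
matches <- mapply(grepl, pattern, description,
                  MoreArgs = list(perl = TRUE),  # PCRE is needed for lookarounds
                  USE.NAMES = FALSE)             # drop names taken from `pattern`

print(matches)  # TRUE TRUE FALSE FALSE TRUE NA
```

The result can be assigned back with data$matches <- matches. The NA pattern still yields NA, as in the rowwise() version.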