I have to apply a long regex pattern to a long string. The regex pattern is something like this:
set.seed(1234)
myFun <- function(n = 5000) {
a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
long_regex <- paste0(myFun(1000), collapse = "|")
long_regex <- paste0("(", long_regex, ")")
However, gsub can't deal with such long patterns:
text <- "HPPIZ9166O BHVOF0473O LCVDO3833Z"
gsub(long_regex, "marker \\1;", text)
Error in gsub(long_regex, "marker \\1;", text) :
assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634
How do I overcome this issue? Thank you.
CodePudding user response:
If your patterns are valid as Perl-compatible regexes, the PCRE engine (perl=TRUE) seems to cope where the default TRE engine fails:
> gsub(long_regex, "marker \\1;", text)
Error in gsub(long_regex, "marker \\1;", text) :
assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634
but...
> gsub(long_regex, "marker \\1;", text, perl=TRUE)
[1] "HPPIZ9166O BHVOF0473O LCVDO3833Z"
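The text comes back unchanged there because none of the alternatives in my randomly generated regex occur in it. If I pick one of the strings out of the regex, you can see that gsub does work: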
> substr(long_regex,10000,10100)
[1] "|PZIFO9919X|VBICZ3063E|HZTGZ8881V|PUURO8525W|QLYMN6531U|KTUQZ7171V|GULUD6556Z|UMHSA7400F|DAYHH0017F|Q"
> text = "HZTGZ8881V nope "
> gsub(long_regex, "marker \\1;", text, perl=TRUE)
[1] "marker HZTGZ8881V; nope "
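For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. It assumes the generator from the question with `set.seed()` and a `"%04d"` format, and it builds the test string from a code that is known to be in the pattern, so the result does not depend on which random strings you happen to get. The names `make_codes`, `codes`, and `expected` are only illustrative and not from the original post.

```r
set.seed(1234)

# Illustrative helper mirroring the question's generator:
# 5 random letters + a zero-padded 4-digit number + 1 random letter.
make_codes <- function(n) {
  letters5 <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(letters5, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

codes <- make_codes(1000)

# One big alternation wrapped in a capture group, as in the question.
long_regex <- paste0("(", paste0(codes, collapse = "|"), ")")

# Use a code that is guaranteed to be one of the alternatives.
text <- paste(codes[500], "nope")

# perl = TRUE hands the pattern to PCRE instead of the default TRE engine.
result <- gsub(long_regex, "marker \\1;", text, perl = TRUE)

# What the replacement should look like for the chosen code.
expected <- paste0("marker ", codes[500], "; nope")
print(identical(result, expected))
```

Note that perl=TRUE changes how the pattern is interpreted, so this only applies if the pattern means the same thing as a Perl-compatible regex. For plain literal alternatives like these, it does.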