Suppose I have the following objects in memory:
ab
ab_b
ab_pm
ab_pn
c1_ab_b
and I only want to keep ab_pm and ab_pn.
I tried to use negative lookahead in ls() to list ab, ab_b and c1_ab_b for removal:
rm(list = ls(pattern = "ab_?(?!p)"))
However, I got the error:
Error in grep(pattern, all.names, value = TRUE) :
invalid regular expression 'ab_?(?!p)', reason 'Invalid regexp'
I tried my regex at regex101.com, and found it matched all five object names, which suggested my regex was not "invalid", although it did not do what I wanted. My questions are:
- Does ls() in R support negative lookahead? I know grep() needs perl = TRUE to support it, but do not see a similar argument in the ls() help documentation.
- How to correctly select the three objects I wanted to remove?
CodePudding user response:
Your ab_?(?!p) PCRE regex does not match as expected because of backtracking. It matches ab, then it matches an optional _ and then tries the negative lookahead. When the lookahead finds p, backtracking occurs: the engine gives up the optional _ and the lookahead is tried again right before _. Since _ is not p, a match is returned.
The correct PCRE regex would be ab(?!_?p), see the regex demo. After matching b, the regex engine tries the lookahead pattern only once, and if it finds an optional _ followed by p at that position, the whole match fails.
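To see the difference between the two lookahead patterns, here is a small sketch using grepl() with perl = TRUE. The vector objs is only a stand-in for the object names from the question, not the actual workspace.

```r
# Stand-in for the object names in the question
objs <- c("ab", "ab_b", "ab_pm", "ab_pn", "c1_ab_b")

# Original pattern: backtracking lets the optional "_" be skipped,
# so every name matches
original <- grepl("ab_?(?!p)", objs, perl = TRUE)

# Corrected pattern: the optional "_" sits inside the lookahead
fixed <- grepl("ab(?!_?p)", objs, perl = TRUE)

objs[original]  # all five names
objs[fixed]     # "ab" "ab_b" "c1_ab_b"
```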
ls does not support perl=TRUE, so it only supports the default TRE regex library, which does not support lookarounds.
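As a rough check of this, the same lookahead pattern can be passed to grepl() with and without perl = TRUE. The tryCatch() wrapper and the variable names are only for illustration, and the exact error text may vary by R version and locale.

```r
# Default (TRE) engine: the lookahead syntax is rejected with an error,
# which is captured here as a message string
tre_result <- tryCatch(
  suppressWarnings(grepl("ab(?!_?p)", "ab_b")),
  error = function(e) conditionMessage(e)
)

# PCRE engine: the same pattern is accepted
pcre_result <- grepl("ab(?!_?p)", "ab_b", perl = TRUE)

tre_result   # an error message about an invalid regular expression
pcre_result  # TRUE
```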
You may use
ab([^_]p|_[^p]|.?$)
See the regex demo. Details:
- ab - ab text
- ([^_]p|_[^p]|.?$) - either of the three alternatives:
  - [^_]p - any char but _ and then p
  - | - or
  - _[^p] - a _ and then any char but p
  - | - or
  - .?$ - any one optional char and then end of string.
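A minimal sketch of using this pattern directly in ls() follows. The environment e is a hypothetical scratch environment created for the demo so that the global workspace is not touched; in an interactive session you would drop the envir arguments.

```r
# Hypothetical scratch environment holding the objects from the question
e <- new.env()
for (nm in c("ab", "ab_b", "ab_pm", "ab_pn", "c1_ab_b")) {
  assign(nm, 1, envir = e)
}

# The lookaround-free pattern works with the default TRE engine used by ls()
to_remove <- ls(envir = e, pattern = "ab([^_]p|_[^p]|.?$)")
rm(list = to_remove, envir = e)

remaining <- ls(envir = e)  # only ab_pm and ab_pn are left
```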
CodePudding user response:
ls uses grep(pattern, all.names, value = TRUE), so it does not support Perl extensions, including lookahead. You can handle that externally, though, by wrapping ls in grep:
vec <- ls(pattern = "^ab_")
# vec <- c("ab","ab_b","ab_pm","ab_pn","c1_ab_b")
grep("ab_(?=p)", vec, perl = TRUE, value = TRUE)
# [1] "ab_pm" "ab_pn"
So perhaps a one-liner:
grep("ab_(?=p)", ls(pattern = "^ab_"), value = TRUE, perl = TRUE)
This does a double-grep (once inside ls, once outside); one can always just make it a little more direct with
grep("ab_(?=p)", ls(), value = TRUE, perl = TRUE)
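The calls above return the names to keep. One possible way to turn that into the removal asked about in the question is to subtract the kept names from the full listing with setdiff(). This is a sketch under the assumption that everything else in the environment should go; e is a hypothetical scratch environment used so the demo does not touch the global workspace.

```r
# Hypothetical scratch environment holding the objects from the question
e <- new.env()
for (nm in c("ab", "ab_b", "ab_pm", "ab_pn", "c1_ab_b")) {
  assign(nm, 1, envir = e)
}

# Names to keep, found with a Perl-style lookahead outside of ls()
keep <- grep("ab_(?=p)", ls(envir = e), value = TRUE, perl = TRUE)

# Everything else is removed
to_drop <- setdiff(ls(envir = e), keep)
rm(list = to_drop, envir = e)

left <- ls(envir = e)  # only ab_pm and ab_pn are left
```

In a real workspace, setdiff() against a bare ls() would also remove unrelated objects, so narrowing the listing first (for example with ls(pattern = "ab")) is probably safer.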