Using negative lookahead in ls(pattern = "") in R

Time:11-10

Suppose I have the following objects in the memory:

ab
ab_b
ab_pm
ab_pn
c1_ab_b

and I only want to keep ab_pm and ab_pn.

I tried to use negative lookahead in ls() to list ab, ab_b and c1_ab_b for removal:

rm(list = ls(pattern = "ab_?(?!p)"))

However, I got the error:

Error in grep(pattern, all.names, value = TRUE) :
  invalid regular expression 'ab_?(?!p)', reason 'Invalid regexp'

I tried my regex at regex101.com, and found it matched all five object names, which suggested my regex was not "invalid", although it did not do what I wanted. My questions are:

  1. Does ls() in R support negative lookahead? I know grep() needs perl = TRUE to support it, but do not see a similar argument in the ls() help documentation.
  2. How can I correctly select the three objects I want to remove?

CodePudding user response:

Your ab_?(?!p) PCRE regex does not behave as expected because of backtracking. It matches ab, then an optional _, and then tries the negative lookahead. When the lookahead finds p, backtracking occurs: the engine gives up the _ and runs the lookahead again right before it. Since _ is not p, a match is returned.

The correct PCRE regex would be ab(?!_?p), see the regex demo. After matching b, the regex engine tries the lookahead pattern only once, and if it finds an optional _ followed by a p at that position, the whole match fails.
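Both patterns can be checked outside ls() with perl = TRUE. The following is a minimal sketch, assuming the five names from the question are held in a character vector (nms, orig_hits and to_remove are illustrative names, not objects from the question):

```r
# Object names from the question, held in a plain character vector.
nms <- c("ab", "ab_b", "ab_pm", "ab_pn", "c1_ab_b")

# Original pattern: the optional "_" can be given back on backtracking,
# so the lookahead ends up being tested in front of "_" instead of "p".
orig_hits <- grepl("ab_?(?!p)", nms, perl = TRUE)

# Corrected pattern: the lookahead is tested once, right after "b".
to_remove <- grep("ab(?!_?p)", nms, perl = TRUE, value = TRUE)

print(orig_hits)
print(to_remove)
```

With these inputs the original pattern should flag all five names, while the corrected one should return only the three names meant for removal.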

ls has no perl argument, so it can only use the default TRE regex library, which does not support lookarounds.

You may use

ab([^_]p|_[^p]|.?$)

See the regex demo. Details:

  • ab - ab text
  • ([^_]p|_[^p]|.?$) - one of three alternatives:
    • [^_]p - any char but _ and then p
    • | - or
    • _[^p] - a _ and then any char but p
    • | - or
    • .?$ - any one optional char and then end of string.
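As a rough check, the lookaround-free pattern can be passed straight to ls(). This sketch assumes a scratch environment (env, a name introduced here for illustration) so that nothing in the global workspace is removed:

```r
# Scratch environment holding dummy objects with the names from the question.
env <- new.env()
for (nm in c("ab", "ab_b", "ab_pm", "ab_pn", "c1_ab_b")) {
  assign(nm, 1, envir = env)
}

# ls() hands the pattern to grep() without perl = TRUE,
# so the pattern must avoid lookarounds.
doomed <- ls(envir = env, pattern = "ab([^_]p|_[^p]|.?$)")
rm(list = doomed, envir = env)

# Names that survive the removal.
remaining <- ls(envir = env)
print(remaining)
```

In the global workspace the same call would be rm(list = ls(pattern = "ab([^_]p|_[^p]|.?$)")), without the envir arguments.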

CodePudding user response:

ls uses grep(pattern, all.names, value = TRUE), so it does not support perl extensions including lookahead. You can handle that externally, though, by wrapping ls in grep:

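# candidate names; ls() applies a lookaround-free prefix filter
vec <- ls(pattern = "^ab_")
# or, to reproduce without creating the objects:
# vec <- c("ab","ab_b","ab_pm","ab_pn","c1_ab_b")
grep("ab_(?=p)", vec, perl = TRUE, value = TRUE)
# [1] "ab_pm" "ab_pn"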

So perhaps a one-liner:

grep("ab_(?=p)", ls(pattern = "^ab_"), value = TRUE, perl = TRUE)

This does a double-grep (once inside ls, once outside); one can always just make it a little more direct with

grep("ab_(?=p)", ls(), value = TRUE, perl = TRUE)
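The calls above return the names to keep. To get the names to remove instead, one option is grep's invert = TRUE argument. This is a sketch under the assumption that only names containing "ab" are candidates for removal; env, candidates and doomed are illustrative names:

```r
# Scratch environment so the global workspace is left alone.
env <- new.env()
for (nm in c("ab", "ab_b", "ab_pm", "ab_pn", "c1_ab_b")) {
  assign(nm, 1, envir = env)
}

# Restrict the candidates to names containing "ab" (plain TRE pattern in ls()).
candidates <- ls(envir = env, pattern = "ab")

# invert = TRUE returns the names that do NOT match the positive lookahead.
doomed <- grep("ab_(?=p)", candidates, perl = TRUE, value = TRUE, invert = TRUE)
rm(list = doomed, envir = env)

left <- ls(envir = env)
print(left)
```

Filtering the candidates first keeps unrelated objects out of the removal list, which a bare inverted match over ls() would not do.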