Home > Software design >  Bringing a list of remaining items from list using REGEXMATCH - problem with sign
Bringing a list of remaining items from list using REGEXMATCH - problem with sign

Time:01-01

I have a list of email addresses ("All emails"), I have a second list of emails that I've contacted ("Contacted") and I want to create a third list of the ones who exist on the first list but not on the second ("To contact"). I use REGEXMATCH to bring the list of remaining email and it works great:

=ARRAYFORMULA(FILTER(A2:A,NOT(REGEXMATCH(A2:A,TEXTJOIN("|",TRUE,B2:B)))))

The problem is that emails that contain the sign aren't recognized as used and are added to the "To contact" list although they appear in the "Contacted" list.

For example, [email protected] will not be brought to the third list (as expected), but john [email protected] will be added to the list although he was actually contacted.

Here's an example, Col A is "All emails" Col B is "Contacted" Col C is "To contact" and I marked the problematic emails yellow: https://docs.google.com/spreadsheets/d/1Fesr5tjjMhruP1otVHfQXvwYtmbz2nhT4THjZulHNfw/edit#gid=0

CodePudding user response:

You need to escape special regex metacharacters:

=ARRAYFORMULA(FILTER(A2:A,NOT(REGEXMATCH(A2:A,TEXTJOIN("|",TRUE,REGEXREPLACE(B2:B, "([[ $^*()\\{?.|])", "\\$1"))))))

With REGEXREPLACE(B2:B, "([[ $^*()\\{?.|])", "\\$1"), you match and capture [, , $, ^, *, (, ), \, {, ?, . and | with a backslash and they are matched as literal chars.

To find out what chars are special in RE2 regex flavor, see the RE2 regex library documentation.

Technincally, ([[ $^*()\\{?.|]) is a regular expression that matches and captures into Group 1 a single char from the specified set as (...) defines a capturing group. The REGEXREPLACE command I used above replaces every found match with \\$1, i.e. a literal backslash and the value of Group 1. Namely, this means we prepend each found match with a single literal backslash. The backslash in the replacement pattern must be doubled as it is a special char even in the replacement pattern (it can escape the $ char if you need to use it as a literal char in the replacement pattern).

CodePudding user response:

You could also create a function to substitute the ' ' symbol for another not being used as a special character with RegExmatch .

  • Related