I have a list of email addresses ("All emails"), I have a second list of emails that I've contacted ("Contacted") and I want to create a third list of the ones who exist on the first list but not on the second ("To contact"). I use REGEXMATCH to bring the list of remaining email and it works great:
=ARRAYFORMULA(FILTER(A2:A,NOT(REGEXMATCH(A2:A,TEXTJOIN("|",TRUE,B2:B)))))
The problem is that emails that contain the sign aren't recognized as used and are added to the "To contact" list although they appear in the "Contacted" list.
For example, [email protected] will not be brought to the third list (as expected), but john [email protected] will be added to the list although he was actually contacted.
Here's an example, Col A is "All emails" Col B is "Contacted" Col C is "To contact" and I marked the problematic emails yellow: https://docs.google.com/spreadsheets/d/1Fesr5tjjMhruP1otVHfQXvwYtmbz2nhT4THjZulHNfw/edit#gid=0
CodePudding user response:
You need to escape special regex metacharacters:
=ARRAYFORMULA(FILTER(A2:A,NOT(REGEXMATCH(A2:A,TEXTJOIN("|",TRUE,REGEXREPLACE(B2:B, "([[ $^*()\\{?.|])", "\\$1"))))))
With REGEXREPLACE(B2:B, "([[ $^*()\\{?.|])", "\\$1")
, you match and capture [
,
, $
, ^
, *
, (
, )
, \
, {
, ?
, .
and |
with a backslash and they are matched as literal chars.
To find out what chars are special in RE2 regex flavor, see the RE2 regex library documentation.
Technincally, ([[ $^*()\\{?.|])
is a regular expression that matches and captures into Group 1 a single char from the specified set as (...)
defines a capturing group. The REGEXREPLACE
command I used above replaces every found match with \\$1
, i.e. a literal backslash and the value of Group 1. Namely, this means we prepend each found match with a single literal backslash. The backslash in the replacement pattern must be doubled as it is a special char even in the replacement pattern (it can escape the $
char if you need to use it as a literal char in the replacement pattern).
CodePudding user response:
You could also create a function to substitute the ' ' symbol for another not being used as a special character with RegExmatch .