Home > Software design >  Google sheets regexmatch for 2 consecutive characters
Google sheets regexmatch for 2 consecutive characters

Time:11-11

I have this simple match in google app script that I'm trying to port over to google sheets. It just matches 2 consecutive letters with a space infront and a character in back.

big customers coming back slower, moved to incumbents cc.

this will work in GAS:

var match_ar = sentence.match(/\s([a-z])\1\S/)

I'm trying to get regexmatch to do the same thing

I've tried regexmatch(B6,"\s(([:alpha:])\1)"), regexmatch(B6,"\s([:alpha:])\1")

I'm getting an error Error Function REGEXMATCH parameter 2 value "\s(([:alpha:])\1)" is not a valid regular expression.

CodePudding user response:

Your regex itself is more or less correct, although I would rewrite more simply as (\w)\1\S, unless there's some reason you need nested capture groups.

Your real problem is that Google uses the RE2 regex engine, which, unlike PCRE, doesn't support backreferences at all AFAICT, so it's tripping up over the \1 token at the end of the expression - remove it and =REGEXMATCH(B6, " (\w)\S") evalutates just fine. Unfortunately, that makes it impossible as far as I know to match double letters without explicitly writing out every possible combination in the regex, which is a lot uglier:

 (?:aa|bb|cc|dd|ee|ff|gg|hh|ii|jj|kk|ll|mm|nn|oo|pp|qq|rr|ss|tt|uu|vv|ww|xx|yy|zz)\S

Or, I mean, if it's only the string " cc." you're trying to match, and not all double-lettered combinations patterned like that:

 cc\.

CodePudding user response:

Given that you are unlikely to have a word like "aardvark" or "llama" or "oogenisis" in your string, the character after a space-double-letter combo seems moot. In other words, finding space-double-letter should find only the matches you want without going further.

That said, try this:

=ArrayFormula(REGEXMATCH(B6,JOIN("|"," "&REPT(CHAR(96 SEQUENCE(26)),2))))

It is similar to what contributor "Arcaeca" offered, except that it builds out the long list of double-letter combos formulaically.

However, as "Arcaeca" noted, if you only ever need to find exactly cc, it's simple:

=REGEXMATCH(B6," cc")

(Again, any trailing period or other character is moot.)

  • Related