Home > other >  Trying to convert plural words to singular words using regex but want to ignore a few words
Trying to convert plural words to singular words using regex but want to ignore a few words

Time:01-11

I am currently trying to replace some of the plural words like removing "s" from "birds" and replacing it as "bird" in bigquery

but I want them to ignore a few words like "less", "james", "this".

I was able to come up with this which ignores the "less" but still butchers james.

    SELECT REGEXP_REPLACE("James likes to chase birds","([^s])s\\b", "\\1" )

The output I am getting is "Jame like to chase bird" but what I am expecting is "James like to chase bird"

Update: I tried to use negative lookahead, but unfortunately, Bigquery Regex(RE2) doesn't support this.

CodePudding user response:

You can use an alternation here, a regex with two alternatives. In the first alternative, you can capture all the words that are exceptions, into Group 1, and in the second one, use your regex. The replacement will be both group values concatenated:

(?i)\b(less|james|this)\b|([^s])s\b

Replace with \1\2. See the regex demo. Details:

  • (?i) - a case insensitive modifier
  • \b(less|james|this)\b - Group 1: less, james, or this as a whole word
  • | - or
  • ([^s]) - Group 1: any char other than s (NOTE: if you want to only match a letter other than s, you can use [^\W\d_s])
  • s\b - s at the end of a word.

In your code, use

SELECT REGEXP_REPLACE("James likes to chase birds", r"(?i)\b(less|james|this)\b|([^s])s\b", r"\1\2" )
  •  Tags:  
  • Related