Any substitute for negative lookahead in regular expressions?

Time:03-09

I'm using a regular expression to extract country data in BigQuery, but I don't know how to get the text I want. These are the example records I'm working with.

country
China Anhui Univ Chinese Med, Affiliated Hosp 1, Expt Ctr Clin Res, Sci Res Dept, 117 Meishan Rd, Hefei 230031, Anhui, 12, Peoples R China
Meluna Res, Geldermalsen, Netherlands; [Wiegant, Frederik Anton Clemens] Univ Utrecht, Utrecht, Netherlands

I want to extract the text after the last comma in each record (Peoples R China and Netherlands in the examples above), so I tried using a negative lookahead:

(, )(?!.*\b\1\b)((\w*\s?){3})

But BigQuery doesn't seem to support lookahead expressions, since it only supports RE2 syntax. Is there any way to extract the country name without using a lookahead?

CodePudding user response:

You can use

,\s*([^,]*)$

The pattern matches:

  • , - a comma
  • \s* - zero or more whitespace characters
  • ([^,]*) - capturing group 1: zero or more characters other than a comma
  • $ - end of string.
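
As a quick way to check the pattern against the two sample records, here is a minimal sketch in Python. Note the assumptions: Python's re module is not RE2, but this pattern uses only constructs that both engines accept (no lookahead, no backreference), and the helper name last_comma_field is made up for illustration. In BigQuery itself, the same pattern would presumably be passed to REGEXP_EXTRACT, which returns the contents of the capturing group when the pattern has one.

```python
import re

# Pattern from the answer: a comma, optional whitespace, then a run of
# non-comma characters (captured as group 1) reaching the end of the string.
LAST_FIELD = re.compile(r",\s*([^,]*)$")


def last_comma_field(text):
    """Return the text after the final comma, or None if there is no comma.

    Hypothetical helper, named here only for this example.
    """
    match = LAST_FIELD.search(text)
    return match.group(1) if match else None


# The two sample records from the question.
records = [
    "China Anhui Univ Chinese Med, Affiliated Hosp 1, Expt Ctr Clin Res, "
    "Sci Res Dept, 117 Meishan Rd, Hefei 230031, Anhui, 12, Peoples R China",
    "Meluna Res, Geldermalsen, Netherlands; [Wiegant, Frederik Anton Clemens] "
    "Univ Utrecht, Utrecht, Netherlands",
]

for record in records:
    print(last_comma_field(record))
# Prints:
# Peoples R China
# Netherlands
```

Because $ anchors the match at the end of the string and [^,]* cannot cross a comma, only the final comma can start a successful match, which is what the lookahead was trying to enforce. For the second record this means only the last affiliation's country is returned.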