I'm using a regular expression to extract country data in BigQuery, but I don't know how to get the text I want out of it. These are the example records I use:
country |
---|
China Anhui Univ Chinese Med, Affiliated Hosp 1, Expt Ctr Clin Res, Sci Res Dept, 117 Meishan Rd, Hefei 230031, Anhui, 12, Peoples R China |
Meluna Res, Geldermalsen, Netherlands; [Wiegant, Frederik Anton Clemens] Univ Utrecht, Utrecht, Netherlands |
I want to extract the words after the last comma (Peoples R China and Netherlands), so I tried a negative lookahead:
(, )(?!.*\b\1\b)((\w*\s?){3})
But BigQuery doesn't seem to support lookahead expressions, because it only supports RE2 syntax. Is there any way to extract the country name without a lookahead?
Answer:
You can use
,\s*([^,]*)$
See the regex demo. The pattern matches:
- , - a comma
- \s* - zero or more whitespace characters
- ([^,]*) - capturing group 1: zero or more characters other than a comma
- $ - the end of the string.
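As a quick check outside BigQuery, here is a minimal Python sketch that applies the same pattern to the two sample records. Python's re module is not RE2, but this pattern uses no lookarounds or backreferences, so on single-line values like these both engines should return the same group. The helper name last_comma_field is hypothetical, used only for illustration.

```python
import re

# The answer's pattern: a comma, optional whitespace, then capture every
# non-comma character up to the end of the string. It needs no lookahead
# or backreference, so RE2 accepts it as well.
LAST_FIELD = re.compile(r",\s*([^,]*)$")


def last_comma_field(text):
    """Hypothetical helper: return the text after the last comma, or None."""
    match = LAST_FIELD.search(text)
    return match.group(1) if match else None


records = [
    "China Anhui Univ Chinese Med, Affiliated Hosp 1, Expt Ctr Clin Res, "
    "Sci Res Dept, 117 Meishan Rd, Hefei 230031, Anhui, 12, Peoples R China",
    "Meluna Res, Geldermalsen, Netherlands; [Wiegant, Frederik Anton Clemens] "
    "Univ Utrecht, Utrecht, Netherlands",
]

for record in records:
    # Only the final comma can be followed by [^,]* reaching the end,
    # so the captured group is the last field.
    print(last_comma_field(record))
```

This prints Peoples R China and then Netherlands. In BigQuery the same pattern goes into REGEXP_EXTRACT, which returns the first capturing group when the pattern contains one, e.g. REGEXP_EXTRACT(country, r',\s*([^,]*)$') with country being the column from the question.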