I've written a RegEx pattern that identifies alpha-characters that are immediately followed by a numeric character, with the intention that it would used in BigQuery's REGEXP_EXTRACT
function.
Here's the pattern: ([A-Z]|[a-z])*(?=[0-9])
However, due to BigQuery's use of RE2 expression library, the Positive Lookahead function does not work. What's an alternative method of identifying the numeric character without including it in the extracted string/match?
Use case:
To extract the first 1 or 2 alpha-characters of a UK postcode, e.g.
NW
9 9KLM
1 0TEph
3 2eeN
10 10KE
CodePudding user response:
You can use
REGEXP_EXTRACT(col, '^[A-Za-z] ')
The ^[A-Za-z]
regex matches
^
- start of string[A-Za-z]
- one or more letters.
Also, if you MUST check for a digit right after the initial letters, you can use a
REGEXP_EXTRACT(col, '^([A-Za-z] )[0-9]')
The ^([A-Za-z] )[0-9]
regex matches and captures into Group 1 the initial letters, and then just matches a digit (with [0-9]
). The REGEXP_EXTRACT
function returns the captured substring if there is a capturing group.