Home > Software engineering >  BigQuery - Alternative method to Positive Lookahead for RegExes
BigQuery - Alternative method to Positive Lookahead for RegExes

Time:05-05

I've written a RegEx pattern that identifies alpha-characters that are immediately followed by a numeric character, with the intention that it would used in BigQuery's REGEXP_EXTRACT function.

Here's the pattern: ([A-Z]|[a-z])*(?=[0-9])

However, due to BigQuery's use of RE2 expression library, the Positive Lookahead function does not work. What's an alternative method of identifying the numeric character without including it in the extracted string/match?

Use case:

To extract the first 1 or 2 alpha-characters of a UK postcode, e.g.

  • NW9 9KL
  • M1 0TE
  • ph3 2ee
  • N10 10KE

CodePudding user response:

You can use

REGEXP_EXTRACT(col, '^[A-Za-z] ')

The ^[A-Za-z] regex matches

  • ^ - start of string
  • [A-Za-z] - one or more letters.

Also, if you MUST check for a digit right after the initial letters, you can use a

REGEXP_EXTRACT(col, '^([A-Za-z] )[0-9]')

The ^([A-Za-z] )[0-9] regex matches and captures into Group 1 the initial letters, and then just matches a digit (with [0-9]). The REGEXP_EXTRACT function returns the captured substring if there is a capturing group.

  • Related