Home > Mobile >  Big Query Regex: Match all from a set of words
Big Query Regex: Match all from a set of words

Time:05-06

I am using REGEXP_EXTRACT_ALL(column_with_text, r'(\bword1\b|\bword2\b|\bword3\b)') as matching_words for extracting any of the 3 words anywhere in the text, regardless of the order and case.

For the following text:
here random word here word2 or here word1 random words it extracts word1 and word2 which is nice.

However, with another query, I want to extract those words if all of those three words appear in the text, again regardless of order and case. I am trying like: REGEXP_EXTRACT_ALL(column_with_text, r'(\bword1\b)(\bword2\b)(\bword3\b)').

For the following text:
here random word here word2 or here word1 random words it should return nothing since the text does not contain word3. But it should return all the three words if the text would be: here word3 for example random word here word2 or here word1 random words

This throws the following error in Big Query:

Regular expressions passed into extraction functions must not have more than 1 capturing group

CodePudding user response:

You can write your own function for this - for example

create temp function regexp_extract_only_if_all(text string, regexp string) as ((
  select if(array_length(arr) = cnt, arr, null) from 
  (select regexp_extract_all(text, r'' || regexp) arr, array_length(regexp_extract_all(regexp, r'\|'))   1 cnt)
));
with your_table as (
  select 'here random word here word2 or here word1 random words' column_with_text union all
  select 'here word3 for example random word here word2 or here word1 random words'
)
select column_with_text, 
  regexp_extract_only_if_all(column_with_text, r'\bword1\b|\bword2\b|\bword3\b') as extracts
from your_table    

with output

enter image description here

  • Related