I am using REGEXP_EXTRACT_ALL(column_with_text, r'(\bword1\b|\bword2\b|\bword3\b)') as matching_words
for extracting any of the 3 words anywhere in the text, regardless of the order and case.
For the following text:
here random word here word2 or here word1 random words
it extracts word1 and word2 which is nice.
However, with another query, I want to extract those words if all of those three words appear in the text, again regardless of order and case. I am trying like:
REGEXP_EXTRACT_ALL(column_with_text, r'(\bword1\b)(\bword2\b)(\bword3\b)')
.
For the following text:
here random word here word2 or here word1 random words
it should return nothing since the text does not contain word3. But it should return all the three words if the text would be: here word3 for example random word here word2 or here word1 random words
This throws the following error in Big Query:
Regular expressions passed into extraction functions must not have more than 1 capturing group
CodePudding user response:
You can write your own function for this - for example
create temp function regexp_extract_only_if_all(text string, regexp string) as ((
select if(array_length(arr) = cnt, arr, null) from
(select regexp_extract_all(text, r'' || regexp) arr, array_length(regexp_extract_all(regexp, r'\|')) 1 cnt)
));
with your_table as (
select 'here random word here word2 or here word1 random words' column_with_text union all
select 'here word3 for example random word here word2 or here word1 random words'
)
select column_with_text,
regexp_extract_only_if_all(column_with_text, r'\bword1\b|\bword2\b|\bword3\b') as extracts
from your_table
with output