I am writing a SQL query and I have n regular expressions to match in the query.
SELECT * FROM
Sample_Table
WHERE
(REGEXP_CONTAINS(content, r"[a-z0-9A-Z]{40}") OR -- expression 1
REGEXP_CONTAINS(content, r"[0-9a-z]{32}") OR -- expression 2
..................
REGEXP_CONTAINS(content, r"[a-z0-9]{80}") OR -- expression n-1
REGEXP_CONTAINS(content, r"[a-z0-9A-Z\%]{35}")) -- expression n
Now, when I get back the result, I want to know the particular returned row is matched by expression 1 or expression 2. One option would have been to run each regular expression once and tag the result. But, I have to run all the regular expression in one query because of resource limitation in Google BigQuery.
Is there any way to tag each returned result row with the matched regular expression?
CodePudding user response:
Consider below approach
with patterns as (
select 1 pattern_id, r"[a-z0-9A-Z]{40}" pattern union all
select 2, r"[0-9a-z]{32}" union all
select 3, r"[a-z0-9]{80}" union all
select 4, r"[a-z0-9A-Z\%]{35}"
)
select any_value(t).*, string_agg('' || pattern_id) matches
from Sample_Table t, patterns p
where regexp_contains(content, pattern)
group by format('%t', t)
if you have better column to use for group by
you should use it - for example
select any_value(t.sample_repo_name), string_agg('' || pattern_id) as matches
from `bigquery-public-data.github_repos.sample_contents` t, patterns p
where regexp_contains(content, pattern)
group by id