Home > Net >  How to know which regular expression pattern is matched in SQL Query
How to know which regular expression pattern is matched in SQL Query

Time:07-25

I am writing a SQL query and I have n regular expressions to match in the query.

SELECT * FROM
Sample_Table
WHERE
(REGEXP_CONTAINS(content, r"[a-z0-9A-Z]{40}") OR -- expression 1
REGEXP_CONTAINS(content, r"[0-9a-z]{32}") OR -- expression 2
..................
REGEXP_CONTAINS(content, r"[a-z0-9]{80}") OR -- expression n-1
REGEXP_CONTAINS(content, r"[a-z0-9A-Z\%]{35}")) -- expression n 

Now, when I get back the result, I want to know the particular returned row is matched by expression 1 or expression 2. One option would have been to run each regular expression once and tag the result. But, I have to run all the regular expression in one query because of resource limitation in Google BigQuery.

Is there any way to tag each returned result row with the matched regular expression?

CodePudding user response:

Consider below approach

with patterns as (
  select 1 pattern_id, r"[a-z0-9A-Z]{40}" pattern union all
  select 2, r"[0-9a-z]{32}" union all
  select 3, r"[a-z0-9]{80}" union all
  select 4, r"[a-z0-9A-Z\%]{35}"
)
select any_value(t).*, string_agg('' || pattern_id) matches  
from Sample_Table t, patterns p
where regexp_contains(content, pattern)
group by format('%t', t)     

if you have better column to use for group by you should use it - for example

select any_value(t.sample_repo_name), string_agg('' || pattern_id) as matches  
from `bigquery-public-data.github_repos.sample_contents` t, patterns p
where regexp_contains(content, pattern)
group by id
  • Related