I have an input table which has a list of all queries run in Bigquery. I need to create two columns one with the dataset name and other with table name.
The queries example is as follows (some could be nested). I need to separate the dataset name and tablename using the dot in the string
Queries | dataset | table |
---|---|---|
select * from advp.accounts ; | advp | accounts |
select * from gqd.customers where id in (select id from gwq.facts) ; | gqd | customers |
gwq | facts | |
create table hsq.sales (id int64, ss string); | hsq | sales |
CodePudding user response:
You can try using REGEXP_EXTRACT_ALL
to extract all instances of "<database>.<table>", then unnest the corresponding arrays.
WITH cte AS (
SELECT *, REGEXP_EXTRACT_ALL(Queries, r'([a-zA-Z0-9] )\.') AS db_names,
REGEXP_EXTRACT_ALL(Queries, r'(\.[a-zA-Z0-9] )') AS table_names
FROM tab
)
SELECT cte.Queries, dataset, table_
FROM cte, cte.db_names AS dataset, cte.table_names AS table_