Home > Enterprise >  Extract/Get the table name and their respective columns used in a view from a query in BigQuery
Extract/Get the table name and their respective columns used in a view from a query in BigQuery

Time:09-24

Let us take the following MYSQL query written in BigQuery for creating a view.

SELECT a.col1,
       a.col2,
       a.col3,
       b.col1,
       b.col2,
       b.col3,
       c.col1,
       c.col2
FROM project_name_dataset_table_a a
INNER JOIN project_name_dataset_table_b b ON a.col1 = b.col1
INNER JOIN project_name_dataset_table_c c ON a.col2 = c.col2

Below is the required output format or similar. Final Format Required or any similar will be fine

Basically i have to prepare a document which will contain all the information of the views created and the tables and their respective columns used in it.

CodePudding user response:

I have created a script for the usage at my end you can tweak the regex according to you

import re

query = """
SELECT a.col1, a.col2, a.col3, b.col1, b.col2, b.col3, c.col1, c.col2 
FROM `project_name.dataset.table_a` a
JOIN `project_name.dataset.table_b` b
ON ...
JOIN `project_name.dataset.table_c` c
ON ...
"""
tabsandcols = re.findall(r'([A-Za-z0-9\-\_]*)\.([a-zA-Z0-9\.\_]*)', query)

alias_names = re.findall(r'\`([A-Za-z\-\_]*)\.([a-zA-Z\.\_]*)\` ([a-z]{1,3})', query)

dic = {}

print(alias_names)

for i, j, k in alias_names:
    if k in dic:
        pass
    else:
        dic[k] = j

l = set()
for i, j in tabsandcols:
    if i != "project_name" and dic.get(i, "") != "":
        l.add(dic.get(i, "")   " "   j)

for i in l:
    print("project_name", i)

It will provide following output

project_name dataset.table_a col1
project_name dataset.table_b col2
project_name dataset.table_c col1
project_name dataset.table_c col2
project_name dataset.table_a col3
project_name dataset.table_a col2
project_name dataset.table_b col1
project_name dataset.table_b col3

CodePudding user response:

I could replicate the code. I suggest that you use this command from BigQuery (INFORMATION_SCHEMA.COLUMNS). This command brings you the information that you need.

This is a sample code:

SELECT A.table_catalog, CONCAT(A.table_schema, '.' , A.table_name) as table,A.column_name
FROM myproject_name.mydataset_name.INFORMATION_SCHEMA.COLUMNS A
where A.table_catalog=myproject_name

If you need more information about this command you can click here:

  • Related