I am doing analysis on table usage. I need to use the query history table and take out all tables that were used in a query. The query itself could use 10 separate tables, or even use the same table multiple times. I need to be able to parse through the query (using a query itself, not python or any other language, just SQL) and take out a list of all tables that the query hit.
Is there any way to do this?
An example of where I am getting the histories would be this query:
select query_text
from table(information_schema.query_history())
CodePudding user response:
The ACCESS_HISTORY
view will tell you what tables are used in a query:
https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html
That's an enterprise level feature. You could also run EXPLAIN
on any query:
SELECT *, "objects"
FROM TABLE(EXPLAIN_JSON(SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.any_table_or_view')))
WHERE "operation"='TableScan'
See more at https://stackoverflow.com/a/64343564/132438
CodePudding user response:
An alternative approach using rlike and information_schema.tables.
You could extend this further by looking at the # rows per table (high = fact, low = dimension) and the number of times accessed.
select query_text, array_agg(DISTINCT TABLE_NAME::string)
from
(select top 100 query_text
from
table(information_schema.query_history())
where
EXECUTION_STATUS = 'SUCCESS' ) a
left outer join
(select TABLE_NAME from INFORMATION_SCHEMA.TABLES group by TABLE_NAME) b
on
upper(a.query_text) rlike '.*('||upper(b.table_name)||').*'
group by
query_text