I wonder are there any way to get table access history in Redshift cluster?
Our cluster has a lot of tables and it is costing us a lot. I would like to discover what specific tables have not been accessed for a given period and then I would drop those tables.
Are there any ways to get table access history? If someone has opinion or materials please let me know.
Thanks.
CodePudding user response:
The STL_QUERY - Amazon Redshift system table contains execution information about a database query.
You could parse the queries to try to determine which tables have been accessed recently (a little bit tricky since you would need to extract the table names from the queries). You could then compare those table names against SVV_TABLE_INFO - Amazon Redshift to discover which tables have not been accessed lately.
CodePudding user response:
To manage disk space, the STL logs (system tables e.g STL_QUERY, STL_QUERYTEXT, ) only retain approximately two to five days of log history (max 7 days) , depending on log usage and available disk space. If you want to retain the log data, you will need to periodically copy it to other tables or unload it to Amazon S3. If you have not copied/exported the stl logs previously, there is no way to access logs of before 1 week.
-->In your case, you can discover which specific tables have not been accessed, only in last 1 week (assuming you have not exported the logs previously).
I came across a similar situation in past, I would suggest to firstly check that the tables are not referred in any procedure or views in redshift with below query:
SELECT DISTINCT
srcobj.oid AS src_oid
,srcnsp.nspname AS src_schemaname
,srcobj.relname AS src_objectname
,tgtobj.oid AS dependent_viewoid
,tgtnsp.nspname AS dependent_schemaname
,tgtobj.relname AS dependent_objectname
FROM
pg_catalog.pg_class AS srcobj
INNER JOIN
pg_catalog.pg_depend AS srcdep
ON srcobj.oid = srcdep.refobjid
INNER JOIN
pg_catalog.pg_depend AS tgtdep
ON srcdep.objid = tgtdep.objid
JOIN
pg_catalog.pg_class AS tgtobj
ON tgtdep.refobjid = tgtobj.oid
AND srcobj.oid <> tgtobj.oid
LEFT OUTER JOIN
pg_catalog.pg_namespace AS srcnsp
ON srcobj.relnamespace = srcnsp.oid
LEFT OUTER JOIN
pg_catalog.pg_namespace tgtnsp
ON tgtobj.relnamespace = tgtnsp.oid
WHERE tgtdep.deptype = 'i' --dependency_internal
AND tgtobj.relkind = 'v' --i=index, v=view, s=sequence
and src_schemaname <> 'pg_catalog' and src_schemaname <> 'information_schema';
-->Secondly, if time permits start exporting the redshift stl logs to s3 for few weeks to better explore the least accessed tables.
--> If tables are critical and time does not permit , its better to export the data of the tables to s3 and retain it for few days prior dropping the tables from redshift. It would serve as a backup just in case something goes wrong.