Is there a way to get the list of all tables with the last refresh date from a database in the Cloudera Hadoop impala?
I'm trying to write a custom SQL query that can do that so I can use it to build a dashboard (in Tableau) where we can track if a table is refreshed or not. So we can take action accordingly. I tried it using a join but there are so many tables and I believe there is a better way to do it. (Database name Core_research and there are more than 500 tables)
CodePudding user response:
I used to run a script that refreshed column stats on tables every Sunday. We couldn't run all the tables but we did as many as time permitted. You could do the same but actually record when the script ran in database/table. This would give you the functionality you are looking for.
Another other option would be to create a table out of the Impala logs and keep track of things that way. (With some fancy regex to track refreshes)