Home > Back-end >  How to get the record count of all tables in athena aws
How to get the record count of all tables in athena aws

Time:12-14

I am looking for a way to find the record count of all the tables (in all table schemas) in my aws Athena. I have tried with following, but it looks like information schema doesn't provide the record count. can someone help.


SELECT t.table_schema, t.table_name, t.table_rows
FROM   "information_schema"."schemata" s
INNER JOIN "information_schema"."tables" t on s.schema_name = t.table_schema
INNER JOIN "information_schema"."columns" c on c.table_name = t.table_name AND c.table_schema = t.table_schema
WHERE c.table_catalog = 'awsdatacatalog'

CodePudding user response:

but it looks like information schema doesn't provide the record count

I would argue for pretty obvious reasons, first of all it is not part of schema information, secondary - from pragmatic performance reasons - to provide record counts Athena/Presto/Trino will need to process all data files/sources.

AFAIK Presto/Trino does not support any kind of procedural query execution (like PL/SQL combined with something allowing to execute SQL from string) so the only option is to build the query via SQL or some other language and execute it. Something to start with:

with tables(full_name) as(
    SELECT '"' || t.table_schema || '"."' || t.table_name || '"' as full_name
    FROM "information_schema"."schema" s
    INNER JOIN "information_schema"."tables" t on s.schema_name = t.table_schema
)

select array_join(array_agg('select ''' || full_name || ''' as table_name, count(*) as rows_count from ' || full_name), ' union all ')
from tables
group by true;

Alternatively you can define custom Athena function via lambda which will dynamically build and execute corresponding sql statement.

CodePudding user response:

You can do this as a two step process. 1. Dynamically building the SQL for getting the counts using the below query. 2. Running the output of the SQL to generate the counts

with tname_vw(i) as (
    SELECT concat(
            'select ''',
            table_name,
            ''' as table_name,  count(*) from ',
            table_name
        )
    FROM information_schema.tables
    WHERE table_schema = 'schema_name'
)
select array_join(array_agg(i), ' union ') as result
from tname_vw
  • Related