AWS Athena tables automatically appear in AWS Glue console

Time:06-18

I recently found out that there is a limit on the number of partitions an AWS Athena table may have (20,000 at the moment, as mentioned here: https://docs.aws.amazon.com/athena/latest/ug/partitions.html).

The same page mentions that AWS Glue tables may have 10 million partitions, so I opened my AWS Glue console to recreate the tables that I had been using in Athena so far, and was surprised to see all the tables that I created in Athena console being listed in AWS Glue console as well.

Hence my question: does that mean every table created in the Athena console is an AWS Glue table and supports 10 million partitions?

I am currently using the Athena SDK for Java (https://docs.aws.amazon.com/athena/latest/ug/code-samples.html) to select data from table t1 and load it into table t2 using INSERT INTO queries, which dynamically generate partitions in Hive format (i.e. col1=<...>/col2=<...>/...). Can I still use it? Is there any other SDK specifically for Glue tables? My current concern is table t2: it is going to reach the 20,000-partition limit quite soon, so I'm wondering whether I still need to worry about that.
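To illustrate the Hive-format partition layout described above, here is a minimal Java sketch. The class name HivePartitionPath and the method build are hypothetical and not part of any AWS SDK. The sketch only joins column=value pairs with "/", and it assumes the values contain no characters that would need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not an AWS API): builds a Hive-style partition path
// such as "col1=2024/col2=us" from ordered partition column values.
public class HivePartitionPath {

    // Joins each column=value pair with "/" in the map's iteration order.
    // Assumption: values need no escaping of special characters.
    public static String build(Map<String, String> partitionColumns) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> entry : partitionColumns.entrySet()) {
            path.add(entry.getKey() + "=" + entry.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so col1 comes before col2.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("col1", "2024");
        columns.put("col2", "us");
        System.out.println(build(columns)); // prints "col1=2024/col2=us"
    }
}
```

Each distinct path produced this way corresponds to one partition of the table, which is why the partition count grows with the number of distinct value combinations.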

And if being listed in the AWS Glue console does not by itself imply support for 10 million partitions, how do I make an existing Athena table support 10 million partitions? Does the table have to be created in the AWS Glue console using "Add table" in order to get that support?

CodePudding user response:

Yes and no. If you are using the AWS Glue Data Catalog with Athena (by default, you are), then Athena supports querying tables with up to 10 million partitions. However, a single query cannot read more than 1 million of those partitions in one scan. source
