I want to preview in Athena data that resides in an S3 bucket. The data is in parquet. This doc here describes the process of how to use AWS Glue to create a preview. One mandatory step here is to input the Column Details. This include entering the column name and its data type. I have two problems with this step:
1 - What if I have no ideas of what columns exist in the parquet file before hand (i.e. I have not seen the content of the parquet before)?
2 - What if there are hundreds if not thousands of columns in there.
Is there a way to make the this work without entering this Column Details ?
CodePudding user response:
The link you provided answers your first question, I think:
What if I have no ideas of what columns exist in the parquet file before hand
Then you should use a Glue crawler to explore the files and have it create a Glue table for you. That table will show up in the AwsDataCatalog catalog as a queryable relation.
What if there are hundreds if not thousands of columns in there.
If you're worried about some column quota limitation, I spent some time looking around documentation to see if there is any mention of a service quota for max columns per table. I could not find any. That doesn't mean that there isn't one, but I would be surprised to see that someone generated a parquet file with more columns than Glue supports.