Home > Software design >  Query to multiple csv fles at S3 through Athena
Query to multiple csv fles at S3 through Athena

Time:09-28

I exported my SQL DB into S3 in csv format. Each table is exported into separate csv files and saved in Amazon S3. Now, can I send any query to that S3 bucket which can join multiple tables (multiple csv files in S3) and get a result-set? How can I do that and save in a separate csv file?

CodePudding user response:

The steps are:

  • Put all files related to one table into a separate folder (directory path) in the S3 bucket. Do not mix files from multiple tables in the same folder because Amazon Athena will assume they all belong to one table.
  • Use the CREATE TABLE to define a new table in Amazon Athena, and specify where the files are kept via the LOCATION 's3://bucket_name/[folder]/' parameter. This tells Athena which folder to use when reading the data.

Or, instead of using CREATE TABLE, an easier way is:

  • Go to the AWS Glue management console
  • Select Create crawler
  • Select Add a data source provide the location in S3 where the data is stored
  • Provide other information as prompted (you'll figure it out)

Then, run the crawler and AWS Glue will look at the data files in the specified folder and will automatically create a table for that data. The table will appear in the Amazon Athena console.

Once you have created the tables, you can use normal SQL to query and join the tables.

  • Related