I exported my SQL DB into S3 in csv format. Each table is exported into separate csv files and saved in Amazon S3. Now, can I send any query to that S3 bucket which can join multiple tables (multiple csv files in S3) and get a result-set? How can I do that and save in a separate csv file?
CodePudding user response:
The steps are:
- Put all files related to one table into a separate folder (directory path) in the S3 bucket. Do not mix files from multiple tables in the same folder because Amazon Athena will assume they all belong to one table.
- Use the CREATE TABLE to define a new table in Amazon Athena, and specify where the files are kept via the
LOCATION 's3://bucket_name/[folder]/'
parameter. This tells Athena which folder to use when reading the data.
Or, instead of using CREATE TABLE
, an easier way is:
- Go to the AWS Glue management console
- Select Create crawler
- Select Add a data source provide the location in S3 where the data is stored
- Provide other information as prompted (you'll figure it out)
Then, run the crawler and AWS Glue will look at the data files in the specified folder and will automatically create a table for that data. The table will appear in the Amazon Athena console.
Once you have created the tables, you can use normal SQL to query and join the tables.