Home > Software engineering >  Load Aurora exported csv files from S3 to Redshift
Load Aurora exported csv files from S3 to Redshift

Time:03-16

I need to load data from Aurora (MySQL) to Redshift and using S3 is one of the available options. I can extract data from Aurora (MySQL) to S3 using and load them to Redshift using 2 statements:

SELECT *
FROM data_table
INTO OUTFILE S3 's3-XXX://bucket_name/aurora_files/data_table'
    FORMAT CSV HEADER
    FIELDS TERMINATED BY ';'
    LINES TERMINATED BY '\n'
    OVERWRITE ON;


copy data_table
from 's3://bucket_name/aurora_files/data_table.part_00000'
access_key_id 'XXX'
secret_access_key 'XXX'
csv
delimiter ';'
ignoreheader 1
timeformat 'YYYY-MM-DD HH:MI:SS'
region 'XXX';

This works wilt single file explicitly mentioned when loading to Redshift, but if I try loading multiple files using manifest, I get the following error:

[2022-03-15 14:13:23] [XX000] ERROR: S3 path "s3-XXX://bucket_name/aurora_files/data_table.part_00000" has invalid format. 
[2022-03-15 14:13:23] Detail:  
[2022-03-15 14:13:23] -----------------------------------------------  
[2022-03-15 14:13:23] error:  S3 path "s3-XXX://bucket_name/aurora_files/data_table.part_00000" has invalid format.  
[2022-03-15 14:13:23] code:      8001  
[2022-03-15 14:13:23] context:   Parsing S3 Bucket  
[2022-03-15 14:13:23] query:   326558  
[2022-03-15 14:13:23] location:  s3_utility.cpp:133  
[2022-03-15 14:13:23] process:   padbmaster [pid=22440]  
[2022-03-15 14:13:23] -----------------------------------------------

Statement that are returning errors:

SELECT *
FROM data_table
INTO OUTFILE S3 's3-XXX://bucket_name/aurora_files/data_table'
    FORMAT CSV HEADER
    FIELDS TERMINATED BY ';'
    LINES TERMINATED BY '\n'
    MANIFEST ON
    OVERWRITE ON;

copy data_table
from 's3://bucket_name/aurora_files/data_table.manifest'
access_key_id 'XXX'
secret_access_key 'XXX'
csv
delimiter ';'
ignoreheader 1
timeformat 'YYYY-MM-DD HH:MI:SS'
region 'XXX'
manifest;

What could be the issue?

CodePudding user response:

It looks like the file names in your manifest are not in the correct format.

s3://bucket_name/aurora_files/data_table.part_00000 which works and

s3-eu-west-1://bi-xbo-poc/aurora_files/dwh_customers.part_00000 which doesn't.

I expect you just need to remove the region specifier.

CodePudding user response:

Right. The was in the command that was used to create S3 file. There is not need to specify region

INTO OUTFILE S3 's3-XXX://bucket_name/aurora_files/data_table'.

Keeping just s3:// fixes the manifest and files load.

INTO OUTFILE S3 's3://bucket_name/aurora_files/data_table'.
  • Related