I need to load data from Aurora (MySQL) to Redshift and using S3 is one of the available options. I can extract data from Aurora (MySQL) to S3 using and load them to Redshift using 2 statements:
SELECT *
FROM data_table
INTO OUTFILE S3 's3-XXX://bucket_name/aurora_files/data_table'
FORMAT CSV HEADER
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
OVERWRITE ON;
copy data_table
from 's3://bucket_name/aurora_files/data_table.part_00000'
access_key_id 'XXX'
secret_access_key 'XXX'
csv
delimiter ';'
ignoreheader 1
timeformat 'YYYY-MM-DD HH:MI:SS'
region 'XXX';
This works wilt single file explicitly mentioned when loading to Redshift, but if I try loading multiple files using manifest, I get the following error:
[2022-03-15 14:13:23] [XX000] ERROR: S3 path "s3-XXX://bucket_name/aurora_files/data_table.part_00000" has invalid format.
[2022-03-15 14:13:23] Detail:
[2022-03-15 14:13:23] -----------------------------------------------
[2022-03-15 14:13:23] error: S3 path "s3-XXX://bucket_name/aurora_files/data_table.part_00000" has invalid format.
[2022-03-15 14:13:23] code: 8001
[2022-03-15 14:13:23] context: Parsing S3 Bucket
[2022-03-15 14:13:23] query: 326558
[2022-03-15 14:13:23] location: s3_utility.cpp:133
[2022-03-15 14:13:23] process: padbmaster [pid=22440]
[2022-03-15 14:13:23] -----------------------------------------------
Statement that are returning errors:
SELECT *
FROM data_table
INTO OUTFILE S3 's3-XXX://bucket_name/aurora_files/data_table'
FORMAT CSV HEADER
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
MANIFEST ON
OVERWRITE ON;
copy data_table
from 's3://bucket_name/aurora_files/data_table.manifest'
access_key_id 'XXX'
secret_access_key 'XXX'
csv
delimiter ';'
ignoreheader 1
timeformat 'YYYY-MM-DD HH:MI:SS'
region 'XXX'
manifest;
What could be the issue?
CodePudding user response:
It looks like the file names in your manifest are not in the correct format.
s3://bucket_name/aurora_files/data_table.part_00000 which works and
s3-eu-west-1://bi-xbo-poc/aurora_files/dwh_customers.part_00000 which doesn't.
I expect you just need to remove the region specifier.
CodePudding user response:
Right. The was in the command that was used to create S3 file. There is not need to specify region
INTO OUTFILE S3 's3-XXX://bucket_name/aurora_files/data_table'.
Keeping just s3:// fixes the manifest and files load.
INTO OUTFILE S3 's3://bucket_name/aurora_files/data_table'.