copy Agent1
from 's3://my-bucket/Reports/Historical Metrics Report (1).csv'
iam_role 'arn:aws:iam::my-role:role/RedshiftRoleForS3'
csv
null as '\000'
IGNOREHEADER 1;
I am using the command above to load data from S3 into a Redshift table. It works fine the first time: the rows are inserted into the table. But when the file in the S3 bucket is updated and I run the same COPY again, it appends the updated data as brand-new rows instead of overwriting the rows that are already there.
How do I stop this duplication? When the data in the S3 file changes, I want running the COPY command to overwrite the existing rows, replacing them with the new data.
Here is a screenshot showing rows being appended with the updated data.
CodePudding user response:
Redshift doesn't enforce uniqueness constraints, so COPY will happily append rows that duplicate existing ones. See - https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html
To update existing rows you will need to COPY the data into a "staging" table and then perform an UPSERT (merge) from the staging table into the target. See - https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html
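A rough sketch of that staging-plus-upsert pattern, reusing the COPY options from the question. The table and column names (reports, reports_staging, id) are placeholders - substitute your actual target table and whatever column(s) uniquely identify a row:

```sql
-- Sketch only: 'reports', 'reports_staging', and key column 'id' are
-- placeholders; adjust to your real schema and business key.
BEGIN;

-- 1. Create an empty staging table with the same structure as the target.
CREATE TEMP TABLE reports_staging (LIKE reports);

-- 2. Load the new file into the staging table instead of the target.
COPY reports_staging
FROM 's3://my-bucket/Reports/Historical Metrics Report (1).csv'
IAM_ROLE 'arn:aws:iam::my-role:role/RedshiftRoleForS3'
CSV
NULL AS '\000'
IGNOREHEADER 1;

-- 3. Delete target rows that are about to be replaced.
DELETE FROM reports
USING reports_staging
WHERE reports.id = reports_staging.id;

-- 4. Insert the fresh rows from staging.
INSERT INTO reports
SELECT * FROM reports_staging;

END;
```

Running the DELETE and INSERT inside one transaction means other sessions never see the table half-updated; the temp staging table is dropped automatically at the end of the session.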
CodePudding user response:
If your goal is to empty the table prior to loading data via COPY, then you can use the TRUNCATE command to empty the table:
TRUNCATE tablename;
Note that TRUNCATE commits the transaction in which it runs, so it cannot be rolled back.
Using TRUNCATE is much more efficient than DELETE FROM tablename because it immediately removes all storage associated with the table.
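This approach only works if the S3 file always contains the complete, current dataset rather than just the changed rows. Under that assumption, a full reload reusing the COPY from the question might look like this (reports is a placeholder table name):

```sql
-- Full reload: only valid when the S3 file holds the ENTIRE dataset.
-- 'reports' is a placeholder table name; note TRUNCATE commits
-- immediately in Redshift and cannot be rolled back.
TRUNCATE reports;

COPY reports
FROM 's3://my-bucket/Reports/Historical Metrics Report (1).csv'
IAM_ROLE 'arn:aws:iam::my-role:role/RedshiftRoleForS3'
CSV
NULL AS '\000'
IGNOREHEADER 1;
```

If the file contains only new or changed rows, use the staging-table upsert from the other answer instead, or the table will lose its historical rows on every load.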