copy Agent1
from 's3://my-bucket/Reports/Historical Metrics Report (1).csv'
iam_role 'arn:aws:iam::my-role:role/RedshiftRoleForS3'
csv
null as '\000'
IGNOREHEADER 1;
I am using the command above to load data from S3 into a Redshift table. It works fine the first time: the rows are inserted into the table. But when the file in the S3 bucket is updated and I run the same COPY again, it appends the updated data as brand-new rows instead of overwriting the rows that are already there.
How do I stop this duplication? When the data in the S3 file changes, I want running the COPY command to overwrite the existing rows, replacing them with the new data.
Here is a screenshot showing rows being appended with the updated data.
CodePudding user response:
Redshift doesn't enforce uniqueness constraints, so COPY will happily append rows that duplicate existing ones. See - https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html
To update existing rows you will need to COPY the data into a "staging" table and then perform an UPSERT (merge) from the staging table into the target. See - https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html
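A rough sketch of that staging-plus-upsert pattern, reusing the COPY options from the question. The table and column names (reports, reports_staging, id) are placeholders - substitute your actual target table and whatever column(s) uniquely identify a row:

```sql
-- Sketch only: 'reports', 'reports_staging', and key column 'id' are
-- placeholders; adjust to your real schema and business key.
BEGIN;

-- 1. Create an empty staging table with the same structure as the target.
CREATE TEMP TABLE reports_staging (LIKE reports);

-- 2. Load the new file into the staging table instead of the target.
COPY reports_staging
FROM 's3://my-bucket/Reports/Historical Metrics Report (1).csv'
IAM_ROLE 'arn:aws:iam::my-role:role/RedshiftRoleForS3'
CSV
NULL AS '\000'
IGNOREHEADER 1;

-- 3. Delete target rows that are about to be replaced.
DELETE FROM reports
USING reports_staging
WHERE reports.id = reports_staging.id;

-- 4. Insert the fresh rows from staging.
INSERT INTO reports
SELECT * FROM reports_staging;

END;
```

Running the DELETE and INSERT inside one transaction means other sessions never see the table half-updated; the temp staging table is dropped automatically at the end of the session.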
CodePudding user response:
If your goal is to empty the table prior to loading data via COPY, then you can use the TRUNCATE command to empty the table:
TRUNCATE tablename;
Note that TRUNCATE commits the transaction in which it runs, so it cannot be rolled back.
Using TRUNCATE is much more efficient than DELETE FROM tablename because it immediately removes all storage associated with the table.
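This approach only works if the S3 file always contains the complete, current dataset rather than just the changed rows. Under that assumption, a full reload reusing the COPY from the question might look like this (reports is a placeholder table name):

```sql
-- Full reload: only valid when the S3 file holds the ENTIRE dataset.
-- 'reports' is a placeholder table name; note TRUNCATE commits
-- immediately in Redshift and cannot be rolled back.
TRUNCATE reports;

COPY reports
FROM 's3://my-bucket/Reports/Historical Metrics Report (1).csv'
IAM_ROLE 'arn:aws:iam::my-role:role/RedshiftRoleForS3'
CSV
NULL AS '\000'
IGNOREHEADER 1;
```

If the file contains only new or changed rows, use the staging-table upsert from the other answer instead, or the table will lose its historical rows on every load.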