How to stream-copy data between 2 AWS profiles

Time:02-05

I am trying to copy 20 TB of data between 2 S3 buckets in 2 different AWS accounts. I tried this command, but it seems to copy only the folder or the file inside it, and I want to copy all of the data in the bucket.

aws --profile <dev> s3 cp s3://bucket/path - | aws --profile <prod> s3 cp - s3://bucket/path

Answer:

20 TB of data is rather big and can take a LONG time to download and upload. (Even copying 1 TB of data between folders on your own computer takes a while, but doing it across the Internet can take a ridiculously long time!)

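As an example, at a 1 Gbps transfer speed, 20 TB of data (20 × 10^12 bytes, or 1.6 × 10^14 bits) would take about 160,000 seconds, roughly 44 hours, to transfer. And because the data has to come down to your computer and then go back up again, it can take up to twice that time. (They might well be 'testing' you!)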
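The 44-hour figure assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and a link that stays fully saturated with no protocol overhead. A small helper makes that arithmetic explicit; the function name `transfer_hours` is just an illustration, not part of any tool.

```shell
#!/usr/bin/env bash
# Estimate transfer time in hours for a given size and line rate.
# Assumes decimal units and a fully saturated link (no overhead).
transfer_hours() {
  local terabytes="$1" gbps="$2"
  awk -v tb="$terabytes" -v g="$gbps" \
    'BEGIN { printf "%.1f\n", tb * 1e12 * 8 / (g * 1e9) / 3600 }'
}

transfer_hours 20 1   # 20 TB in one direction at 1 Gbps: prints 44.4
transfer_hours 40 1   # data crossing a shared link twice: prints 88.9
```

Real-world throughput is usually lower than the nominal line rate, so these numbers are best-case estimates.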

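Instead, it would make much more sense to use the capabilities of Amazon S3 to copy objects directly between buckets, without downloading and re-uploading the content. The aws s3 sync command does this when both the source and the destination are s3:// URIs, and unlike streaming through cp with "-" (which handles a single object), it copies every object in the bucket.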
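In its simplest form, the copy is a single command whose source and destination are both S3 URIs, so S3 copies each object server side. This sketch assumes one profile (the placeholder `prod`) whose credentials can read the source bucket and write the destination bucket; the bucket names and the helper name `copy_bucket` are hypothetical.

```shell
#!/usr/bin/env bash
# Server-side copy of every object in one bucket into another.
# Profile and bucket names are placeholders.
copy_bucket() {
  local profile="$1" src="$2" dst="$3"
  shift 3
  # Both arguments are s3:// URIs, so the CLI asks S3 to copy the objects
  # instead of streaming their contents through this machine.
  # Extra arguments (for example --dryrun or --acl ...) are passed through.
  aws --profile "$profile" s3 sync "s3://${src}" "s3://${dst}" "$@"
}

# Usage (preview first, then run for real):
# copy_bucket prod source-bucket destination-bucket --dryrun
# copy_bucket prod source-bucket destination-bucket
```

Because sync only copies objects that are missing or changed at the destination, re-running the same command after an interruption picks up where it left off.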

Since the source and destination buckets are in two different AWS accounts, the credentials that run the copy must be allowed both to read the source bucket and to write to the destination bucket. There are two ways to arrange that:

'Pull' method

This is where you use credentials from the destination account and 'pull' the objects from the source account:

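  • The owner of the source bucket would need to add a bucket policy that permits read access (s3:ListBucket and s3:GetObject) for the IAM credentials being used
  • The IAM user or role in the destination account also needs an IAM policy in its own account that allows those read actions on the source bucket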
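For the pull method, the source bucket policy could look like the one generated below. The account ID, bucket names, the `dev`/`prod` profile names and the helper function names are all hypothetical placeholders. Granting to the destination account's `root` principal lets that account decide, through its own IAM policies, which user or role may actually read.

```shell
#!/usr/bin/env bash
# Pull method, step 1: run by the SOURCE account owner.
# Prints a bucket policy that lets the destination account list and read.
source_read_policy() {
  local src_bucket="$1" dest_account_id="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDestinationAccountRead",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${dest_account_id}:root" },
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::${src_bucket}",
        "arn:aws:s3:::${src_bucket}/*"
      ]
    }
  ]
}
EOF
}

# Attach the policy with the source account's profile.
# Note: this REPLACES any bucket policy already on the bucket.
apply_source_policy() {
  local src_bucket="$1" dest_account_id="$2"
  aws --profile dev s3api put-bucket-policy \
    --bucket "$src_bucket" \
    --policy "$(source_read_policy "$src_bucket" "$dest_account_id")"
}

# Step 2: the destination account pulls the data with its own profile:
# aws --profile prod s3 sync s3://source-bucket s3://destination-bucket
```

If the source bucket already has a policy, merge this statement into it rather than overwriting it.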

'Push' method

This is where you use credentials from the source account and 'push' the data to the destination account:

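  • The owner of the destination bucket would need to add a bucket policy that permits list and write access (s3:ListBucket, s3:PutObject and s3:PutObjectAcl) for the IAM credentials being used
  • When copying the objects, specify --acl bucket-owner-full-control so that the destination bucket's owner gets full control of the copied objects (only required when writing to a bucket in a different AWS account; if the destination bucket's Object Ownership setting is "Bucket owner enforced", the bucket owner owns every object automatically)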
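For the push method, the destination bucket policy needs list access (aws s3 sync lists the destination to decide what to copy) and write access, including s3:PutObjectAcl because the copy sets an ACL. The account ID, bucket names, the `dev` profile and the helper names below are hypothetical; the destination owner would attach the policy with `aws s3api put-bucket-policy`, the same way as for any bucket policy.

```shell
#!/usr/bin/env bash
# Push method, step 1: run by the DESTINATION account owner.
# Prints a bucket policy that lets the source account list and write.
destination_write_policy() {
  local dst_bucket="$1" src_account_id="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSourceAccountList",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${src_account_id}:root" },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${dst_bucket}"
    },
    {
      "Sid": "AllowSourceAccountWrite",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${src_account_id}:root" },
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::${dst_bucket}/*"
    }
  ]
}
EOF
}

# Step 2: the source account pushes the data and grants the
# destination bucket owner full control of each copied object.
push_copy() {
  local src_bucket="$1" dst_bucket="$2"
  aws --profile dev s3 sync "s3://${src_bucket}" "s3://${dst_bucket}" \
    --acl bucket-owner-full-control
}
```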

Copying 20 TB of data will still take some time, but the AWS CLI sends the copy requests in parallel, which greatly speeds up the process. The data itself moves within Amazon S3; nothing is downloaded to your computer.
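The degree of parallelism is configurable per profile. By default the AWS CLI runs 10 concurrent S3 requests; the sketch below raises that for the profile doing the copy. The value 50, the profile names and the helper name are assumptions to tune for your own environment, not figures from the answer above.

```shell
#!/usr/bin/env bash
# Raise the number of parallel S3 requests the CLI issues for one profile.
# The default is 10; 50 is only an example value.
tune_s3_concurrency() {
  local profile="$1" requests="${2:-50}"
  # Writes s3 = max_concurrent_requests = N under that profile's section
  # in the AWS CLI config file.
  aws configure set s3.max_concurrent_requests "$requests" --profile "$profile"
}

# Usage:
# tune_s3_concurrency prod 50
```

Higher values increase request rate against S3 and CPU use on the machine running the CLI, so it is worth raising the number gradually.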

Answer:

Another possible approach is to mount the S3 bucket directly on an instance and copy the files from there. That works for a small amount of data, but for 20 TB it's not optimal, because every file still has to be read through the instance and written back out.
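A sketch of the mount approach, assuming s3fs-fuse as the mounting tool (the answer does not name one) and a Linux instance with FUSE available. Bucket names, mount points, the `dev`/`prod` profile names and the helper name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the mount approach using s3fs-fuse.
copy_via_mounts() {
  local src_bucket="$1" dst_bucket="$2" src_mnt="$3" dst_mnt="$4"
  mkdir -p "$src_mnt" "$dst_mnt"
  # Each mount picks credentials for its account from ~/.aws/credentials.
  s3fs "$src_bucket" "$src_mnt" -o profile=dev
  s3fs "$dst_bucket" "$dst_mnt" -o profile=prod
  # Every byte is downloaded from one mount and uploaded to the other,
  # which is why this does not scale well to 20 TB.
  cp -r "$src_mnt"/. "$dst_mnt"/
  fusermount -u "$src_mnt"
  fusermount -u "$dst_mnt"
}

# Usage:
# copy_via_mounts source-bucket destination-bucket /mnt/src /mnt/dst
```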
