Copy a file from S3 to GCP object storage by piping or streaming a GET from S3 into a PUT to Cloud Storage

Time:11-04

Is it possible to make an API call to S3 to fetch a file and then pipe the result into an API call to GCP Cloud Storage (object storage) to put an object? This would effectively copy the file from S3 to Cloud Storage.

My goal is to do this completely in memory, without writing anything to disk, while still being able to handle files larger than memory. Is this even possible?

I have looked in the Python docs for such an option but did not find one. I did come across BytesIO, which may be helpful.

Answer:

Sure, Google Cloud Storage's gsutil can copy between clouds. Once you install it and configure it with your GCP credentials, open up the ".boto" file and add a section with your AWS credentials:

[Credentials]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_ACCESS_KEY

Then run a copy command, like so:

gsutil cp s3://mybucket/myblob gs://mybucket/myblob

If it's a whole lot of files, you may want to run gsutil on a GCE or EC2 instance so that the data doesn't need to stream through your local machine.

Alternatively, if you need to do this for very large buckets or on a regular schedule, you could look into the Storage Transfer Service, which can perform large copies from S3: https://cloud.google.com/storage-transfer-service

Answer:

First, consider whether you want to develop your own code for this at all. There are many existing options for moving data from AWS S3 to Google Cloud Storage (GCP CS).

Also consider that GCP CS can be treated as an S3 bucket through its S3-compatible (interoperability) API, so any option that works for moving data from one S3 bucket to another should also work between AWS S3 and GCP CS.
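One way to use that compatibility from Python is to create a second boto3 client pointed at the Cloud Storage endpoint and let boto3's managed transfer stream the S3 response body straight into the upload. This is a minimal sketch under assumptions: you have created HMAC keys for the Cloud Storage side, the bucket and key names are placeholders, and `make_clients` and `copy_object_streaming` are hypothetical helper names, not part of any library.

```python
GCS_S3_ENDPOINT = "https://storage.googleapis.com"  # Cloud Storage's S3-compatible (XML API) endpoint


def make_clients(aws_key, aws_secret, gcs_hmac_key, gcs_hmac_secret):
    """Build one boto3 client for AWS S3 and one for Cloud Storage via its S3-compatible API."""
    import boto3  # imported here so copy_object_streaming works with any S3-like client object

    src = boto3.client(
        "s3",
        aws_access_key_id=aws_key,
        aws_secret_access_key=aws_secret,
    )
    dst = boto3.client(
        "s3",
        endpoint_url=GCS_S3_ENDPOINT,
        aws_access_key_id=gcs_hmac_key,  # assumption: a Cloud Storage HMAC access ID
        aws_secret_access_key=gcs_hmac_secret,  # assumption: the matching HMAC secret
    )
    return src, dst


def copy_object_streaming(src, dst, src_bucket, src_key, dst_bucket, dst_key):
    """Copy one object from src to dst without writing it to disk.

    get_object returns a file-like body that is read lazily from the network;
    upload_fileobj reads it in chunks and uses a multipart upload for large objects.
    """
    body = src.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    try:
        dst.upload_fileobj(body, dst_bucket, dst_key)
    finally:
        body.close()
```

Typical usage would be `src, dst = make_clients(...)` followed by `copy_object_streaming(src, dst, "mybucket", "myblob", "mybucket", "myblob")`. Memory use should be bounded roughly by boto3's transfer chunk size and concurrency settings rather than by the object size. Newer boto3 releases may add checksum headers by default; if Cloud Storage rejects the upload, that client configuration is a likely first thing to check.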

That said, if you still want to code your own implementation and do not want to store the file locally, you could chain a chunked (multipart) download from S3 into a multipart upload: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#multipartupload
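The chaining idea can be sketched as follows. This is a hedged sketch, not a definitive implementation: `multipart_copy`, `read_exact`, and `PART_SIZE` are hypothetical names, `src` and `dst` are assumed to be boto3 S3 clients (for GCP CS, one configured against its S3-compatible endpoint), and the destination is assumed to accept S3-style multipart upload calls. The sketch reads the S3 object as a stream and sends each fixed-size chunk as one part, so only about one part is held in memory at a time and nothing is written to disk.

```python
PART_SIZE = 8 * 1024 * 1024  # S3 requires every part except the last to be >= 5 MiB


def read_exact(stream, size):
    """Read up to `size` bytes, looping because a network stream may return short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def multipart_copy(src, dst, src_bucket, src_key, dst_bucket, dst_key, part_size=PART_SIZE):
    """Chain a streaming GET on src into a multipart upload on dst; returns the part count."""
    body = src.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    upload_id = dst.create_multipart_upload(Bucket=dst_bucket, Key=dst_key)["UploadId"]
    parts = []
    try:
        part_number = 1
        while True:
            data = read_exact(body, part_size)
            if not data and parts:
                break  # source ended exactly on a part boundary
            resp = dst.upload_part(
                Bucket=dst_bucket,
                Key=dst_key,
                PartNumber=part_number,
                UploadId=upload_id,
                Body=data,
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            if len(data) < part_size:
                break  # a short part means the source is exhausted
            part_number += 1
        dst.complete_multipart_upload(
            Bucket=dst_bucket,
            Key=dst_key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that already-uploaded parts are not left behind on the destination.
        dst.abort_multipart_upload(Bucket=dst_bucket, Key=dst_key, UploadId=upload_id)
        raise
    finally:
        body.close()
    return len(parts)
```

The 8 MiB part size is an arbitrary choice above the 5 MiB minimum; larger parts mean fewer requests but more memory per part. For small objects, a single put_object call is simpler than a multipart upload.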
