How to download millions of S3 files and compress them on the fly?


I have an S3 bucket with millions of files, and I want to download all of them. Since I don't have enough storage, I would like to download them, compress them on the fly and only then save them. How do I do this?

To illustrate what I mean:

aws s3 cp --recursive s3://bucket | gzip > file

CodePudding user response:

If you want to compress them all into a single file, as your question seems to indicate, you can add a - to the end of the CLI command so that it writes to stdout:

aws s3 cp --recursive s3://bucket - | gzip > file

If you want to compress them as individual files, then you'll first need to get a listing of all the objects, then iterate through them, downloading and compressing one at a time, roughly as sketched below.
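Here is a minimal shell sketch of that per-file approach, assuming the bucket is named bucket and the archives go under a local ./compressed directory (both are placeholders). It processes one object at a time, so expect it to be slow for millions of files:

# List every key in the bucket, then stream and gzip each one.
# Note: keys containing spaces would need more careful parsing than awk '{print $4}'.
aws s3 ls s3://bucket --recursive | awk '{print $4}' | while read -r key; do
    out="compressed/${key}.gz"
    mkdir -p "$(dirname "$out")"   # mirror the bucket's directory structure locally
    # Stream the object to stdout and gzip it before it is written to disk
    aws s3 cp "s3://bucket/${key}" - | gzip > "$out"
done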

But you'll probably find it faster, as well as cheaper, to spin up an EC2 instance with a public IP in the same region, with enough disk space to hold the uncompressed files, download them all at once, and compress them there (data going from S3 to EC2 is free as long as it doesn't go through a NAT gateway or cross regions). You can then download the compressed files from the instance and shut it down.
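A rough sketch of that workflow, assuming an Amazon Linux instance (user ec2-user) in the same region with enough EBS space; the bucket name, paths, and hostname are placeholders:

# On the EC2 instance: pull everything down in parallel, then compress once.
aws s3 sync s3://bucket ./data
tar -czf archive.tar.gz ./data

# From your local machine: fetch the single compressed archive, then shut the
# instance down (from the console, or with "aws ec2 terminate-instances").
scp ec2-user@<instance-public-dns>:archive.tar.gz .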
