Issues with large MySQL XtraBackup stream to S3


We have a bespoke database backup solution that is going to cause us problems in the near future. I'll explain. S3 has a single-object size limit of ~5TB. Our backup solution uses xtrabackup with the xbstream option, piped into an 'aws s3 cp' command to store the result in S3. The interesting part of the script looks like this:

innobackupex {0} --host={4} --slave-info --stream=xbstream /tmp {5} | lzop | /usr/local/bin/aws s3 cp - {1}/{2} --expected-size {6} --storage-class STANDARD

Ignore the variables; they're injected from a different part of the script. The key thing to notice is that the xtrabackup xbstream output is piped into lzop to compress it, then piped into the "aws s3 cp" command to store it in our bucket as a multi-part upload.

We use the streaming approach to avoid local storage costs. By now you can probably guess the issue: our compressed backup size is rapidly approaching the 5TB limit, and hitting it will be catastrophic.

So here's my plea. Is there a way we can write compressed backups larger than 5TB from XtraBackup's xbstream to S3 without compromising our preference to not store anything locally?

I'm not too well versed in bash pipelines. Is there a way to "chunk" the stream so that it writes a new file/stream every X bytes? Or any other sane options? We'd prefer to avoid writing the raw backup, as it's 4x the size of the compressed one.

Thanks!

CodePudding user response:

Regarding your specific question of:

Is there a way to "chunk" the stream so that it writes a new file/stream every X bytes?

Your best bet for adapting your current workflow to "chunk" your backup file would be to use GNU Parallel with --pipe (and --block to specify the size of each block of data piped through to each invocation of aws), as in the sketch below.
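For example, a minimal sketch of that approach (the bucket/prefix, part naming, and 100G block size are illustrative assumptions, not values from your script):

# Illustrative only: bucket, prefix, block size, and part naming are assumptions.
# --recend '' makes parallel cut the binary stream on raw byte count instead of newlines;
# -j1 uploads the parts one at a time, in order; {#} is parallel's job sequence number.
innobackupex --host=... --slave-info --stream=xbstream /tmp \
  | lzop \
  | parallel --pipe --block 100G --recend '' -j1 \
      '/usr/local/bin/aws s3 cp - s3://your-bucket/your-prefix/backup.xbstream.lzo.part-{#} --expected-size 107374182400 --storage-class STANDARD'

Because each part is just a raw byte slice of the single lzop stream, restoring means downloading the parts in order, concatenating them, and then piping the result through lzop -d and xbstream -x as you would today.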
