Recently I've run into trouble uploading a large database backup file (~80GB) to S3.
upload failed: - to s3://<> 'Connection aborted.', BrokenPipeError(32, 'Broken pipe')
After adding the option --expected-size 107374182400
(100GB = 107374182400 bytes) to the s3 cp command, the upload succeeds, so I modified the cron job script accordingly. I suppose that when the database backup grows larger than 100GB, I will have to change that value again. How can I improve this solution?
Cronjob script
# Stream the gzipped mongodump archive straight to S3 without writing to local disk
mongodump --archive --gzip --authenticationDatabase admin \
    --db db -u mongobackup \
    2> "$LOG_FILE" \
  | aws s3 cp --storage-class=STANDARD_IA - "s3://$BUCKET/$BACKUP_NAME" \
      --expected-size 107374182400 2> "$LOG_FILE_S3"
Best Regards,
CodePudding user response:
The script you have shown is outputting data to stdout, and the AWS CLI is copying it to S3 from stdin (as indicated by the - source name).
This is coming through as a stream of data of unknown size (whereas it is easy to determine the size of a file on disk).
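To make that distinction concrete (an illustration only, not part of the original answer; the file path is a made-up placeholder): a file's size can be read before the upload starts, but a pipe's total size is only known once the stream has been fully consumed.

stat -c %s /backups/db.archive.gz     # a file's byte count is known up front
mongodump --archive --gzip | wc -c    # a pipe's byte count is only known after the stream ends (auth options omitted)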
From aws s3 cp — AWS CLI Command Reference:
--expected-size (string)
This argument specifies the expected size of a stream in terms of bytes. Note that this argument is needed only when a stream is being uploaded to s3 and the size is larger than 50GB. Failure to include this argument under these conditions may result in a failed upload due to too many parts in upload.
Therefore, this value is required to provide a 'hint' as to how large the data will be, and hence the size of each 'part' being uploaded.
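For context on why roughly 80GB is the breaking point, here is my own back-of-the-envelope check, assuming the CLI's documented default part size of 8 MB and S3's 10,000-part limit for multipart uploads:

# Rough check, assuming an 8 MiB default multipart_chunksize and the 10,000-part S3 limit
PART_SIZE=$((8 * 1024 * 1024))   # default part size used when no size hint is given
MAX_PARTS=10000                  # S3 multipart upload part limit
echo "$(( PART_SIZE * MAX_PARTS / 1024**3 )) GiB"   # ~78 GiB before the part limit is hit

That lines up with an ~80GB stream failing when no hint is supplied.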
I suspect it would be okay to provide a number that is too big (e.g. twice the actual size), so I would recommend supplying a value that is definitely larger than the known size. You would only need to increase it in future as your data grows. (Therefore, perhaps even try a number that is 10x the necessary size?)
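One way to avoid editing the value by hand, sketched here under some assumptions (this is not from the original script; $PREV_BACKUP_NAME is a hypothetical variable holding the key of the previous backup object): read the size of the last backup from S3 and use a multiple of it as the hint, falling back to the current 100GB constant when no previous backup exists.

# Sketch only: $PREV_BACKUP_NAME is a placeholder for the key of the last backup
PREV_SIZE=$(aws s3api head-object --bucket "$BUCKET" --key "$PREV_BACKUP_NAME" \
    --query ContentLength --output text 2>/dev/null || echo 0)
# Double the previous size as a safety margin; fall back to 100GB if unknown
EXPECTED_SIZE=$(( PREV_SIZE > 0 ? PREV_SIZE * 2 : 107374182400 ))

mongodump --archive --gzip --authenticationDatabase admin \
    --db db -u mongobackup \
    2> "$LOG_FILE" \
  | aws s3 cp --storage-class=STANDARD_IA - "s3://$BUCKET/$BACKUP_NAME" \
      --expected-size "$EXPECTED_SIZE" 2> "$LOG_FILE_S3"

The factor of 2 is arbitrary; any safety margin that comfortably outpaces your growth rate between backups would serve the same purpose.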