I am trying to get a large file (>10 GB, stored as CSV on S3) and send it as a CSV attachment in the response. I am doing it with the following procedure:
async getS3Object(params: any) {
    s3.getObject(params, function (err, data) {
        if (err) {
            console.log('Error Fetching File');
        }
        else {
            const csv = data.Body.toString('utf-8');
            res.setHeader('Content-disposition', `attachment; filename=${fileId}.csv`);
            res.set('Content-Type', 'text/csv');
            res.status(200).send(csv);
        }
    });
}
This is taking painfully long to process the file and send it as a CSV attachment. How can I make this faster?
CodePudding user response:
There are a lot of factors at play here, and this is a very large file.
What is "painfully long"?
Where are you transferring from/to (i.e. is your code in a Lambda function, or are you retrieving the file to a non-cloud location)?
What is the speed of the internet connection on the receiving end, and based on that, what would a best case transfer time be?
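(For a rough sense of scale, assuming a plain 100 Mbps connection: 10 GB is about 80,000 megabits, so even a steady 100 Mbps link needs roughly 800 seconds, a bit over 13 minutes, before any S3 or application overhead.)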
If bandwidth or some other "local" resource constraint is not the problem, have you considered enabling S3 Transfer Acceleration? This could speed up transfer times by 50-500%:
https://aws.amazon.com/s3/transfer-acceleration/
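If you go that route, a minimal sketch of opting into the accelerate endpoint from the AWS SDK for JavaScript v2 could look like the snippet below. It assumes Transfer Acceleration has already been enabled on the bucket itself, and the bucket/key names are placeholders, not anything from your setup:

    // Sketch only: assumes Transfer Acceleration is already enabled on the bucket.
    import * as AWS from 'aws-sdk';

    const s3 = new AWS.S3({
        useAccelerateEndpoint: true, // route requests through the s3-accelerate endpoint
    });

    // Subsequent getObject calls on this client use the accelerated endpoint.
    s3.getObject({ Bucket: 'my-bucket', Key: 'big-file.csv' }, (err, data) => {
        if (err) console.error(err);
    });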
CodePudding user response:
You're dealing with a huge file; you could break it into chunks using the Range parameter of getObject (see also the docs, search for "calling the getObject property"). If you need the whole file, you could split that work across workers, though at some point the limit will probably be your connection, and if you need to send the whole file as an attachment that won't help much.
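A rough sketch of a ranged read with the AWS SDK for JavaScript v2 (bucket, key, and byte offsets here are placeholders):

    import * as AWS from 'aws-sdk';

    const s3 = new AWS.S3();

    // Fetch only the bytes from `start` to `end` (inclusive) of the object.
    async function getChunk(bucket: string, key: string, start: number, end: number): Promise<Buffer> {
        const data = await s3
            .getObject({ Bucket: bucket, Key: key, Range: `bytes=${start}-${end}` })
            .promise();
        return data.Body as Buffer;
    }

    // e.g. the first 5 MiB:
    // const chunk = await getChunk('my-bucket', 'big-file.csv', 0, 5 * 1024 * 1024 - 1);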
A better solution would be to never download the file in the first place. You can do this by streaming it from S3 (see also this, and this), or by setting up a proxy in your server so that the bucket/subdirectory appears to the client to be part of your app.
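A minimal sketch of the streaming idea with the AWS SDK for JavaScript v2 and Express, assuming an Express route and placeholder bucket/key names (adapt to your own handler), is to pipe the S3 read stream straight into the response instead of buffering the whole body:

    import * as AWS from 'aws-sdk';
    import express from 'express';

    const s3 = new AWS.S3();
    const app = express();

    app.get('/download/:fileId', (req, res) => {
        const params = { Bucket: 'my-bucket', Key: `${req.params.fileId}.csv` };

        res.setHeader('Content-Disposition', `attachment; filename=${req.params.fileId}.csv`);
        res.setHeader('Content-Type', 'text/csv');

        s3.getObject(params)
            .createReadStream()              // stream the object instead of loading it into memory
            .on('error', (err) => {
                console.error('Error streaming file', err);
                res.sendStatus(500);
            })
            .pipe(res);
    });

This keeps memory use roughly flat regardless of file size, since bytes flow from S3 to the client as they arrive.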