I'm building a MERN stack application where only authenticated users should be able to upload media files and then perform basic read and delete operations on them.
My application was previously using Firebase Storage to upload the media to Google's servers directly from the client. However, now that I need the client to be authenticated to perform an upload, I am looking for a secure alternative solution.
From my limited research, it appears that the common approach is to first upload the file to the server and then make a separate request from the server to upload it to cloud storage (e.g. Google Cloud, AWS, etc.) or the database (GridFS in MongoDB).
It seems inefficient to me to, in effect, upload the file twice. I imagine this would be particularly taxing for large files e.g. a 150 MB video.
For this reason, what is the optimal way of achieving authenticated (large) file uploads? And secondly, how can I report the progress of the upload to cloud storage or the database back to the client?
CodePudding user response:
You will nearly always have the client upload to your server because that is the only way you can control access to the cloud storage, the only way you can do it without exposing your cloud credentials to the client (which would be a giant security hole), and the only way you can control what your clients do and don't upload. Exposing those credentials to the client would allow any client to upload anything they want to your cloud service, which is certainly not what you want. You must be able to control uploads by routing them through your own server.
And you should have your server checking auth on the uploader, the type of data being uploaded, the size of the data being uploaded, and so on.
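As an illustration (not from the original answer), those checks might look roughly like the Express middleware below; requireAuth, the size cap, and the allowed types are all placeholder assumptions you would replace with your own rules.

// A placeholder auth check -- substitute your real session/JWT verification.
function requireAuth(req, res, next) {
  if (!req.headers.authorization) {
    return res.status(401).send('Authentication required');
  }
  // ...verify the token / session here...
  next();
}

// Example limits -- adjust to whatever you actually want to allow.
const MAX_UPLOAD_BYTES = 200 * 1024 * 1024;
const ALLOWED_TYPES = ['video/mp4', 'image/jpeg', 'image/png'];

// Basic header-level checks before the upload handler runs. This assumes the
// file is streamed as the raw request body (as in the answer further down),
// so Content-Type describes the file itself.
function checkUpload(req, res, next) {
  const declaredSize = Number(req.headers['content-length'] || 0);
  if (!declaredSize || declaredSize > MAX_UPLOAD_BYTES) {
    return res.status(413).send('Upload too large');
  }
  if (!ALLOWED_TYPES.includes(req.headers['content-type'])) {
    return res.status(415).send('Unsupported media type');
  }
  next();
}

// Usage: app.post('/data', requireAuth, checkUpload, uploadHandler);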
It is possible to pipe the incoming upload to the cloud storage as each chunk arrives, so that you don't have to buffer the entire file on your server before sending it to the cloud service; Abdennour's answer below shows an example of that with the S3 service. You will, of course, have to be very careful about denial-of-service attacks (like 100 TB uploads) in these scenarios so you don't mistakenly allow massive uploads to your cloud storage beyond what you intend to allow.
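One way to enforce such a cap while still streaming (a minimal sketch, not from the original answer, and the limit shown is an arbitrary example) is to count bytes as they pass through a Transform stream and abort once the cap is exceeded:

import { Transform } from 'stream';

// Counts the bytes flowing through and errors out once a hard cap is
// exceeded, so a client cannot stream an unbounded amount of data through
// your server into cloud storage, regardless of what Content-Length claims.
function byteLimit(maxBytes) {
  let seen = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      seen += chunk.length;
      if (seen > maxBytes) {
        callback(new Error('Upload exceeds the allowed size'));
      } else {
        callback(null, chunk);
      }
    },
  });
}

// Usage: pipe the request through the limiter before handing it to the
// cloud SDK, e.g. Body: req.pipe(byteLimit(200 * 1024 * 1024))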
It seems inefficient to me to, in effect, upload the file twice.
Yes, it is slightly inefficient from a bandwidth point of view. But unless your cloud storage offers some one-time credential you can pass to the client (so that credential can only be used for one upload), and unless the cloud storage also lets you specify ALL the required constraints on the upload (max size, file type, etc.) and enforces them, then the only other place to put that logic is your server, which requires the upload to go through your server. That is the common way this is implemented.
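For what it's worth, S3 is one service that does offer that kind of short-lived, constrained credential, via presigned POST policies. A rough sketch with the v2 aws-sdk follows; the bucket name, key prefix, expiry, and limits are placeholders, not anything from the original answer.

import { S3 } from 'aws-sdk';

const s3 = new S3();

// Generates a short-lived, upload-only credential the authenticated client
// can use to POST the file straight to S3, constrained by size and type.
function createUploadCredential(userId) {
  const params = {
    Bucket: 'your-bucket-name',                        // placeholder
    Fields: { key: `uploads/${userId}/${Date.now()}` },
    Expires: 60,                                       // valid for 60 seconds
    Conditions: [
      ['content-length-range', 0, 150 * 1024 * 1024],  // cap at ~150 MB
      ['starts-with', '$Content-Type', 'video/'],      // only video uploads
    ],
  };

  return new Promise((resolve, reject) => {
    s3.createPresignedPost(params, (err, data) => {
      if (err) reject(err);
      else resolve(data); // { url, fields } to hand back to the client
    });
  });
}

Your server still does the auth check and decides who gets a credential; S3 then enforces the size and type constraints on the direct upload.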
CodePudding user response:
You can pipe the request directly to cloud storage (AWS S3) without the need to cache it on the server.
And this is how it should look:
import express from 'express';
import { S3 } from 'aws-sdk';

const s3 = new S3();
const app = express();

app.post('/data', (req, res) => {
  // Stream the incoming request body straight to S3 without buffering it to disk.
  const params = {
    Body: req,
    Bucket: "yourbucketname here",
    Key: "exampleobject",
  };

  s3.upload(params, (err, data) => {
    if (err) {
      console.error(err);
      res.status(500).send('Upload failed');
    } else {
      console.log('success');
      res.status(200).send('Upload complete');
    }
  });
});

const server = app.listen(8081, () => console.log("App listening"));
The key thing here is that your request should have multipart enabled.
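As for the second part of the question (reporting progress back to the client): with the v2 aws-sdk, calling s3.upload() without a callback returns a ManagedUpload that emits 'httpUploadProgress' events. A minimal sketch, assuming the same s3 client and a params object like the one above:

const managedUpload = s3.upload(params);

// Fires periodically with the number of bytes sent so far; `total` is only
// populated when the size of the incoming stream can be determined.
managedUpload.on('httpUploadProgress', (progress) => {
  console.log(`Uploaded ${progress.loaded} of ${progress.total || '?'} bytes`);
});

managedUpload.send((err, data) => {
  if (err) console.error(err);
  else console.log('Upload complete:', data.Location);
});

To surface those events in the browser you could push them over WebSockets or Server-Sent Events, or expose a polling endpoint; that part is not shown here.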