How to read and process large text/CSV files from an S3 bucket using C#?

Time:04-18

I am trying to read 15 MB CSV files from an S3 bucket using the following code.

ListObjectsResponse object1 = await S3Client.ListObjectsAsync("mybucket");
foreach (S3Object s3File in object1.S3Objects)
{
    var request = new GetObjectRequest
    {
        BucketName = "mybucket",
        Key = s3File.Key
    };

    using (var res = await S3Client.GetObjectAsync(request))
    using (var sReader = new StreamReader(res.ResponseStream))
    {
        string? line = sReader.ReadLine(); // Times out here
    }
}

The above code works fine with smaller files, but if a file has more than 100K lines, the Lambda function times out in the AWS console. I want to process all the lines from the file in the S3 bucket.

Could you please let me know the best approach to implement this?

CodePudding user response:

Increase your Lambda timeout, which (currently) has a hard limit of 15 minutes.

If your CSV processing takes longer than 15 minutes, Lambda functions are not the right solution for the job - they are meant for short-lived processing.

What the right solution would be is out of scope here, but you could perhaps use spot EC2 instances, Step Functions, containers on Fargate, etc.

Related: to speed up your current process, make the S3 requests in parallel at the beginning and then process the results in one go, i.e. create all the tasks first and then await them all at once.
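A minimal sketch of that parallel approach, assuming the same `mybucket` bucket name and an `IAmazonS3` client called `S3Client` as in the question:

using System.Linq;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Start all GetObject requests up front...
ListObjectsResponse listed = await S3Client.ListObjectsAsync("mybucket");

List<Task<GetObjectResponse>> downloads = listed.S3Objects
    .Select(s3File => S3Client.GetObjectAsync("mybucket", s3File.Key))
    .ToList();

// ...then await them all at once instead of one at a time.
GetObjectResponse[] responses = await Task.WhenAll(downloads);

foreach (GetObjectResponse response in responses)
{
    using (response)
    using (var reader = new StreamReader(response.ResponseStream))
    {
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // Process each CSV line here.
        }
    }
}

Note that each pending GetObjectResponse holds an open connection, so for a very large number of objects you would want to cap the concurrency (e.g. with a SemaphoreSlim) rather than firing everything at once.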
