AWS Lambda read from SQS without concurrency


My requirement is as follows.

  1. Read from an SQS queue every 2 hours, take all the messages available, and then process them.
  2. Processing includes creating a file with details from the SQS messages and sending it to an SFTP server.

I implemented an AWS Lambda function to achieve point 1. The Lambda has an SQS trigger, with the batch size set to 50 and the batch window set to 2 hours. My assumption was that the Lambda would be triggered every 2 hours, 50 messages would be delivered to the function in one go, and I would create a file for every 50 records.

But I observed that my Lambda function is triggered with a varying number of messages (sometimes 50, sometimes 20, sometimes 5, etc.) even though I have configured the batch size as 50.
After reading some documentation, I learned (though I am not sure) that Lambda spawns 5 long-polling connections to read from SQS, and that this is what causes the function to be triggered with a varying number of messages.

My questions are:

  1. Is my assumption about 5 parallel connections being established correct? If yes, is there a way I can control it? I want this to happen in a single thread/connection.
  2. If 1 is not possible, what other alternatives do I have? I do not want one file created for every few records; I want one file generated every two hours containing all the messages in SQS.

CodePudding user response:

  1. You could try setting the function's reserved concurrency to 1. That may help. See the docs for reference.

  2. A simple solution would be to create a CloudWatch Events trigger (similar to a cron job) that invokes your Lambda function every two hours. In the Lambda function, you call ReceiveMessage on the queue until you have retrieved all messages, process them, and afterwards delete them from the queue (a rough sketch follows below). The drawback is that there may be too many messages to process within the 15-minute Lambda timeout, so that's something you'd have to manage.
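
A minimal sketch of that polling loop, assuming the queue URL comes from a QUEUE_URL environment variable and that create_file/send_to_sftp stand in for your own processing and upload logic:

```python
import os
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # assumed environment variable


def handler(event, context):
    messages = []
    while True:
        # ReceiveMessage returns at most 10 messages per call,
        # so keep polling until the queue reports it is empty.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=2,
        )
        batch = resp.get("Messages", [])
        if not batch:
            break
        messages.extend(batch)

    if not messages:
        return

    # create_file(messages)   # placeholder: build one file from all messages
    # send_to_sftp(...)       # placeholder: upload it to the SFTP server

    # DeleteMessageBatch accepts at most 10 entries per call.
    for i in range(0, len(messages), 10):
        entries = [
            {"Id": str(idx), "ReceiptHandle": m["ReceiptHandle"]}
            for idx, m in enumerate(messages[i:i + 10])
        ]
        sqs.delete_message_batch(QueueUrl=QUEUE_URL, Entries=entries)
```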

CodePudding user response:

A "SQS Trigger" for Lambda is implemented with the so-called Event Source Mapping integration, which polls, batches and deletes messages from the queue on your behalf. It's designed for continuous polling, although you can disable it. You can set a maximum batch size of up to 10,000 records a function receives (BatchSize) and a maximum of 300s long polling time (MaximumBatchingWindowInSeconds). That doesn't meet your once-every-two-hours requirement.

Two alternatives:

  1. Remove the Event Source Mapping. Instead, trigger the Lambda every two hours on a schedule with an EventBridge rule. Your Lambda is responsible for the SQS ReceiveMessage and DeleteMessageBatch operations. This approach ensures your Lambda will be invoked only once per cron event.
  2. Keep the Event Source Mapping. Process messages as they arrive, accumulating the partial results in S3. Once every two hours, run a second, EventBridge-triggered Lambda that bundles the partial results from S3 and sends them to the SFTP server (see the sketch after this list). With this approach you don't control the number of Lambda invocations.
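
A rough sketch of the second alternative, assuming the bucket name comes from a BUCKET environment variable and with the SFTP upload left as a placeholder:

```python
import os
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ["BUCKET"]  # assumed environment variable


# First Lambda (SQS-triggered): persist each delivered batch as a partial file.
def accumulate_handler(event, context):
    body = "\n".join(record["body"] for record in event["Records"])
    s3.put_object(Bucket=BUCKET, Key=f"partials/{uuid.uuid4()}.txt", Body=body)


# Second Lambda (EventBridge schedule, every two hours): bundle the partials
# into one file, hand it to the SFTP step, then clean up.
def bundle_handler(event, context):
    listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="partials/")
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    parts = [s3.get_object(Bucket=BUCKET, Key=k)["Body"].read().decode() for k in keys]
    combined = "\n".join(parts)
    # send_to_sftp(combined)  # placeholder: upload the combined file
    for k in keys:
        s3.delete_object(Bucket=BUCKET, Key=k)
```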

Note on scaling:

With the SQS Event Source Mapping integration you can tweak the batch settings, but ultimately the Lambda service is in charge of Lambda scaling. As the AWS Blog Understanding how AWS Lambda scales with Amazon SQS standard queues says:

Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages.

You could theoretically restrict the number of concurrent Lambda executions with reserved concurrency, but you would risk dropped messages due to throttling errors.
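
For reference, a sketch of setting reserved concurrency via boto3 (the function name is a placeholder):

```python
import boto3

# Caps the function at one concurrent execution; batches that arrive while
# it is busy are throttled and retried, which is where the risk of dropped
# messages comes from.
boto3.client("lambda").put_function_concurrency(
    FunctionName="my-processing-function",  # placeholder name
    ReservedConcurrentExecutions=1,
)
```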
