Home > database >  Processing specific AWS SQS messages in specific threads
Processing specific AWS SQS messages in specific threads

Time:03-18

The Java application posts async jobs to AWS and gets back a JobID. When the async job is finished, a message will appear in an SQS queue with that JobID. Each JobID is handled by a different thread. Each of those threads also polls SQS for messages until it finds the message which contains its JobID. Additionally, the application is distributed into multiple services so there can't be a single SQS processor.

I saw that SQS returns a maximum of 10 messages and after they are returned, a visibility timeout is applied so that they are not re-sent to other consumers. However, my consumers are the threads that want to consume only a single message and let the rest be consumed by other threads. Should I set the visibility timeout to 0? Will this make it so all consumers get the same set of 10 messages on every request? What's the best way for each consumer to sift through all the messages and find the one it wants?

TL;DR: SQS has 100 messages and there are 100 consumers, one for each message. How should I go about having each consumer find the message it wants (based on a JobID).

EDIT: I know that this is not an appropriate usage of SQS and I'd be very glad to not use it at all but our main integration is with Amazon Textract for which it is mandatory to use SQS for its asynchronous operations. Each Textract request is processed by a different thread which means that they each need to get back a specific SQS message, consumers are not universal. Not to mention the possibility of a clustered environment for which I'd like to avoid having to do any synchronization...

CodePudding user response:

This is not an appropriate architecture for using Amazon SQS. Your processes should not be trying to find a specific message from an Amazon SQS queue.

You should find a different architecture for this message-passing task. Some ideas:

  • Create a message in Amazon S3 with an 'expected' Key. Have the each thread look for that object as a return message. (This is effectively using Amazon S3 as a Key-Value Store.)
  • Have a single Lambda function retrieve messages from SQS and update a database (or S3 as above). Then, have the threads consult the database instead of SQS.

CodePudding user response:

I think you need to put something in between SQS and your threads. Like a DynamoDB table. You could have a Lambda function that processes all the SQS messages and just translates them into DynamoDB records. Then your different threads could easily check for the specific records they are interested in using a DynamoDB query.

Just because Textract mandates that you use SQS doesn't mean the final step in your architecture has to read those messages directly from SQS. In this case SQS is just a message bus that can integrate with other services in AWS, and those services are your building blocks you can use to create the architecture you need.

  • Related