Whenever a queue in Amazon SQS “overflows”, I don’t see any way to debug the issue: AWS does not seem to provide any monitoring for SQS that lets you segment in-flight messages by their payloads, so I have no idea which types of messages are causing the overflow.
How can I segment messages in Amazon SQS by name/payload/other attributes on a graph, so I can figure out which types of messages are causing the queue to overflow?
Note: I cannot use a queue-per-message-type approach since I use Elastic Beanstalk and it doesn’t seem to allow having multiple queues per worker (and having a worker per queue is very costly since I usually have lots of message types).
I also log every message type I receive to CloudWatch, but cannot figure out a way to set up a filter/metric that lets me monitor message types without manually creating a custom filter for each of them.
UPD:
I ended up using CloudWatch Logs Insights. It is not ideal, since I found no way to group by task name and plot time-series data at the same time, but I can at least plot bar graphs of tasks grouped by name and then use a filter to narrow the time frame. Here are some docs on the query format that lets you select virtual fields: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html
Here's an example of a query that I use to view tasks received by a worker running on the django-eb-sqs-worker library:
fields @message
| parse @message "[*] Received Task(*,*" as hash, taskName, restOfMessage
| display @timestamp, taskName, restOfMessage
| filter ispresent(taskName)
| stats count(*) by taskName
Here's a query for filtering finished tasks:
fields @message
| parse @message "* Finished Task(*,*. Result: *. Execution time: *s.*" as hash, taskName, restOfMessage1, result, executionTime, additional
| display @timestamp, taskName, result, executionTime, additional
| filter ispresent(taskName)
| stats count(*) by taskName
CodePudding user response:
The way I'd debug such issues is by searching through the logs with CloudWatch Logs Insights. You can write a query that counts the messages grouped by type and get a histogram over time.
Another way to continuously monitor different message types is to log in the CloudWatch embedded metric format (EMF). You write one log entry per message with the message type as a dimension, which lets you view the counts in CloudWatch Metrics and also set alarms on them. However, since you mentioned it's mainly for debugging, I'd recommend CloudWatch Logs Insights.
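As a rough sketch of the "count grouped by type over time" idea, the snippet below builds a Logs Insights query that counts received tasks per task name per time bin and runs it through boto3. It reuses the "[*] Received Task(*,*" parse pattern from the question; the log group name, the bin size, and the helper names build_histogram_query and run_query are illustrative assumptions, and the query result is returned as table rows rather than a chart.

```python
import time


def build_histogram_query(bin_size="5m"):
    """Build a Logs Insights query: received tasks per task name per time bin."""
    return "\n".join([
        "fields @message",
        # Parse pattern taken from the question's django-eb-sqs-worker logs.
        '| parse @message "[*] Received Task(*,*" as hash, taskName, restOfMessage',
        "| filter ispresent(taskName)",
        f"| stats count(*) as received by taskName, bin({bin_size})",
    ])


def run_query(log_group, hours=3, poll_seconds=1.0):
    """Run the query over the last `hours` hours and return rows as dicts.

    `log_group` is a hypothetical name; use your worker's actual log group.
    """
    import boto3  # imported lazily so the query builder works without AWS deps

    logs = boto3.client("logs")
    end = int(time.time())
    start = end - hours * 3600
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=build_histogram_query(),
    )["queryId"]

    # Poll until the query leaves the Scheduled/Running states.
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)

    # Each row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row}
            for row in response["results"]]
```

Calling run_query("/my/worker/log-group") (a made-up name) should give one row per task name and time bin, which you can then sort or chart however you like.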
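For the EMF route, a minimal sketch of what one log entry per message could look like is shown below. The namespace "MyApp/SQSWorker", the metric name "MessagesReceived", the dimension name "MessageType" and the helper names are assumptions for illustration, and it assumes the worker's stdout (or log file) is already shipped to CloudWatch Logs.

```python
import json
import time


def emf_record(message_type, namespace="MyApp/SQSWorker", now_ms=None):
    """Build one EMF log entry counting a single message of the given type."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # EMF timestamps are epoch milliseconds
    return {
        "_aws": {
            "Timestamp": now_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,           # hypothetical namespace
                "Dimensions": [["MessageType"]],  # one dimension set
                "Metrics": [{"Name": "MessagesReceived", "Unit": "Count"}],
            }],
        },
        # Top-level keys referenced by the metadata above:
        "MessageType": message_type,
        "MessagesReceived": 1,
    }


def log_message_received(message_type):
    """Emit the EMF entry as a single JSON line on stdout."""
    print(json.dumps(emf_record(message_type)), flush=True)
```

You would call log_message_received(task_name) wherever the worker picks up a message. Keep in mind that every distinct message type becomes its own custom metric, so this fits best when the number of types is bounded.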