I have all sorts of metrics I would like to count and later query. For example, I have a Lambda that processes items from a queue, and for each batch I would like to save a count like this:
{
  "processes_count": 6,
  "timestamp": 1695422215,
  "count_by_type": {
    "type_a": 4,
    "type_b": 2
  }
}
I would like to dump these pieces somewhere and later have the ability to query how many were processed within a time range.
So these are the options I considered:
- Write the JSON to the logs, and later have a component (Beats?) that processes these logs and sends them to a time-series DB.
- At the end of each execution, send it directly to a time-series DB (like Elasticsearch).
What is better in terms of cost / scalability? Are there more options I should consider?
CodePudding user response:
I think CloudWatch Embedded Metric Format (EMF) would be good here. There are client libraries for Node.js, Python, Java, and C#.
EMF lets you push metrics out of Lambda into CloudWatch in a managed, asynchronous way: the function only writes a log line, and CloudWatch does the rest. So it's a cost-effective and low-effort way of producing metrics.
The client library writes a particular JSON format to stdout. When CloudWatch sees a log message in this format, it automatically creates the metrics from it.
You can also include key-value pairs in the EMF format which allows you to go back and query the data with these keys in the future.
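To make the format concrete, here is a minimal sketch that builds one EMF record by hand with only the standard library, rather than through a client library. The namespace `QueueProcessor`, the `Service` dimension with value `queue-worker`, the `batch_id` property, and the function names are all assumptions made up for illustration; only the `_aws` metadata layout comes from the EMF specification. In practice the client libraries mentioned above generate this structure for you.

```python
import json
import time


def build_emf_record(counts_by_type, namespace="QueueProcessor", extra=None):
    """Build one EMF log record for a processed batch.

    The namespace, dimension, and property names are illustrative
    assumptions, not names taken from the question or from AWS.
    """
    record = {
        "_aws": {
            # EMF expects milliseconds since the Unix epoch.
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # Each inner list is one set of dimension keys.
                    "Dimensions": [["Service"]],
                    # Metric names refer to top-level keys of this record.
                    "Metrics": [{"Name": "processes_count", "Unit": "Count"}],
                }
            ],
        },
        # Dimension value (must be a string).
        "Service": "queue-worker",
        # Metric value, extracted by CloudWatch into a metric.
        "processes_count": sum(counts_by_type.values()),
        # Plain property: not a metric, but kept in the log event for querying.
        "count_by_type": counts_by_type,
    }
    # Extra key-value pairs are stored alongside as searchable properties.
    record.update(extra or {})
    return record


def handler(event, context):
    # Hypothetical Lambda handler: the counts would come from the real batch.
    counts = {"type_a": 4, "type_b": 2}
    record = build_emf_record(counts, extra={"batch_id": "example-batch"})
    # One JSON document per line on stdout ends up in CloudWatch Logs.
    print(json.dumps(record))
```

Only `processes_count` becomes a CloudWatch metric here. The per-type breakdown and `batch_id` stay in the log event as properties, so they could be queried later from the logs. To get per-type metrics as well, each type would need its own entry in `Metrics`, or its own record with a type dimension.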
High-level clients are available with Lambda Powertools in Python and Java.