My team is working on an AWS Lambda function that has a configured timeout of 30 seconds. Given that Lambda functions have this timeout constraint, and that execution environments can be reused for subsequent requests, it seems there will always be a chance that an execution times out before completing all of its steps. Is this a correct assumption? If so, how do we build in resiliency so that database updates can be rolled back if a timeout occurs after records have been updated but before a response has been returned to the function's caller?
To be more specific, my team is managing a JavaScript-based Lambda function (Node.js 16.x) that sits behind API Gateway and implements a REST method to retrieve and update job records. The method retrieves records from DynamoDB that match certain conditions, updates their states, and then returns the updated job records to the caller. Is there a way to detect when a timeout has occurred and to roll back (either manually or automatically) the updated records so that they are in the same state as when the Lambda function began executing?
It is important to consider the consequences of what you are trying to do here. Rather than looking for ways to detect when your Lambda function is about to time out, the best practice is to first monitor a representative sample of invocations and measure how long they take, both on average and in the slowest cases. Perhaps 30 seconds is simply not enough time to complete the transaction that your Lambda function implements.
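CloudWatch already reports a Duration metric for every Lambda function (percentile statistics such as p99 included), and each invocation's REPORT log line contains its duration. If you collect those durations yourself, a small summary like the sketch below shows why the average alone is misleading when choosing a timeout. The function name and the nearest-rank percentile method are illustrative choices, not part of any AWS API.

```javascript
// Sketch: summarize observed invocation durations (in ms) against a configured timeout.
// summarizeDurations is a hypothetical helper; durations might come from REPORT log lines.
function summarizeDurations(durationsMs, timeoutMs) {
  if (durationsMs.length === 0) {
    throw new Error("no durations to summarize");
  }
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, d) => sum + d, 0) / sorted.length;
  // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
  const percentile = (p) =>
    sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
  const p99 = percentile(99);
  return {
    mean,
    p99,
    max: sorted[sorted.length - 1],
    // How much of the timeout is left for the slow tail of invocations.
    headroomMs: timeoutMs - p99,
  };
}
```

For durations of 1, 2, 3, 4 and 25 seconds against a 30-second timeout, the mean is a comfortable 7 seconds, but the slowest invocation leaves only 5 seconds of headroom, and that tail is what decides whether the timeout is adequate.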
Once you have configured a timeout that comfortably covers the execution time of your requests, you can further reduce the need for rollbacks caused by incomplete executions by using DynamoDB transactions. Transactions let you group multiple operations and submit them as a single all-or-nothing operation, which guarantees atomicity.
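As a minimal sketch, assuming the AWS SDK for JavaScript v2 `DocumentClient` that ships with the Node.js 16.x runtime, the state change for several jobs can be sent as one `transactWrite` call. The table name `Jobs`, the key `jobId`, the `state` attribute and the `requestToken` value are hypothetical. The `ConditionExpression` cancels the whole transaction if another invocation has already moved any of the jobs, and `ClientRequestToken` is the TransactWriteItems idempotency token: retrying with the same token and parameters within its 10-minute validity window does not apply the writes twice. Note that a transaction makes the writes atomic, but it cannot undo a committed write if the function times out before responding; an idempotent retry is what makes that case safe. DynamoDB also limits how many items one transaction may contain, so large batches may need to be split.

```javascript
// Sketch: move job records from one state to another in a single DynamoDB
// transaction (every item is updated, or none are).
// Assumptions: SDK v2 DocumentClient; "Jobs", "jobId" and "state" are hypothetical names.
function buildTransitionTransaction(tableName, jobIds, fromState, toState, requestToken) {
  return {
    TransactItems: jobIds.map((jobId) => ({
      Update: {
        TableName: tableName,
        Key: { jobId },
        UpdateExpression: "SET #state = :to",
        // Cancels the whole transaction if any job is no longer in fromState.
        ConditionExpression: "#state = :from",
        ExpressionAttributeNames: { "#state": "state" },
        ExpressionAttributeValues: { ":from": fromState, ":to": toState },
      },
    })),
    // Idempotency token (1 to 36 characters), e.g. derived from the client's request.
    ClientRequestToken: requestToken,
  };
}

async function commitTransition(docClient, params) {
  // docClient is an AWS.DynamoDB.DocumentClient; transactWrite returns an AWS.Request.
  await docClient.transactWrite(params).promise();
}
```

If any condition fails, the SDK rejects with a `TransactionCanceledException` and none of the jobs change state, so there is nothing to roll back.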
Another design aspect is how quickly you can retrieve data from DynamoDB without risking the timeout. Your code currently reads records from DynamoDB and then updates them if certain conditions are met, so the read needs to complete as quickly as possible for the update to start. One way to speed up this read is to enable DAX (DynamoDB Accelerator), an in-memory cache in front of DynamoDB that can serve cached reads with microsecond latency. Keep in mind that DAX only caches eventually consistent reads; strongly consistent reads are passed through to DynamoDB.
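A minimal sketch of wiring DAX in, assuming the SDK v2 `aws-sdk` package and the `amazon-dax-client` package, whose documented pattern is to pass a DAX client as the `service` of a `DocumentClient`. `DAX_ENDPOINT` is a hypothetical environment variable name, and the dependency-injection parameter exists only so the wiring can be exercised without AWS. DAX clusters run inside a VPC, so the Lambda function must be attached to that VPC. Because cached reads may be slightly stale, keeping a condition on the subsequent update (as in a conditional transaction) protects against acting on outdated data.

```javascript
// Sketch: route DocumentClient traffic through DAX when a cluster endpoint is configured,
// otherwise talk to DynamoDB directly. DAX_ENDPOINT is a hypothetical env var name.
function createDocumentClient(
  { daxEndpoint = process.env.DAX_ENDPOINT, region = process.env.AWS_REGION } = {},
  deps = {}
) {
  // Modules are injectable for testing; otherwise they are loaded lazily.
  const AWS = deps.AWS || require("aws-sdk");
  if (!daxEndpoint) {
    // No DAX cluster configured: use DynamoDB directly.
    return new AWS.DynamoDB.DocumentClient({ region });
  }
  const AmazonDaxClient = deps.AmazonDaxClient || require("amazon-dax-client");
  const dax = new AmazonDaxClient({ endpoints: [daxEndpoint], region });
  // Same DocumentClient API, but requests are sent to the DAX cluster.
  return new AWS.DynamoDB.DocumentClient({ service: dax });
}
```

Creating the client once, outside the handler, lets reused execution environments keep their connection to the cluster between invocations.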
Finally, if you want to be extra careful and avoid even starting a DynamoDB transaction when there is not enough time left to finish it, you can use the context object that Lambda passes to your handler to query the remaining execution time. In Node.js, this looks like:
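```javascript
// Minimum time (ms) needed to finish safely, read from an environment variable
const minRemainingMs = Number(process.env.TIMEOUT_PASSED_AS_ENVIRONMENT_VARIABLE);
const remainingTimeInMillis = context.getRemainingTimeInMillis();
if (remainingTimeInMillis < minRemainingMs) {
  // Cancel the execution and clean things up
}
```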
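The same check is most useful when it runs before each unit of work rather than once. The sketch below assumes the jobs have been split into batches that are each committed atomically (for example, one DynamoDB transaction per batch); `processWithinDeadline`, `commitBatch` and the 5-second default margin are hypothetical names and values, while `context.getRemainingTimeInMillis()` is the real Lambda context method.

```javascript
// Sketch: stop starting new transactions once the remaining time drops below a margin.
// processWithinDeadline, commitBatch and the 5000 ms default are hypothetical.
async function processWithinDeadline(context, batches, commitBatch, marginMs = 5000) {
  const committed = [];
  for (const batch of batches) {
    // Check before each transaction so none is started without time to finish.
    if (context.getRemainingTimeInMillis() < marginMs) {
      return { committed, skipped: batches.slice(committed.length), timedOut: true };
    }
    await commitBatch(batch); // each batch succeeds or fails as a whole
    committed.push(batch);
  }
  return { committed, skipped: [], timedOut: false };
}
```

In the handler you would pass the Lambda `context` directly and, when `timedOut` is true, return a response that tells the caller which jobs were not processed, instead of letting the invocation be killed between a write and its response.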
I hope that helps with your question. Let us know if you need any more help on this.