I have a data science Lambda function that takes a user id and a list of operations to perform on that user's data.
Example path:
calculate?userId=1&operations=func1,func2,func3,func4,func5
In the Lambda function I call all of the specified functions in a for loop, and the functions are not short-running. Every one of them queries the database, and some of the queries overlap, so I have implemented sharing of query results between functions.
I suspect that calling each function in the for loop is not a good thing because, for example, while func1 is running, func2 is waiting, and so on. Should I:
- Run all of the functions in parallel with asyncio, so that they do not wait for each other to finish?
- Convert this function to a state machine and multiple Lambda functions (one for each function specified in the query params) and implement the necessary state transitions, etc.?
CodePudding user response:
If the functions depend on each other and you're not hitting the Lambda timeout limit, I would leave it as it is.
If they can run independently, I would split:
- run all shared queries with asyncio
- run all functions with asyncio (assuming they still use IO to write the results somewhere)
Unfortunately, without a more specific example it is difficult to give more details.
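As a rough illustration of the two-step split above, here is a minimal sketch. Everything in it is hypothetical: the query names, the OPERATION_QUERIES mapping, and the run_query and run_operation placeholders stand in for your real database calls and operations, and the handler assumes the query parameters arrive in the API Gateway proxy event format.

```python
import asyncio

# Hypothetical mapping: which shared queries each operation needs.
OPERATION_QUERIES = {
    "func1": ["orders", "profile"],
    "func2": ["orders"],
    "func3": ["profile", "events"],
}


async def run_query(name, user_id):
    """Placeholder for a non-blocking database call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"{name}-rows-for-user-{user_id}"


async def run_operation(op, query_results):
    """Placeholder for one operation; it only reads the queries it needs."""
    await asyncio.sleep(0.01)  # stands in for IO such as writing results
    return {q: query_results[q] for q in OPERATION_QUERIES[op]}


async def calculate(user_id, operations):
    # Step 1: run each distinct query once, all of them concurrently.
    needed = sorted({q for op in operations for q in OPERATION_QUERIES[op]})
    rows = await asyncio.gather(*(run_query(q, user_id) for q in needed))
    query_results = dict(zip(needed, rows))

    # Step 2: run the operations concurrently on the shared results.
    results = await asyncio.gather(
        *(run_operation(op, query_results) for op in operations)
    )
    return dict(zip(operations, results))


def handler(event, context):
    # Assumes an API Gateway proxy event carrying the query string parameters.
    params = event["queryStringParameters"]
    operations = params["operations"].split(",")
    return asyncio.run(calculate(int(params["userId"]), operations))
```

Keep in mind that asyncio only overlaps time spent waiting on IO. If the database driver is blocking, or the functions are mostly CPU-bound, running them this way will not make them faster.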
CodePudding user response:
I'd personally go for Step Functions with a state machine, but that would also increase the cost in your case. The additional cost, though, comes with the added benefits of the Step Functions service.