I have an Azure Function triggered by EventGrid when a blob is created.
Depending on the size of the blob (a PDF file), my Azure Function can take anywhere between 2 seconds and 600 seconds (10 minutes) to execute.
As per the Azure documentation, EventGrid retries delivery of the event if it does not receive a response from the endpoint (in this case, my Azure Function) within 30 seconds. Retries follow this schedule:
- 10 seconds
- 30 seconds
- 1 minute
- 5 minutes
- 10 minutes
- 30 minutes
- 1 hour
- 3 hours
- 6 hours
- Every 12 hours up to 24 hours
I don't see any issues with the smaller files I upload to storage: my Azure Function executes, EventGrid presumably receives the response in under 30 seconds, and so my function is executed only once.
Issue: For larger files, my Azure Function is triggered by EventGrid (as expected) and execution starts. However, because of the large file size, the function runs for well over 30 seconds. Since EventGrid has not received a success response from the endpoint (the function is still executing), it sends the event again and my function starts another instance for the same file. This way the function executes several times for the same file.
How can I handle this situation? Can I change the retry mechanism for EventGrid only for this function, or is there a better way to handle this problem?
Any help would be greatly appreciated.
CodePudding user response:
Azure expects a timely response (< 30 s) from Azure Function or webhook event handlers, and there seems to be no setting to increase this time limit. On receiving an event, instead of doing the actual long-running work, you should push a message to an Azure queue and let your function pick up messages from that queue. This lets you enqueue the work and quickly return a response to Azure Event Grid within 30 seconds. It also scales your event handling: even if many blobs are uploaded in a burst, your application can handle it.
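To make the enqueue-and-return idea concrete, here is a minimal in-process sketch. It is not Azure code: Python's `queue.Queue` stands in for an Azure queue, a thread stands in for the queue-triggered function, and the names `handle_event`, `worker`, and `run_demo` as well as the sample URL are made up for illustration.

```python
import queue
import threading
import time


def handle_event(event: dict, work_queue: queue.Queue) -> int:
    """Event handler: enqueue the work and acknowledge right away."""
    work_queue.put({"id": event["id"], "url": event["data"]["url"]})
    return 200  # success status, returned long before the slow work finishes


def worker(work_queue: queue.Queue, processed: list) -> None:
    """Queue consumer: pulls messages and does the slow work at its own pace."""
    while True:
        message = work_queue.get()
        try:
            if message is None:  # sentinel that tells the worker to stop
                return
            time.sleep(0.2)  # stands in for long-running PDF processing
            processed.append(message["url"])
        finally:
            work_queue.task_done()


def run_demo():
    """Send one hypothetical event and report how quickly it was acknowledged."""
    work_queue = queue.Queue()  # assumption: in-process stand-in for an Azure queue
    processed = []
    consumer = threading.Thread(target=worker, args=(work_queue, processed))
    consumer.start()

    event = {"id": "evt-1", "data": {"url": "https://example.invalid/docs/large.pdf"}}
    started = time.monotonic()
    status = handle_event(event, work_queue)
    ack_seconds = time.monotonic() - started

    work_queue.join()     # wait until the background job has finished
    work_queue.put(None)  # stop the worker
    consumer.join()
    return status, ack_seconds, processed
```

Calling `run_demo()` returns the status code, the time the handler took to acknowledge, and the list of processed URLs. The point is that the acknowledgement time no longer depends on how long the processing takes.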
CodePudding user response:
Please check whether the following helps:
The retry policy of the function app is independent of any retries or resiliency provided by the trigger. The function retry policy only layers on top of a trigger's resilient retry.
You can read about retry behavior here.
According to the documentation, the subscriber (such as your EventGridTrigger function) must respond to Azure Event Grid (AEG) within 30 seconds, otherwise the message will be queued for retry.
The event is removed from the retry queue when the AEG receives a successful response from the delivery destination endpoint (the subscriber) within 3 minutes. When the dead-lettering option is enabled, the event is also removed from the retry queue if the response failure code is 400 or 413.
Based on the above, and because your subscriber runs for longer than 30 seconds, the AEG delivered a duplicate event within those 3 minutes.
For your solution, I recommend using a Push-Pull pattern, such as delivering the event to a storage queue.
In a loosely decoupled Pub/Sub architecture, the Push-Pull delivery pattern lets the AEG deliver (store) events in real time to resources such as storage queues, event hubs, or Service Bus, from which consumers then pull them as needed. Note that, according to the delivery and retry policy, the AEG expects an ACK/NACK response from the delivery process. Another thing to keep in mind is that in the Push-Push pattern (such as EventGridTrigger), the response time can be affected by whether or not the function has to go through a "cold start."
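As a rough sketch of the pull side, the consumer of the queue could turn each stored event back into the blob details it needs. This assumes the message body has already been decoded to JSON text and uses the Event Grid event schema for `Microsoft.Storage.BlobCreated`. The `BlobCreated` class, the `parse_blob_created` helper, and all sample values are made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobCreated:
    """The few fields a PDF-processing consumer is likely to need."""
    event_id: str
    url: str
    content_length: int


def parse_blob_created(message_body: str) -> BlobCreated:
    """Parse one queued event (JSON text) into a BlobCreated record."""
    event = json.loads(message_body)
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        raise ValueError(f"unexpected event type: {event.get('eventType')!r}")
    data = event["data"]
    return BlobCreated(
        event_id=event["id"],
        url=data["url"],
        content_length=int(data["contentLength"]),
    )


# Hypothetical sample message; every value here is invented.
SAMPLE_MESSAGE = json.dumps({
    "id": "evt-1",
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/docs/blobs/large.pdf",
    "data": {
        "contentType": "application/pdf",
        "contentLength": 524288,
        "url": "https://example.invalid/docs/large.pdf",
    },
})
```

`parse_blob_created(SAMPLE_MESSAGE)` yields a record whose `content_length` the consumer could use, for example, to route large files to a slower processing path.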
To see the original event message, including whether the response was a failure or a success, please refer to this workaround.