Polly - How to achieve a circuit breaker that opens the circuit on WaitAndRetry failure and puts bac

I used WaitAndRetryForeverAsync in the past, which was wrong, because I believe the Retry pattern is supposed to handle only transient faults such as rate limiting (429 status code). When the API I was subscribed to went offline for service maintenance, which took about 25 minutes, WaitAndRetryForeverAsync kept retrying at a constant interval (not exponential, which doesn't really matter in this case). That triggered some firewall rules on the API side, and my IP was blocked for a while.

I'm trying to do what Nick Chapsas describes in his Circuit Breaker video: if the call still fails after 5 retries, we assume that the service is under maintenance. The retries are then re-enabled after 30 minutes, and this repeats until it reconnects, even if that takes hours (depending on how long the service maintenance lasts).

The question is: how do I apply that circuit breaker policy after WaitAndRetry has failed?

/// <summary>
///     This class provides Transient Fault Handling extension methods.
/// </summary>
internal static class Retry
{
    public static void Do(Action action, TimeSpan retryInterval, int retryCount = 3)
    {
        _ = Do<object?>(() =>
        {
            action();
            return null;
        }, retryInterval, retryCount);
    }

    public static async Task DoAsync(Func<Task> action, TimeSpan retryInterval, int retryCount = 3)
    {
        _ = await DoAsync<object?>(async () =>
        {
            await action();
            return null;
        }, retryInterval, retryCount);
    }

    public static T Do<T>(Func<T> action, TimeSpan retryWait, int retryCount = 3)
    {
        var policyResult = Policy
            .Handle<Exception>()
            .WaitAndRetry(retryCount, retryAttempt => retryWait)
            .ExecuteAndCapture(action);

        if (policyResult.Outcome == OutcomeType.Failure)
        {
            throw policyResult.FinalException;
        }

        return policyResult.Result;
    }

    public static async Task<T> DoAsync<T>(Func<Task<T>> action, TimeSpan retryWait, int retryCount = 3)
    {
        var policyResult = await Policy
            .Handle<Exception>()
            .WaitAndRetryAsync(retryCount, retryAttempt => retryWait)
            .ExecuteAndCaptureAsync(action);

        if (policyResult.Outcome == OutcomeType.Failure)
        {
            throw policyResult.FinalException;
        }

        return policyResult.Result;
    }
}

Answer:

Simplest solution

If you want to have the following delay sequences:

15sec, 15sec, 15sec, 15sec, 15sec, 30min
15sec, 15sec, 15sec, 15sec, 15sec, 30min
etc.

Then you can achieve this by introducing the following helper method:

static IEnumerable<TimeSpan> GetSleepDuration()
{
    while(true)
    {
        for (int i = 0; i < 5; i++)
        {
            yield return TimeSpan.FromSeconds(15);
        }
        yield return TimeSpan.FromMinutes(30);
    }
}

The usage is pretty simple:

var sleepDurationProvider = GetSleepDuration().GetEnumerator();
        
var retry = Policy
   ...
   .WaitAndRetryForever(_ => 
   { 
      sleepDurationProvider.MoveNext(); 
      return sleepDurationProvider.Current; 
   });
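To check that the generator really produces the sequence listed above, it can be enumerated without Polly at all. The following is a minimal sketch using top-level statements; the variable name firstTwoCycles is made up for this illustration, and the generator is repeated here only so that the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same generator as in the answer: five short waits, then one long wait, repeated forever.
static IEnumerable<TimeSpan> GetSleepDuration()
{
    while (true)
    {
        for (int i = 0; i < 5; i++)
        {
            yield return TimeSpan.FromSeconds(15);
        }
        yield return TimeSpan.FromMinutes(30);
    }
}

// Take the first 12 delays (two full cycles) and convert them to seconds.
// Take() is required because the sequence itself never ends.
var firstTwoCycles = GetSleepDuration()
    .Take(12)
    .Select(delay => delay.TotalSeconds)
    .ToList();

Console.WriteLine(string.Join(", ", firstTwoCycles));
// prints: 15, 15, 15, 15, 15, 1800, 15, 15, 15, 15, 15, 1800
```

Because the generator is infinite, the retry policy can keep calling MoveNext() for as long as it needs to; the enumerator simply continues with the next cycle.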

I have shown a somewhat similar solution under this SO question.

Sophisticated solution

By default the policies are unaware of each other, even if they are chained together via PolicyWrap. One way to exchange information between policies is to use Polly's Context.

First let's define the Circuit Breaker:

const string SleepDurationKey = "Broken";
IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return Policy<HttpResponseMessage>
        .HandleResult(res => res.StatusCode == HttpStatusCode.TooManyRequests)
        .CircuitBreakerAsync(6, TimeSpan.FromMinutes(30),
           onBreak: (dr, ts, ctx) => ctx[SleepDurationKey] = ts,
           onReset: (ctx) => ctx.Remove(SleepDurationKey));
}

- It breaks after 6 consecutive failed attempts and remains broken for 30 minutes.
- When the CB transitions to Open, the Context contains the sleep duration.
- When the CB transitions to Closed, the Context no longer contains the sleep duration.

Please be aware that with this setup the delay sequence will look like this:

15sec, 15sec, 15sec, 15sec, 15sec, 30min, 30min, 30min, etc.

And finally let's define the retry policy:

IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return Policy<HttpResponseMessage>
        .HandleResult(res => res.StatusCode == HttpStatusCode.TooManyRequests)
        .Or<BrokenCircuitException>()
        .WaitAndRetryForeverAsync((_, ctx) =>
            ctx.ContainsKey(SleepDurationKey) ? (TimeSpan)ctx[SleepDurationKey] : TimeSpan.FromSeconds(15));
}

I have shown a somewhat similar solution under this SO question.
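The two policies above only cooperate once they are combined, with the retry as the outer policy and the circuit breaker as the inner one, so that both see the same Context during an execution. Below is a minimal end-to-end sketch, assuming the Polly v7 NuGet package. FakeApiCall, retryDelay and breakDuration are hypothetical names introduced for this illustration, and the durations are shortened to milliseconds so that the demo finishes quickly (the answer uses 15 seconds and 30 minutes).

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

const string SleepDurationKey = "Broken";

// Hypothetical, shortened durations for the demo only.
var retryDelay = TimeSpan.FromMilliseconds(20);
var breakDuration = TimeSpan.FromMilliseconds(200);

// Circuit breaker: publishes the break duration through the Context while it is broken.
var circuitBreaker = Policy<HttpResponseMessage>
    .HandleResult(res => res.StatusCode == HttpStatusCode.TooManyRequests)
    .CircuitBreakerAsync(6, breakDuration,
        onBreak: (dr, ts, ctx) => ctx[SleepDurationKey] = ts,
        onReset: ctx => ctx.Remove(SleepDurationKey));

// Retry: waits for the break duration if the breaker published one, otherwise for the short delay.
var retry = Policy<HttpResponseMessage>
    .HandleResult(res => res.StatusCode == HttpStatusCode.TooManyRequests)
    .Or<BrokenCircuitException>()
    .WaitAndRetryForeverAsync((_, ctx) =>
        ctx.ContainsKey(SleepDurationKey) ? (TimeSpan)ctx[SleepDurationKey] : retryDelay);

// Retry is the outer policy, the circuit breaker the inner one.
var combined = Policy.WrapAsync(retry, circuitBreaker);

// Stand-in for the real HTTP call: 429 for the first 8 calls, then 200.
var calls = 0;
Task<HttpResponseMessage> FakeApiCall()
{
    calls++;
    var status = calls <= 8 ? HttpStatusCode.TooManyRequests : HttpStatusCode.OK;
    return Task.FromResult(new HttpResponseMessage(status));
}

var response = await combined.ExecuteAsync(() => FakeApiCall());
Console.WriteLine($"{response.StatusCode} after {calls} calls");
```

With this stand-in, the first six calls fail while the circuit is Closed, the sixth opens it, and the next two failures happen as HalfOpen probes that break the circuit again. The ninth call succeeds, which closes the circuit and removes the key from the Context.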

UPDATE #1

If you want the same delay sequence as in the simplest solution section, but implemented with a Circuit Breaker, then you need a workaround.

The workaround is needed because the Circuit Breaker has a HalfOpen state as well, not just Closed and Open. The above delay sequence would work out of the box if only the Closed and Open states existed. But after the break duration the Circuit Breaker transitions into HalfOpen (to allow a probe) rather than to Closed.

At first glance the Advanced Circuit Breaker could provide this "auto-reset feature" because of its extra samplingDuration parameter. Unfortunately, the ACB also has a HalfOpen state.

The workaround is to force the Circuit Breaker back to Closed by explicitly calling its Reset method.
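The state transitions described above can be observed directly through the policy's CircuitState property. This is a small sketch, assuming the Polly v7 NuGet package. The breaker here opens after a single 429 and uses a hypothetical 200 ms break duration purely to keep the demo short, and the variable names are made up for this illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Break after a single handled result; the short break duration is for the demo only.
var breaker = Policy<HttpResponseMessage>
    .HandleResult(res => res.StatusCode == HttpStatusCode.TooManyRequests)
    .CircuitBreakerAsync(1, TimeSpan.FromMilliseconds(200));

// One 429 response is enough to open this circuit.
await breaker.ExecuteAsync(() =>
    Task.FromResult(new HttpResponseMessage(HttpStatusCode.TooManyRequests)));
var afterFailure = breaker.CircuitState;        // Open

// Once the break duration has elapsed, the breaker reports HalfOpen, not Closed.
await Task.Delay(TimeSpan.FromMilliseconds(400));
var afterBreakDuration = breaker.CircuitState;  // HalfOpen

// An explicit Reset forces the breaker back to Closed without a probe.
breaker.Reset();
var afterReset = breaker.CircuitState;          // Closed

Console.WriteLine($"{afterFailure} -> {afterBreakDuration} -> {afterReset}");
```

This is why the workaround schedules the Reset call slightly before the break duration ends: the breaker is then already Closed when the retry policy wakes up.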

So, the solution is the following:

var circuitBreaker = Policy<HttpResponseMessage>
    .HandleResult(res => res.StatusCode == HttpStatusCode.TooManyRequests)
    .CircuitBreakerAsync(6, TimeSpan.FromSeconds(10),
        onBreak: (dr, ts, ctx) => ctx[SleepDurationKey] = ts,
        onReset: (ctx) => {  });

I have removed the context-clearing logic from onReset to avoid a race condition.

var retry = Policy<HttpResponseMessage>
    .HandleResult(res => res.StatusCode == HttpStatusCode.TooManyRequests)
    .Or<BrokenCircuitException>()
    .WaitAndRetryForeverAsync((_, ctx) =>
    {
        if (ctx.ContainsKey(SleepDurationKey))
        {
            var sleepDuration = (TimeSpan)ctx[SleepDurationKey];    
            var resetSignal = new CancellationTokenSource(sleepDuration.Add(TimeSpan.FromSeconds(-1)));  
            resetSignal.Token.Register(() => { ctx.Remove(SleepDurationKey); circuitBreaker.Reset(); });

            return sleepDuration;
        }
        return TimeSpan.FromSeconds(15);
    });

- If the key is present, I retrieve the sleep duration from the context.
- I create a timer that fires one second before the CB would automatically transition to HalfOpen.
- When it fires, it clears the context and calls Reset on the circuit breaker.

With this "trick" we can skip the HalfOpen state. Please note that even though the retry has an explicit reference to the circuitBreaker, you still have to use PolicyWrap to make it work.

var combinedPolicy = Policy.WrapAsync(retry, circuitBreaker);
var result = await combinedPolicy.ExecuteAsync(...);