How to Use Polly in .NET
In this article, I describe the toolset Polly offers and some good and bad ways to use it. Hopefully, you will come away with a better understanding of resiliency and fault-tolerant code.
A few words about Polly
Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner.
Let the story begin with CircuitBreaker
We start our journey with a quite useful resiliency pattern known as Circuit Breaker:
Handle faults that might take a variable amount of time to recover from when connecting to a remote service or resource. This can improve the stability and resiliency of an application.
This quote does not make it clear what a circuit breaker actually does, so let me describe how it works with a simple example. Suppose we need to integrate with an API that provides weather information, and this weather API is not under our ownership. Working with third-party APIs forces us to think about what happens when that API goes down. We do not know why it went down or when it will become available again. For our flow, it is better to skip requests to a non-working API than to wait for each one to time out. This is where Circuit Breaker fits perfectly: it breaks the flow and blocks any request to the broken API for some time.
Let’s dive into examples
public static async Task BasicAsync()
In this example, the circuit breaker opens after two exceptions and stays open for 1 second. After that, it lets a call through again, and we get the original error once more. It is a clear illustration of how the breaker works: the other system gets a one-second break to become available again.
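The behavior described above can be sketched roughly as follows. This is a minimal sketch, not the original listing: it assumes Polly v7.x and C# top-level statements, and `CallWeatherApiAsync` is a hypothetical stand-in for the third-party weather API that always fails.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Assumption: Polly v7.x. Open the circuit after 2 consecutive handled
// exceptions and keep it open for 1 second.
var breaker = Policy
    .Handle<InvalidOperationException>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 2,
        durationOfBreak: TimeSpan.FromSeconds(1));

for (var attempt = 1; attempt <= 4; attempt++)
{
    try
    {
        await breaker.ExecuteAsync(() => CallWeatherApiAsync());
    }
    catch (BrokenCircuitException)
    {
        // The circuit is open: the call was rejected without reaching the API.
        Console.WriteLine($"Attempt {attempt}: rejected, circuit is open");
        // Wait out the break so that the next attempt is let through again.
        await Task.Delay(TimeSpan.FromSeconds(1.1));
    }
    catch (InvalidOperationException ex)
    {
        // The call reached the API and failed; the breaker counts this failure.
        Console.WriteLine($"Attempt {attempt}: call failed ({ex.Message})");
    }
}

// Hypothetical stand-in for the weather API: always fails.
static Task CallWeatherApiAsync() =>
    Task.FromException(new InvalidOperationException("weather API is down"));
```

The two `catch` blocks make the difference visible: an `InvalidOperationException` means the remote call was actually attempted, while a `BrokenCircuitException` means the breaker short-circuited it.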
Here is a more complex configuration for the circuit breaker:
public static async Task AdvancedAsync()
We can configure the circuit breaker in a more sophisticated way than a plain error count. Here we specify that it should open if 50% of requests throw an error within two seconds, provided that at least 3 requests were made in that window.
Both examples are valid, and it’s our responsibility to decide which configuration is better suited for us.
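For reference, the percentage-based configuration could look roughly like this. It is a sketch assuming Polly v7.x; the 1-second `durationOfBreak` and the always-failing call are assumptions made for illustration, since the text only states the threshold, the sampling window, and the minimum throughput.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Assumption: Polly v7.x. Open the circuit when at least 50% of calls fail
// within a 2-second window, but only once at least 3 calls were made.
var breaker = Policy
    .Handle<Exception>()
    .AdvancedCircuitBreakerAsync(
        failureThreshold: 0.5,
        samplingDuration: TimeSpan.FromSeconds(2),
        minimumThroughput: 3,
        durationOfBreak: TimeSpan.FromSeconds(1)); // assumed value

for (var call = 1; call <= 3; call++)
{
    try
    {
        // Hypothetical failing operation standing in for a remote call.
        await breaker.ExecuteAsync(() =>
            Task.FromException(new InvalidOperationException("remote call failed")));
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Call {call} failed with {ex.GetType().Name}");
    }

    // CircuitState exposes the current state (Closed, Open, HalfOpen, Isolated).
    Console.WriteLine($"After call {call}: circuit is {breaker.CircuitState}");
}
```

Printing `CircuitState` after each call is a simple way to check when a given configuration actually trips.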
Timeout policy and policy wrap
Another helpful policy provided by Polly is Timeout. It defines how long an operation may run; if the operation exceeds that threshold, the timeout policy abandons the invocation and throws an error.
In the following example, I combine a timeout policy, which throws an error when the timeout is exceeded, with a circuit breaker policy that blocks execution once 100% of requests within the last 3 seconds have failed.
public static async Task TimeoutConsequenceAsync()
We created a timeout policy that allows the code to run for up to 1 second. With the Pessimistic strategy, the timeout policy itself throws an exception when the time is up. If we choose the Optimistic strategy, our code has to honor the CancellationToken provided by the timeout policy.
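To illustrate the Optimistic strategy mentioned above, here is a minimal sketch assuming Polly v7.x. The 5-second `Task.Delay` is a hypothetical stand-in for a slow operation; the important part is that it receives and honors the token handed over by the policy.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

// Assumption: Polly v7.x. Optimistic timeout relies on co-operative
// cancellation: the policy cancels the token it passes to the delegate.
var timeout = Policy.TimeoutAsync(
    TimeSpan.FromSeconds(1),
    TimeoutStrategy.Optimistic);

try
{
    await timeout.ExecuteAsync(
        // Hypothetical slow operation that honors the provided token.
        ct => Task.Delay(TimeSpan.FromSeconds(5), ct),
        CancellationToken.None);
}
catch (TimeoutRejectedException)
{
    // Thrown by the timeout policy once the 1-second limit is exceeded.
    Console.WriteLine("The operation timed out");
}
```

If the delegate ignored `ct`, the Optimistic strategy would have no way to interrupt it, which is the case the Pessimistic strategy is meant for.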
We also combined the two policies into one composite policy using Policy.WrapAsync(policies).
Remember that the policy that should trigger closest to your code must be placed furthest to the right in the wrap invocation. In the example above, the timeout policy wraps the code, and the circuit breaker policy wraps the timeout policy.
If you run the example above, the code is executed twice and fails by the timeout policy each time. The circuit breaker then decides that the threshold has been reached and moves to the open state for 1 second. After that, the timeout policy triggers again, and the circuit opens again.
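A wrap along these lines might look like the sketch below. It assumes Polly v7.x; the `minimumThroughput` of 2, the 1-second break, and the 2-second `Task.Delay` standing in for a slow call are assumptions chosen to match the description above, not the original listing.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

// Inner policy: give up on the call after 1 second.
var timeout = Policy.TimeoutAsync(
    TimeSpan.FromSeconds(1),
    TimeoutStrategy.Pessimistic);

// Outer policy: open when 100% of calls in the last 3 seconds timed out.
var breaker = Policy
    .Handle<TimeoutRejectedException>()
    .AdvancedCircuitBreakerAsync(
        failureThreshold: 1.0,
        samplingDuration: TimeSpan.FromSeconds(3),
        minimumThroughput: 2,                      // assumed value
        durationOfBreak: TimeSpan.FromSeconds(1)); // assumed value

// Leftmost is outermost: the breaker wraps the timeout, which wraps the code.
var resilient = Policy.WrapAsync(breaker, timeout);

for (var attempt = 1; attempt <= 4; attempt++)
{
    try
    {
        // Hypothetical slow call that always exceeds the timeout.
        await resilient.ExecuteAsync(() => Task.Delay(TimeSpan.FromSeconds(2)));
        Console.WriteLine($"Attempt {attempt}: completed");
    }
    catch (TimeoutRejectedException)
    {
        Console.WriteLine($"Attempt {attempt}: timed out");
    }
    catch (BrokenCircuitException)
    {
        Console.WriteLine($"Attempt {attempt}: rejected, circuit is open");
    }
}
```

Swapping the arguments of `Policy.WrapAsync` would put the timeout outside the breaker, so the breaker would no longer see the `TimeoutRejectedException` it is configured to handle.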
Let’s take a look at what happens when we start several tasks simultaneously:
public static async Task TimeoutRandomParallelAsync(){
In this example, we created ten tasks and waited for all of them to complete. Here the circuit breaker never comes into play, and we get ten timeout exceptions: the circuit breaker lets every request through before it learns that another request has thrown an error.
This example demonstrates what happens in the real world when many requests arrive at our application at once: the circuit breaker allows them all to hit the broken part, but once it detects that the threshold has been crossed, the following requests face an open circuit breaker.
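The parallel scenario can be reproduced with a sketch like the one below. It assumes Polly v7.x and reuses the same assumed policy settings as before; `CallAsync` and the 2-second delay are hypothetical. The exact split between outcomes depends on timing, so the code only counts and prints what it observes.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

var timeout = Policy.TimeoutAsync(TimeSpan.FromSeconds(1), TimeoutStrategy.Pessimistic);
var breaker = Policy
    .Handle<TimeoutRejectedException>()
    .AdvancedCircuitBreakerAsync(1.0, TimeSpan.FromSeconds(3), 2, TimeSpan.FromSeconds(1));
var resilient = Policy.WrapAsync(breaker, timeout);

// Hypothetical slow call; returns a label describing how it ended.
async Task<string> CallAsync()
{
    try
    {
        await resilient.ExecuteAsync(() => Task.Delay(TimeSpan.FromSeconds(2)));
        return "completed";
    }
    catch (TimeoutRejectedException) { return "timeout"; }
    catch (BrokenCircuitException) { return "rejected by open circuit"; }
}

// First wave: all ten calls pass the breaker before any of them has failed.
var firstWave = await Task.WhenAll(Enumerable.Range(0, 10).Select(_ => CallAsync()));

// Second wave: started right after the first wave's failures were recorded.
var secondWave = await Task.WhenAll(Enumerable.Range(0, 10).Select(_ => CallAsync()));

foreach (var (name, wave) in new[] { ("First", firstWave), ("Second", secondWave) })
{
    var summary = wave.GroupBy(r => r).Select(g => $"{g.Key} x{g.Count()}");
    Console.WriteLine($"{name} wave: {string.Join(", ", summary)}");
}
```

Comparing the two summaries shows the effect described above: the breaker cannot protect calls that were already in flight, only the ones that arrive after it has opened.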
What to do when the circuit is open?
Polly provides a convenient way to handle errors with Fallback Policy. Let’s enrich the previous example with the fallback Policy:
public static async Task FallbackWithTimeoutAsync()
We created a fallback policy that handles a list of exception types and provides a fallback action for those cases. The fallback policy should be specified as the leftmost policy in the Policy.Wrap call.
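A fallback placed as the leftmost policy could be sketched as follows, assuming Polly v7.x. The breaker settings, the slow call, and the "cached forecast" message are assumptions for illustration only.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

var timeout = Policy.TimeoutAsync(TimeSpan.FromSeconds(1), TimeoutStrategy.Pessimistic);
var breaker = Policy
    .Handle<TimeoutRejectedException>()
    .AdvancedCircuitBreakerAsync(1.0, TimeSpan.FromSeconds(3), 2, TimeSpan.FromSeconds(1));

// Handles both the timeout and the open-circuit rejection.
var fallback = Policy
    .Handle<TimeoutRejectedException>()
    .Or<BrokenCircuitException>()
    .FallbackAsync(ct =>
    {
        // Hypothetical fallback action: serve a cached value instead.
        Console.WriteLine("Fallback: returning cached forecast");
        return Task.CompletedTask;
    });

// Fallback is leftmost (outermost), so it sees errors from both inner policies.
var resilient = Policy.WrapAsync(fallback, breaker, timeout);

for (var attempt = 1; attempt <= 3; attempt++)
{
    // Hypothetical slow call; no try/catch is needed because the fallback
    // handles the exceptions raised by the timeout and the breaker.
    await resilient.ExecuteAsync(() => Task.Delay(TimeSpan.FromSeconds(2)));
    Console.WriteLine($"Attempt {attempt} finished without throwing");
}
```

Any exception type not listed in `Handle`/`Or` still propagates to the caller, so the list should cover exactly the failures the fallback is able to compensate for.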
What about Retries?
As developers, we should be careful with retry policies. A poorly configured retry policy can cause a cost spike and even kill your cluster. Polly has a good list of examples of how to configure retries. I want to highlight that if you need a retry policy, it is better to configure it with exponential delays and a limited number of attempts.
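A retry policy with exponential delays and a limited number of attempts could look like this sketch, assuming Polly v7.x. The retry count, the 200 ms base delay, and the operation that fails twice before succeeding are assumptions for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

// Assumption: Polly v7.x. Retry at most 3 times, waiting 400 ms, 800 ms,
// and 1600 ms (200 ms * 2^attempt) before the respective retries.
var retry = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt =>
            TimeSpan.FromMilliseconds(200 * Math.Pow(2, attempt)),
        onRetry: (exception, delay) =>
            Console.WriteLine($"Retrying in {delay.TotalMilliseconds} ms: {exception.Message}"));

var calls = 0;

await retry.ExecuteAsync(() =>
{
    calls++;
    // Hypothetical transient failure: the first two calls fail.
    if (calls < 3)
    {
        throw new HttpRequestException("transient failure");
    }

    Console.WriteLine($"Succeeded on call {calls}");
    return Task.CompletedTask;
});
```

Because the retry count is bounded, a permanently failing dependency causes at most four calls per request here instead of an unbounded stream of retries.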
Conclusion
The circuit breaker is a must-have pattern if you deal with distributed systems. As a developer, you should handle the cases when some part of the system does not respond, and this is where the fallback policy fits perfectly. Also, try to limit external calls with a timeout, because no one likes to wait.