In cloud-based applications we often make network calls, whether to look up a queue message, query a database, or read from table storage. Every such call, including every retry, adds to the cloud bill. The cost of a single call may be trivial, but these applications are more prone to transient and latency-related faults, so it is common to wrap network operations in retry logic rather than let the application fail after one attempt.

In this post I highlight several retry mechanisms and conclude with the one that best fits cloud-based applications. To explain them in detail, I have created a sample project on GitHub.

The project is a simple console application that implements several retry strategies. It takes the URL of a website as input and makes a GET request using the HttpClient class. If it gets a response, it displays the result on the console; if it does not, it runs through the retry strategies so that we can see the difference between them.

Tight Loop Retries

In this mechanism, retries are attempted back to back, without any delay, for a limited number of times. It is not recommended for cloud-based applications because it leaves no time for the remote service to recover, or for any ongoing changes to take effect, before the next attempt.
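As a rough illustration of the idea, here is a minimal Python sketch of a tight loop retry. It is not the code from the sample project (which is a .NET console app); the function name `tight_loop_retry` and its parameters are assumptions made for this example, and `max_attempts` counts the first call as well as the retries.

```python
def tight_loop_retry(operation, max_attempts=10):
    """Call `operation` until it succeeds, retrying immediately on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception as error:  # real code should catch only transient errors
            last_error = error  # no delay: the next attempt starts right away
    # Every attempt failed, so surface the last error to the caller.
    raise last_error
```

Because nothing separates the attempts, all of them can be spent within a few milliseconds, which is usually shorter than the fault they are meant to ride out.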

Constant Time Interval Retries

In this mechanism, we wait for a constant period of time before retrying the network call.

In the sample project, the network operation is passed as the third argument to the ConstantTimeInterval method, which retries it at a constant interval of 2 seconds, up to a maximum of 10 retries.
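A minimal Python sketch of constant-interval retries might look like the following. It is an illustration only, not the project's ConstantTimeInterval method: the name `constant_interval_retry`, the argument order, and the injectable `sleep` parameter (handy for testing without real waiting) are all assumptions, and `max_attempts` here includes the first call.

```python
import time


def constant_interval_retry(interval_seconds, max_attempts, operation,
                            sleep=time.sleep):
    """Retry `operation`, waiting the same fixed interval after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # real code should catch only transient errors
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            sleep(interval_seconds)  # identical wait before every retry
```

With the values described above, a call would look like `constant_interval_retry(2, 10, fetch)`, where `fetch` is a hypothetical function that performs the GET request.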

The downside of this approach is more subtle. Multiple processes may run the same code at the same time with the same interval, so their retries arrive together and create a bottleneck in the system. This can result in higher cost and in issues that are hard to catch at first glance.

Random Interval Retries

In this mechanism we randomize the time interval to reduce the chance of such a bottleneck.

In the sample project, the network operation is passed as the fourth argument to the RandomInterval method, which retries it at a random interval of between 2 and 5 seconds, up to a maximum of 10 retries.
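The same idea can be sketched in Python as below. Again, this is not the project's RandomInterval method; `random_interval_retry`, its argument order, and the injectable `sleep` parameter are assumptions for illustration, and `max_attempts` includes the first call.

```python
import random
import time


def random_interval_retry(min_seconds, max_seconds, max_attempts, operation,
                          sleep=time.sleep):
    """Retry `operation`, waiting a random interval after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # real code should catch only transient errors
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            # A fresh random wait per retry keeps separate processes
            # from retrying in lockstep.
            sleep(random.uniform(min_seconds, max_seconds))
```

Because each process draws its own delay, two processes that fail at the same moment are unlikely to retry at the same moment.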

Even so, it turns out that there is a better approach to handling this situation: “Exponential Backoff Retries”.

Exponential Backoff Retries

These retries are also known as progressive backoff retries: the network call is retried after a progressively longer time interval.

In the sample project, the network operation is passed as the fourth argument to the ExponentialBackOff method, which retries it at an interval that doubles after each failed attempt, with a maximum interval of 10 seconds and a limit of 10 attempts. The first retry is made after an interval of 1 second. If that fails, the next retries are made after 2, 4, and then 8 seconds. From then on the interval is capped at 10 seconds, and retries continue at that steady interval after each unsuccessful attempt until the limit of 10 attempts is reached.
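The schedule described above (1, 2, 4, 8, then a steady 10 seconds) can be sketched in Python as follows. This is an illustrative sketch, not the project's ExponentialBackOff method: the name `exponential_backoff_retry`, the argument order, and the injectable `sleep` parameter are assumptions. It also assumes that the limit of 10 counts the first call, so at most 9 waits occur.

```python
import time


def exponential_backoff_retry(initial_seconds, max_seconds, max_attempts,
                              operation, sleep=time.sleep):
    """Retry `operation`, doubling the wait after each failure up to a cap."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # real code should catch only transient errors
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            sleep(min(delay, max_seconds))
            # Double the wait for the next retry, never exceeding the cap.
            delay = min(delay * 2, max_seconds)
```

Calling `exponential_backoff_retry(1, 10, 10, fetch)` with a hypothetical `fetch` that always fails would wait 1, 2, 4, 8 and then 10 seconds five times before giving up.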

Exponential backoff is the recommended approach for retrying network operations in cloud applications. In fact, the same approach is used in network protocols such as CSMA/CD, where a station waits for a backoff period before retransmitting a frame after a collision.

Conclusion

In a nutshell, it is important to think about the retry mechanism when writing any network-related logic in cloud-based applications. Exponential backoff and policy-based retry mechanisms, where the interval is configurable and variable, are the preferred approaches for such applications. Many cloud libraries implement such retry patterns; the Transient Fault Handling Application Block is one example.

 

 

 
