Reliable Database Connections and Commands with Polly

Network services can fail or become temporarily unreachable unexpectedly. This is especially true when running code on cloud providers like AWS or Azure and more often than not these things are out of our control. A well-designed system should support some reasonable level of resiliency in communication links between components.

In this post we'll talk about fault-tolerance, but more specifically one of many ways (I'm sure) to support reliable database connections and commands via configurable retry policies using Polly, an awesome library which allows your code to be more resilient to failure via retry, circuit breaker and other fault-handling policies.

Let's look at a very basic Polly retry policy.

Policy retryPolicy = Policy.Handle<SqlException>().WaitAndRetry(  
   retryCount: 3,
   sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(1000));

retryPolicy.Execute(() =>  
{
   // Perform an operation here
})

The example above configures a policy which will execute any given action and attempt to retry it up to 3 times with 1000 milliseconds between retries upon receiving an exception of type SqlException. If all retries fail, the original exception will be re-thrown and bubble up as it normally would.

The neat thing about Polly is that you can intertwine multiple policies together to support just about any scenario you may have. I would really recommend giving Polly wiki a quick read to see all kinds of interesting ways this library can help you.

The Retry Policy

Now that we have the general idea about Polly, let's package up our custom policies so we can consume them somewhere downstream. We'll do this by creating an interface for a retry policy. We'll want our retry policies to be able to execute void methods and methods with a return type. And just to spice things up I'll also throw in support for asynchronous executions.

public interface IRetryPolicy  
{
    void Execute(Action operation);

    TResult Execute<TResult>(Func<TResult> operation);

    Task Execute(Func<Task> operation, CancellationToken cancellationToken);

    Task<TResult> Execute<TResult>(Func<Task<TResult>> operation, CancellationToken cancellationToken);
}

Nice! Now let's implement a policy that we shall use for database-related operations.

public class DatabaseCommunicationRetryPolicy : IRetryPolicy  
{
    private const int RetryCount = 3;
    private const int WaitBetweenRetriesInMilliseconds = 1000;

    private readonly int[] _sqlExceptions = new[] { 53, -2 };

    private readonly Policy _retryPolicyAsync;
    private readonly Policy _retryPolicy;

    public DatabaseCommunicationRetryPolicy()
    {
        _retryPolicyAsync = Policy
            .Handle<SqlException>(exception => _sqlExceptions.Contains(exception.Number))
            .WaitAndRetryAsync(
                retryCount: RetryCount,
                sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(WaitBetweenRetriesInMilliseconds)
            );

        _retryPolicy = Policy
            .Handle<SqlException>(exception => _sqlExceptions.Contains(exception.Number))
            .WaitAndRetry(
                retryCount: RetryCount,
                sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(WaitBetweenRetriesInMilliseconds)
            );
    }

    public void Execute(Action operation)
    {
        _retryPolicy.Execute(operation.Invoke);
    }

    public TResult Execute<TResult>(Func<TResult> operation)
    {
        return _retryPolicy.Execute(() => operation.Invoke());
    }

    public async Task Execute(Func<Task> operation, CancellationToken cancellationToken)
    {
        await _retryPolicyAsync.ExecuteAsync(operation.Invoke);
    }

    public async Task<TResult> Execute<TResult>(Func<Task<TResult>> operation, CancellationToken cancellationToken)
    {
        return await _retryPolicyAsync.ExecuteAsync(operation.Invoke);
    }
}

Here I'm essentially using the same type of policy that we saw in the previous example. The only difference is that we're only going to be retrying when the SQL exception number matches one in the array. This is important because you should not blindly retry every SQL exception, but rather focus on transient exceptions only. The exception numbers above are not by any means an exhaustive list and you should research what types of errors you consider to be transient.

The policy shown here is very trivial and a word of caution is in order.

You must never ever exhaust your database connection pool when performing retries, especially in a production environment.

A good improvement to this policy would be the addition of a circuit breaker policy, which would open for a period of time after all retries fail and allow your database to recover without overwhelming the connection pool. I'll leave this as an exercise for the reader. :)

Reliable Database Connections

With the retry policy ready we need to figure out a way to consume and use it when opening a database connection. You can do this by simply calling Open() on IDbConnection inside of the retry policy any time you need to open a connection. However, this seems very intrusive. I believe a more transparent approach is to create a decorator for the existing database connection class and wrap any critical operations inside the retry policy.

public class ReliableSqlDbConnection : DbConnection  
{
    private readonly SqlConnection _underlyingConnection;
    private readonly IRetryPolicy _retryPolicy;

    private string _connectionString;

    public ReliableSqlDbConnection(
        string connectionString,
        IRetryPolicy retryPolicy)
    {
        _connectionString = connectionString;
        _retryPolicy = retryPolicy;
        _underlyingConnection = new SqlConnection(connectionString);
    }

    public override string ConnectionString
    {
        get
        {
            return _connectionString;
        }

        set
        {
            _connectionString = value;
            _underlyingConnection.ConnectionString = value;
        }
    }

    public override string Database => _underlyingConnection.Database;

    public override string DataSource => _underlyingConnection.DataSource;

    public override string ServerVersion => _underlyingConnection.ServerVersion;

    public override ConnectionState State => _underlyingConnection.State;

    public override void ChangeDatabase(string databaseName)
    {
        _underlyingConnection.ChangeDatabase(databaseName);
    }

    public override void Close()
    {
        _underlyingConnection.Close();
    }

    public override void Open()
    {
        _retryPolicy.Execute(() =>
        {
            if (_underlyingConnection.State != ConnectionState.Open)
            {
                _underlyingConnection.Open();
            }
        });
    }

    protected override DbTransaction BeginDbTransaction(IsolationLevel isolationLevel)
    {
        return _underlyingConnection.BeginTransaction(isolationLevel);
    }

    protected override DbCommand CreateDbCommand()
    {
        return _underlyingConnection.CreateCommand();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            if (_underlyingConnection.State == ConnectionState.Open)
            {
                _underlyingConnection.Close();
            }

            _underlyingConnection.Dispose();
        }

        GC.SuppressFinalize(this);
    }
}

I'm assuming you're using Microsoft SQL Server throughout this post. However, an identical approach shall be taken when a different database engine is used.

Note how we're opening the connection inside of the retry policy in ReliableSqlDbConnection class above. This is probably the only notable thing worth mentioning about this class as the rest of the stuff is fairly self-explanatory (I hope).

So, if you want fault-tolerant connections simply use ReliableSqlDbConnection instead of SqlConnection. Cool, but we're not done yet!

Reliable Database Commands

What about database commands? Don't we want those to retry as well? Yep, and we'll do it very similarly to the way we handled database connections... by creating a decorator for SqlCommand class.

public class ReliableSqlDbCommand : DbCommand  
{
    private SqlCommand _underlyingSqlCommand;
    private readonly IRetryPolicy _retryPolicy;

    public ReliableSqlDbCommand(SqlCommand command, IRetryPolicy retryPolicy)
    {
        _underlyingSqlCommand = command;
        _retryPolicy = retryPolicy;
    }

    public override string CommandText
    {
        get => _underlyingSqlCommand.CommandText;
        set => _underlyingSqlCommand.CommandText = value;
    }

    public override int CommandTimeout
    {
        get => _underlyingSqlCommand.CommandTimeout;
        set => _underlyingSqlCommand.CommandTimeout = value;
    }

    public override CommandType CommandType
    {
        get => _underlyingSqlCommand.CommandType;
        set => _underlyingSqlCommand.CommandType = value;
    }

    public override bool DesignTimeVisible
    {
        get => _underlyingSqlCommand.DesignTimeVisible;
        set => _underlyingSqlCommand.DesignTimeVisible = value;
    }

    public override UpdateRowSource UpdatedRowSource
    {
        get => _underlyingSqlCommand.UpdatedRowSource;
        set => _underlyingSqlCommand.UpdatedRowSource = value;
    }

    protected override DbConnection DbConnection
    {
        get => _underlyingSqlCommand.Connection;
        set => _underlyingSqlCommand.Connection = (SqlConnection)value;
    }

    protected override DbParameterCollection DbParameterCollection => _underlyingSqlCommand.Parameters;

    protected override DbTransaction DbTransaction
    {
        get => _underlyingSqlCommand.Transaction;
        set => _underlyingSqlCommand.Transaction = (SqlTransaction)value;
    }

    public override void Cancel()
    {
        _underlyingSqlCommand.Cancel();
    }

    public override int ExecuteNonQuery()
    {
        return _retryPolicy.Execute(() => _underlyingSqlCommand.ExecuteNonQuery());
    }

    public override object ExecuteScalar()
    {
        return _retryPolicy.Execute(() => _underlyingSqlCommand.ExecuteScalar());
    }

    public override void Prepare()
    {
        _retryPolicy.Execute(() => _underlyingSqlCommand.Prepare());
    }

    protected override DbParameter CreateDbParameter()
    {
        return _underlyingSqlCommand.CreateParameter();
    }

    protected override DbDataReader ExecuteDbDataReader(CommandBehavior behavior)
    {
        return _retryPolicy.Execute(() => _underlyingSqlCommand.ExecuteReader(behavior));
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _underlyingSqlCommand.Dispose();
        }

        GC.SuppressFinalize(this);
    }
}

And for the last tweak, let's come back to ReliableSqlDbConnection class and update CreateDbCommand method to return the new database command wrapper instead of the original SqlCommand.

public class ReliableSqlDbConnection  
{
   ...

        protected override DbCommand CreateDbCommand()
        {
            return new ReliableSqlDbCommand(_underlyingConnection.CreateCommand(), _retryPolicy);
        }

   ...
}

Wiring things up

If you're using a dependency injection (DI) container of some sort, you'll need to register IRetryPolicy interface and the corresponding DatabaseCommunicationRetryPolicy implementation class. Additionally, don't forget to wire-up ReliableSqlDbConnection instead of SqlConnection in your DI container.

Integrating with Dapper.AmbientContext

If you read my previous post about Dapper.AmbientContext and would like to integrate this change, I'm stoked to tell you that it's super-duper easy. Just update your IDbConnectionFactory implementation to return ReliableSqlDbConnection instead of SqlConnection as shown below.

public class SqlServerConnectionFactory : IDbConnectionFactory  
{
    private readonly string _connectionString;
    private IRetryPolicy _retryPolicy;

    public SqlServerConnectionFactory(string connectionString)
    {
        _connectionString = connectionString;
        _retryPolicy = new DatabaseCommunicationRetryPolicy();
    }

    public IDbConnection Create()
    {
        return new ReliableSqlDbConnection(_connectionString, _retryPolicy);
    }
}

Things to consider

  1. I'd like to point out, again, that the retry policy used in this post is not by any means production-ready code. What you'll end up using is what works in your particular scenario. However, you already have all tools at your disposal. Remember to give Polly wiki a quick read!

  2. You may find out via testing (or other means) that it is a good idea to setup different policies for database connections and database commands. If this is the case, you should consider creating a factory for IRetryPolicy which will return an implementation for each use case you may have. Some DI containers will actually offer this feature right out of the box, so I would consider researching that area first.

  3. Don't limit IRetryPolicy to just database connections and commands. This is a great pattern to use in any place where your code is communicating with a remote resource. Remember, network blips are often sporadic and (to a large degree) not preventable and while retrying is not a silver bullet it will certainly help quite a bit!

Happy coding! ;)