Configure Quorum for SQL Server High Availability

Quorum, quorum, quorum, quorum, quorum. Quorum, quorum, quorum, quorum, quorum, quorum and quorum. Quorum, quorum, quorum! Quorum, quorum, quorum, quorum, quorum, quorum, quorum. Quorum, quorum, quorum, quorum, quorum, quorum; quorum, quorum, quorum.

If you repeat the word quorum enough times, you will eventually learn and understand it.

If only that were true.

Now I don’t pretend to be an expert on the full setup side of Windows Server or Windows Server Failover Clusters. I am a DBA after all. We have separate infrastructure teams to deal with this, right? Even though this is normally the case, I can’t stress enough how important it is to at least understand how quorum works with your Availability Groups and to make sure it is configured correctly.

A classic Failover Cluster Instance would have used shared storage with 2 nodes (or more). The Instance can be active on either node. Quorum is used to decide where the instance should be online and whether it should be online at all.

AlwaysOn Availability Groups still use the failover clustering feature, but each node has its own storage and its own copy of the SQL databases that are within the AG. Quorum now works out where the databases should be writable and whether it is safe to have them online. The below picture shows a 3 node AlwaysOn Availability Group set up.

You can also combine FCIs and Always On availability groups, and they can all be part of the same Windows Server Failover Cluster. The below diagram shows a rather complex set up using shared storage for the Primary and Secondary databases in the main data centre and a 3rd node in a second data centre without shared storage.

Quorum

Working out where it is safe for your databases to be online is done using quorum, which is set up via the Failover Cluster Manager.

Quorum is all about voting. The cluster has to make sure that the databases are online in the right place, as we cannot have two nodes with read/write databases: only one server can be writable at a time. Each server in a cluster has an assigned vote of 1. If you have a 2 node cluster and everything is fine, both servers have 1 vote and they can see each other’s votes, so the cluster can see 2 votes and it knows there is a total of 2 votes. As long as the nodes can see each other, they have a majority (2 out of 2) and your cluster and your AGs stay online. Happy days!

However if, for example, there is a network outage between the two servers in the cluster, each server would count its own vote but not the other’s, because it could not tell whether the other server was still live. Each side would see only 1 vote out of 2, which is not a majority, so the cluster would not stay online. Where’s your High Availability now?
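The majority rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the cluster service is implemented, and the function name has_quorum is made up for this example.

```python
def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """Return True when the visible votes form a strict majority of all votes."""
    return votes_seen > total_votes // 2


# Two-node cluster with no witness: 2 votes in total.
# Healthy cluster: each node sees both votes.
print(has_quorum(votes_seen=2, total_votes=2))

# Network outage between the nodes: each node sees only its own vote.
print(has_quorum(votes_seen=1, total_votes=2))
```

The first call reports quorum and the second does not, which is the two-node problem in a nutshell: neither side of the split can claim a majority.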

To circumvent this problem where the number of nodes is even, we have a witness that also has a vote; we then have an odd number of votes, so there is a better chance of keeping quorum and your cluster staying online. E.g. if a passive node is being rebooted, its vote won’t be counted during this time, but the remaining node and the witness still make 2 out of 3 votes, so the cluster knows that it should keep the other node online. In the days prior to Windows Server 2012, this was normally done using a disk witness or a file-share witness (both were available in Windows Server 2008 and 2008 R2), but this would only be configured when there was an even number of nodes in a cluster.
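To see why the witness helps, here is a rough Python sketch of the vote count for one side of a two-node cluster. The function and parameter names are invented for illustration, and the model simply assumes that whichever side can reach the witness gets to count its vote.

```python
def side_has_quorum(nodes_on_side: int, total_nodes: int, reaches_witness: bool) -> bool:
    """Check whether one side of a cluster holds a strict majority.

    Assumes every node has 1 vote and a single witness adds 1 more vote.
    """
    total_votes = total_nodes + 1  # node votes plus the witness vote
    votes_seen = nodes_on_side + (1 if reaches_witness else 0)
    return votes_seen > total_votes // 2


# Passive node is rebooting: the active node can still reach the witness.
print(side_has_quorum(nodes_on_side=1, total_nodes=2, reaches_witness=True))

# A node that is cut off from both its partner and the witness.
print(side_has_quorum(nodes_on_side=1, total_nodes=2, reaches_witness=False))
```

With 3 votes in total, the node that can reach the witness holds 2 of them and stays online, while an isolated node holds only 1 and does not.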

Here is a good link which explains the different Quorum options for 2008 and 2008 R2

Dynamic Witness and Dynamic Quorum

The version of Windows Server you are using will really matter when it comes to setting up high availability. There have been a number of enhancements over the different versions of Windows Server, namely dynamic quorum, which was added in Windows Server 2012, and dynamic witness, which followed in Windows Server 2012 R2.

The best thing about dynamic quorum is that it is configured by default. Yay! Dynamic Quorum works because when a server goes offline, its vote is also removed, so the majority is calculated from the votes that remain. This way quorum can still be maintained. When the server comes back online, the vote is added back.
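The difference dynamic quorum makes can be illustrated with a simplified Python simulation of nodes failing one at a time. This is a sketch under stated assumptions: the function name is hypothetical, there is no witness, each node has 1 vote, and it does not model any tie-breaking a real cluster performs once only two votes remain.

```python
def survives_sequential_failures(total_nodes: int, failures: int, dynamic: bool) -> bool:
    """Simulate nodes going offline one at a time and report whether quorum holds.

    With dynamic=True the total vote count shrinks after each failure,
    mimicking the vote of an offline node being removed.
    """
    total_votes = total_nodes
    online = total_nodes
    for _ in range(failures):
        online -= 1
        if online <= total_votes // 2:  # no longer a strict majority
            return False
        if dynamic:
            total_votes = online  # the offline node's vote is removed
    return True


# Five-node cluster losing three nodes, one after another.
print(survives_sequential_failures(5, 3, dynamic=False))
print(survives_sequential_failures(5, 3, dynamic=True))
```

In the static case the two survivors hold 2 of 5 votes and lose quorum. In the dynamic case the total has already shrunk to 3 by the time the third node fails, so 2 remaining votes are still a majority.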

The Dynamic Witness feature was introduced with Windows Server 2012 R2. Dynamic Witness automatically switches the witness vote on or off depending on the number of node votes available, so that the total number of votes stays odd.

The dynamic witness vote is only counted when an even number of node votes is detected. E.g. with a 3 node cluster, if one server were to go down, only two node votes would remain, so the witness vote would become active to make the vote count odd and the cluster would still have quorum.
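The odd/even rule for the dynamic witness is easy to express in Python. As before, this is just a sketch of the idea with an invented function name, assuming each online node holds exactly 1 vote.

```python
def witness_vote(online_node_votes: int) -> int:
    """Return the witness vote: 1 when node votes are even, 0 when they are odd."""
    return 1 if online_node_votes % 2 == 0 else 0


# A 3 node cluster, first healthy and then with one node down.
for nodes in (3, 2):
    total = nodes + witness_vote(nodes)
    print(f"{nodes} node votes + {witness_vote(nodes)} witness vote = {total} total votes")
```

In both cases the total comes out at 3, an odd number, which is what lets the cluster work out a clear majority.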

Dynamic Witness and Dynamic Quorum work together to maintain your high availability when any of your voting members become unavailable. As these are configured by default, it means that there should be less overhead supporting the cluster and making sure it has an odd number of votes.

Just because the technology has improved doesn’t mean you need to worry about it less. Make sure your cluster is set up correctly and monitor it so you know when a server or a file share witness has gone offline and, more importantly, document how you have configured it and how to recover from a failure.

This post is meant to help you (and me!) understand Quorum and how important it is in your Always On Availability Group set up.

The following links will help you configure and set this up properly.

Kendra Little – How to Configure Quorum in SQL Server

Microsoft Blogs – Understanding Quorum in a Failover Cluster

Understanding Failover Cluster Quorum

What exactly is a file-share witness and when should I use one?