Windows Server Failover Clustering on an Encrypted Overlay Network with VNS3

10 Jun 2021

Windows Server Failover Clustering (WSFC) is a powerful technology in the Microsoft arsenal of offerings. Unfortunately, it was designed for data centers under the assumption that it would be running on physical servers with copper wires running between them. Wouldn’t it be nice to be able to bring WSFC capabilities to the public cloud?

From Microsoft’s description, a failover cluster is a group of independent computers that work together to increase the availability and scalability of clustered roles (formerly called clustered applications and services). These clustered servers (called nodes) are connected by physical cables and software. If one or more of the clustered nodes fails, other nodes begin to provide service (a process known as failover). In addition, the clustered roles are proactively monitored to verify that they are working properly. If clustered roles are not working, they are automatically restarted or moved to another node.

Failover clusters also offer Cluster Shared Volume (CSV) functionality, providing a consistent, distributed namespace that clustered roles can use to access shared storage from all clustered nodes. With the Failover Clustering feature, users experience minimal disruption in service.

Implementing WSFC in the cloud is difficult because, while we have plenty of software, we lack physical cables. In this example we will use Cohesive Networks' VNS3 to virtualize the wires necessary to trick WSFC into working.

The configuration documentation for WSFC states that each server needs to run in its own subnet. A /29 works well: you lose the bottom and top addresses to the gateway and broadcast, leaving you with 6 available addresses. As we will see, you will assign the first and second low addresses, leaving an additional 4 addresses to be used for listeners and/or cluster roles.
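
To make this concrete, the first /29 used below (172.16.8.56/29) lays out as follows under this scheme:

172.16.8.56 - bottom address, set as the default gateway
172.16.8.57 - VNS3 client pack address (the primary interface address)
172.16.8.58 - second interface address, added to carry the default gateway
172.16.8.59 through 172.16.8.62 - available for the cluster address, listeners, and roles
172.16.8.63 - top address, broadcast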

Our first step is to get our virtual servers to believe that they are in defined subnets even though they are in fact in a flat overlay network. Underneath they can exist in the same VPC subnet or across various VPC/VNETs. This approach allows you to run a cluster across cloud providers and across regions. You can even add members in a data center, just so long as all Windows servers are connected to the same overlay network.

So let's say we have an overlay network of 172.16.8.0/24 and we have carved out three /29s:

172.16.8.56/29
172.16.8.64/29
172.16.8.72/29

The above should be done when you import your VNS3 license, where you can choose the custom option and define your overlay addresses.

Let’s also assume we have a pair of Active Directory domain controllers assigned 172.16.8.1 and 172.16.8.2.

On each of the clustered Windows servers we assign the following VNS3 client packs:

172.16.8.57
172.16.8.65
172.16.8.73

We then add the following directives to each client pack:

dhcp-option DNS 172.16.8.1
dhcp-option DNS 172.16.8.2
script-security 2
route-up "C:\\routes.bat"

We suggest that you use the Wintun driver, for which you would also add:

windows-driver wintun

Further details can be found here:
https://cohesiveprod.wpenginepowered.com/blog/the-new-openvpn-2_5/

Make sure to install the Cohesive Networks Routing Agent, which can be found here:
https://cn-dnld.s3.amazonaws.com/cohesive-ra-1.1.1_x86_64.exe

In Windows Network Connections, rename your Wintun Userspace Tunnel adapter to ‘Overlay’ so it can be referenced as below.
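
If you prefer to script the rename, the standard NetAdapter cmdlet will do it (assuming the adapter appears under its default Wintun name):

Rename-NetAdapter -Name "Wintun Userspace Tunnel" -NewName "Overlay"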

We will create C:\routes.bat as follows:

@echo off
REM Derive the three addresses from the low address of this server's /29.
set cidr=172.16.8
set /A a=56
set /A b=%a% + 1
set /A c=%b% + 1

REM Re-address the Overlay interface with a /29 mask (255.255.255.248).
netsh interface ipv4 set address "Overlay" static %cidr%.%b% 255.255.255.248
timeout 1
REM Keep the rest of the overlay /24 reachable through the Overlay interface.
netsh interface ipv4 add route %cidr%.0/24 "Overlay"
timeout 1
REM Look up the interface index of the Overlay adapter.
C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe -command "& {Get-NetAdapter -Name Overlay | select -ExpandProperty ifIndex -first 1 }" > tmpFile
set /p INDEX= < tmpFile
del tmpFile
timeout 1
REM Add the second /29 address with a default gateway to satisfy WSFC.
C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe -command New-NetIPAddress -InterfaceIndex %INDEX% -AddressFamily IPv4 -IPAddress %cidr%.%c% -PrefixLength 29 -DefaultGateway %cidr%.%a%

Change the set cidr= value as needed, and change set /A a= to the low address of the /29 for each Windows server. For example, on the second server in this walkthrough you would use set /A a=64, which assigns 172.16.8.65 to the interface and adds 172.16.8.66 with a default gateway of 172.16.8.64.

Now when you bring up OpenVPN, the route-up script will run and apply these addresses and routes.

At this point we’ve accomplished two very important tasks. First, we have changed the subnet mask for our assigned overlay address from a /24 to a /29, which conforms to WSFC’s configuration rule that each clustered server be in its own subnet. Second, because we’re using OpenVPN we have an issue: we do not have a default gateway, and WSFC insists that we have one. So we add the next address in our /29 to the virtual interface and set a default gateway for it, in this case the first address in our /29. This address is really just to trick WSFC; while the address will answer traffic, its purpose is simply to supply an address in our /29 that has a default gateway set.
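
You can sanity-check the result from PowerShell with the standard NetTCPIP cmdlets ("Overlay" being the adapter name chosen earlier):

Get-NetIPAddress -InterfaceAlias "Overlay" -AddressFamily IPv4 | Format-Table IPAddress, PrefixLength
# expect 172.16.8.57 and 172.16.8.58, both with a prefix length of 29
Get-NetRoute -InterfaceAlias "Overlay" -DestinationPrefix "0.0.0.0/0"
# expect a default route with next hop 172.16.8.56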

Our next step is to set up interface routes in VNS3 so that all traffic within the /29 subnets can reach the actual assigned client pack overlay addresses. In the Routes section of VNS3 you will add an interface route for each /29 as follows:

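A sketch of the three interface routes as they will appear, each pointing a /29 at its client pack address:

172.16.8.56/29 via client pack 172.16.8.57
172.16.8.64/29 via client pack 172.16.8.65
172.16.8.72/29 via client pack 172.16.8.73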

You now have a route that directs anything in the /29 address space ‘through’ the assigned client pack. So in the example of 172.16.8.56/29, where 172.16.8.57 is the client pack address and we have added 172.16.8.58 to the virtual Wintun interface, we will tell WSFC to use 172.16.8.59 as our cluster address. Our Windows server will now respond to all three addresses.

Let’s try it out:

You will need to install the Failover Clustering feature, and you will need to have all Windows servers joined to a domain.
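
If you would rather script that prerequisite, the feature installs from an elevated PowerShell prompt on each node:

Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools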

In our example I have named my three servers Cluster-1, Cluster-2, and Cluster-3. From any of these three servers, open a command window and enter PowerShell:

New-Cluster -Name OverlayCluster1 -Node Cluster-1,Cluster-2,Cluster-3 -NoStorage -StaticAddress 172.16.8.59,172.16.8.67,172.16.8.75 -AdministrativeAccessPoint ActiveDirectoryAndDns

You should see some activity metering and then a success message.
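
A few read-only checks with the FailoverClusters cmdlets will confirm the cluster formed as expected (names match this example):

Get-Cluster -Name OverlayCluster1
Get-ClusterNode -Cluster OverlayCluster1
# Cluster-1, Cluster-2 and Cluster-3 should report as Up
Get-ClusterNetwork -Cluster OverlayCluster1
# you should see the overlay /29s alongside the underlay subnets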

Now connect to the cluster in Failover Cluster Manager, entering a period (.) to denote the local server. With clustername.domain selected, under “Cluster Core Resources” expand “Name: Clustername.” Right-click the underlay address, which should be labeled “Failed,” and select “Remove.” Under “Nodes” you should see your three servers. Under “Networks” you will see each of the overlay /29s as well as each of the underlay subnets. For each underlay subnet, under Properties, select “Do not allow cluster network communication on this network.” Your cluster is now ready to add roles. Try it out.
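
Those Failover Cluster Manager steps can also be scripted. A rough PowerShell equivalent, run from one of the nodes ("Cluster Network 1" is a hypothetical name; check Get-ClusterNetwork for your actual underlay networks):

# remove the failed IP Address resource bound to the underlay
Get-ClusterResource | Where-Object { $_.ResourceType -eq "IP Address" -and $_.State -eq "Failed" } | Remove-ClusterResource -Force

# Role 0 corresponds to "Do not allow cluster network communication on this network"
(Get-ClusterNetwork -Name "Cluster Network 1").Role = 0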

When you are done, you can destroy the cluster like this:

Get-Cluster -Name OverlayCluster1 | Remove-Cluster -Force -CleanupAD