Azure Load Balancer (ILB) with HealthShare
InterSystems Developer
Posted on August 29, 2023
Overview
We often run into connectivity problems with HealthShare (HS) deployments in Microsoft Azure that have multiple HealthShare components (instances or namespaces) installed on the same VM, especially when those components need to communicate with other HS components through the Azure Internal Load Balancer (ILB) that provides mirror VIP functionality. Details on how and why a load balancer is used with database mirroring can be found in this community article.
As per the Azure Load Balancer documentation (https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview#limitations), the default behavior of the Azure Internal Load Balancer is as follows:
...if an outbound flow from a VM in the backend pool attempts a flow to the frontend of the internal Load Balancer in which pool it resides and is mapped back to itself, both legs of the flow don't match and the flow will fail.
So, in a HealthShare deployment, connections between multiple instances or namespaces on the same VM in the ILB backend pool that must communicate through the ILB frontend will fail. For example, consider a scenario with the HealthShare Unified Care Record (UCR) Registry primary mirror instance and the HealthShare Patient Index primary mirror instance both on the same Azure VM.
In the above example, the UCR Registry initiates a connection to 10.0.1.100 so it can communicate with the Patient Index instance. There is a 50% chance this connection will fail, depending on whether the primary mirror members for Patient Index and UCR Registry happen to be on the same host (10.0.1.10 in this case).
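A quick way to reproduce the symptom is to test a TCP connection to the ILB frontend from the VM that hosts both primary mirror members. The port used here (1972, a common superserver port) is an assumption for illustration, and the output is abbreviated:
// run from the VM at 10.0.1.10, targeting the ILB frontend
c:\pstools>powershell Test-NetConnection 10.0.1.100 -Port 1972
WARNING: TCP connect to (10.0.1.100 : 1972) failed
...
TcpTestSucceeded : False
The same test run from a VM outside the backend pool succeeds, which is what makes this failure mode look intermittent.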
This connection fails because of the default NAT behavior of the Azure ILB: it does not perform inbound Source-NAT (SNAT), so the original source IP is preserved. Details are available in the same Microsoft documentation link above:
When the flow maps back to itself the outbound flow appears to originate from the VM to the frontend and the corresponding inbound flow appears to originate from the VM to itself...
Specifically, the default behavior of the Azure ILB is as follows:
- The Azure ILB does not perform inbound Source-NAT (SNAT), so the original source IP is preserved
- With the default load-balancing rule configuration, i.e. DSR (aka "Floating IP") disabled, it does perform Destination-NAT (DNAT)
This results in the following, again from the original documentation link above:
From the guest OS's point of view, the inbound and outbound parts of the same flow don't match inside the virtual machine. The TCP stack will not recognize these halves of the same flow as being part of the same flow as the source and destination don't match.
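To make the mismatch concrete, here is what the two legs of a single connection attempt look like from the guest OS's point of view, using the addresses from the example above (VM 10.0.1.10, ILB frontend 10.0.1.100):
// outbound leg, as sent by the VM
src 10.0.1.10 -> dst 10.0.1.100
// inbound leg, after the ILB DNATs the packet back to the same VM
src 10.0.1.10 -> dst 10.0.1.10
The originating socket expects a reply from 10.0.1.100, so the packet arriving from 10.0.1.10 is never matched to the outbound connection and the TCP handshake cannot complete.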
Workaround
There are several options available to work around this Azure ILB behavior; however, this article will focus on just one approach.
Adding a Second NIC
Only two steps are required, as follows:
- Add a second NIC to the VM with a different IP address (in the example below, a second NIC with an address ending in .11 was added)
- Configure a local (OS-level) static route forcing traffic destined for the ILB VIP (10.0.1.100) out of the secondary NIC
This allows the communication to succeed, because the backend-to-frontend flow now has a different source IP (10.0.1.11) and destination IP (10.0.1.100 > DNAT > 10.0.1.10).
// show the multiple NICs
c:\pstools>ipconfig | findstr /i "ipv4"
   IPv4 Address. . . . . . . . . . . : 10.0.1.10
   IPv4 Address. . . . . . . . . . . : 10.0.1.11
// static route: send traffic destined for the ILB frontend out of the second NIC
// ('if 18' specifies the interface index to use for the outbound traffic)
c:\pstools>route add 10.0.1.100 mask 255.255.255.255 10.0.1.1 if 18
 OK!
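To verify, you can print the newly added host route and retest a connection through the frontend (output abbreviated; port 1972 is the same hypothetical port used above):
// confirm the host route to the ILB frontend
c:\pstools>route print 10.0.1.100
...
// retest the connection through the frontend
c:\pstools>powershell Test-NetConnection 10.0.1.100 -Port 1972
...
TcpTestSucceeded : True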
**Note** - the exact syntax of your "route add" command will vary depending on your network and subnet topology. This example is provided for illustration purposes only. Refer to the documentation for the Windows route command and for configuring static routes in Red Hat Enterprise Linux.
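For Linux deployments, here is a minimal sketch of the equivalent static route, assuming the second NIC is eth1 with address 10.0.1.11 (interface names and addresses will vary with your topology):
# send traffic destined for the ILB frontend (10.0.1.100) out of the second NIC
sudo ip route add 10.0.1.100/32 dev eth1 src 10.0.1.11
# on RHEL, a route like this can be persisted in /etc/sysconfig/network-scripts/route-eth1
Like the Windows example, this only affects traffic to the ILB frontend; all other traffic continues to use the primary NIC and default route.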