NetScaler ADC – HA Failover issues with Cisco ACI

In a current customer project, my job is to redesign and migrate an existing Citrix ADC environment. The responsible network administrator told me that they have had issues with the HA failover functionality for a long time. The symptoms look like this:

  • HA Failover is initiated on NS#1
  • NS#2 becomes the active node
  • 3-15 minutes later the NetScaler Gateway VIPs and Load Balancing VIPs become unreachable
  • A reboot of NS#2 brings back NS#1 as the primary node and the VIPs are accessible again

To find the root cause of this issue, we need to understand what is going on when an HA failover happens. As soon as NS#2 takes over, two GARP (Gratuitous ARP) packets are sent out on all connected Ethernet interfaces. This is necessary because the connected network devices (switches, routers) need to update their ARP tables with the “new” MAC addresses of NS#2. Without a successful GARP, the ARP tables are not updated and still point to the old MAC addresses of NS#1.
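To make the mechanism more concrete, here is a minimal Python sketch of what a gratuitous ARP frame looks like on the wire. It only builds the bytes and does not send anything. The MAC and IP address are made-up example values, and the sketch uses an ARP request (opcode 1); whether a given NetScaler build announces itself with requests or replies is not covered here, so treat that detail as an assumption.

```python
import socket
import struct


def build_garp_frame(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request announcing that `ip` lives at `mac`."""
    src_mac = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + src_mac + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src_mac + ip_bytes       # sender MAC and sender IP
    arp += b"\x00" * 6 + ip_bytes   # target MAC unset, target IP == sender IP

    return ethernet + arp


# Hypothetical values standing in for a VIP that moved to NS#2
frame = build_garp_frame("02:00:00:aa:bb:02", "192.0.2.10")
print(len(frame), frame.hex())
```

What makes the frame "gratuitous" is that sender IP and target IP are identical: nobody asked for the address, the node simply announces it. If the network fabric does not act on this announcement, it keeps forwarding to the old location.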

So what do you need to take care of when deploying a NetScaler on a Cisco ACI infrastructure? There are two parameters that need to be configured to solve the issue.

1.) GARP-based EP Move Detection Mode

Enable the “GARP based detection” feature under “L3 Configuration” on the related bridge domain.

Cisco KB – GARP-based EP Move Detection Mode

2.) Endpoint Dataplane Learning

There is a relatively new Citrix KB article that recommends disabling the Endpoint Dataplane Learning feature when using Cisco ACI. The setting can be found under “General” on the bridge domain. If you leave it enabled, network traffic could still be delivered to the secondary node (the former primary) after a failover.

https://support.citrix.com/article/CTX238900

Cisco KB – Endpoint Dataplane Learning
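If you prefer to script the two bridge domain settings instead of clicking through the GUI, the following Python sketch builds the JSON body that could be sent through the APIC REST API. It is a sketch under assumptions: the attribute names `epMoveDetectMode` and `ipLearning` on the `fvBD` object reflect my understanding of the APIC object model and should be verified against your APIC version (for example with the API Inspector), and the tenant and bridge domain names are placeholders. The HTTP call itself is intentionally left out.

```python
import json


def bridge_domain_payload(tenant: str, bd_name: str) -> dict:
    """Build an fvBD payload that applies both settings to one bridge domain."""
    return {
        "fvBD": {
            "attributes": {
                # Distinguished name of the bridge domain (placeholder names)
                "dn": f"uni/tn-{tenant}/BD-{bd_name}",
                # 1.) GARP-based EP Move Detection Mode (assumed attribute name)
                "epMoveDetectMode": "garp",
                # 2.) Endpoint Dataplane Learning disabled (assumed attribute name)
                "ipLearning": "no",
            }
        }
    }


payload = bridge_domain_payload("example-tenant", "example-bd")
print(json.dumps(payload, indent=2))
```

Keeping both settings in one payload makes it easy to apply the same change to every bridge domain that hosts NetScaler interfaces.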

After applying these two changes, the NetScaler failover works flawlessly, and the Load Balancing and NetScaler Gateway vServers remain accessible if the primary node fails.

Happy Failover 🙂
