July 31st, 2008 Posted in Knowledge Base
Allow me to set the stage…
You have three big-bore terminal servers handling about 400 users per day. All three are configured in unicast mode according to Microsoft's best practices, and all three sit on the same switch in your network infrastructure. You decide to move a couple of the nodes to another datacentre (on your giant flat network, in the same subnet), and all of a sudden a portion of your clients get errors and are unable to connect to the cluster name. Connecting to a node's dedicated IP or backend adapter works just fine.
I love the words “Unicast mode works with all routers and switches” because it puts this massive false sense of security in your head.
Allow me to correct that statement: "Unicast DOES NOT mean your NLB cluster will work with your infrastructure." Here is an example from my environment.
So we now have the following configuration:
RDPCLUSTER (192.168.0.201) - Unicast Mode - MAC 02:BF:xx:xx:xx:xx

| NODE NAME | DEDICATED IP ADDRESS | SWITCH | PORT | OUTBOUND MAC |
|---|---|---|---|---|
| NODE1 | 192.168.0.198 | SWITCH1 | 1 | 02:01:xx:xx:xx:xx |
| NODE2 | 192.168.0.199 | SWITCH2 | 1 | 02:02:xx:xx:xx:xx |
| NODE3 | 192.168.0.200 | SWITCH2 | 2 | 02:03:xx:xx:xx:xx |
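The MAC columns appear to follow the usual Windows NLB convention. In unicast mode the cluster MAC is 02-BF followed by the four bytes of the cluster IP. With source-MAC masking on, each node sends from 02, its host priority, and the same four IP bytes. The sketch below only derives those addresses. It assumes NODE1 through NODE3 have host priorities 1 through 3, which is what the 02:01/02:02/02:03 prefixes suggest. The helper names are made up for illustration.

```python
import ipaddress


def _format_mac(first: int, second: int, cluster_ip: str) -> str:
    # Two leading bytes followed by the four bytes of the IPv4 cluster address.
    raw = bytes([first, second]) + ipaddress.IPv4Address(cluster_ip).packed
    return ":".join(f"{b:02X}" for b in raw)


def nlb_unicast_cluster_mac(cluster_ip: str) -> str:
    """Cluster MAC in unicast mode: 02-BF, then the cluster IP bytes."""
    return _format_mac(0x02, 0xBF, cluster_ip)


def nlb_masked_source_mac(cluster_ip: str, host_priority: int) -> str:
    """Source MAC a node puts on outbound frames when source-MAC masking is on:
    02, then the node's host priority, then the cluster IP bytes."""
    if not 1 <= host_priority <= 32:
        raise ValueError("NLB host priorities range from 1 to 32")
    return _format_mac(0x02, host_priority, cluster_ip)


if __name__ == "__main__":
    cluster_ip = "192.168.0.201"  # RDPCLUSTER, from the table
    print("RDPCLUSTER", nlb_unicast_cluster_mac(cluster_ip))
    for node, priority in [("NODE1", 1), ("NODE2", 2), ("NODE3", 3)]:
        print(node, nlb_masked_source_mac(cluster_ip, priority))
```

Only the second byte differs between nodes. That is what lets each node's frames be told apart on the wire without ever exposing the shared cluster MAC as a source address.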
All of the nodes are on the same subnet and VLAN. The clients accessing the nodes come from different VLANs, so their traffic reaches the cluster through a router.
To prevent the switches from learning the MAC of the cluster, the nodes send outbound packets with their own custom MAC address. When you get on the edge switches and look at the MAC address table, the cluster MAC does not exist. This is a good thing. The switch only learns the custom MAC for each node, so it keeps flooding frames addressed to the cluster MAC out every port, and that is how every node gets a copy.
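Here is a toy model of why the masking matters. It assumes plain transparent bridging on SWITCH2: the switch learns only the source MACs it sees. As long as no frame leaves a node with the cluster MAC as its source, frames addressed to the cluster MAC stay "unknown unicast" and get flooded. The `LearningSwitch` class and uplink port 48 are hypothetical. The MAC strings are the masked values from the table.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy layer-2 switch: a MAC address table plus flood-on-unknown forwarding."""
    name: str
    ports: list[int]
    mac_table: dict[str, int] = field(default_factory=dict)

    def receive(self, src_mac: str, in_port: int) -> None:
        # Transparent bridging: remember which port each source MAC arrived on.
        self.mac_table[src_mac] = in_port

    def egress_ports(self, dst_mac: str, in_port: int) -> list[int]:
        # A known destination goes out one port; unknown unicast is flooded.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in self.ports if p != in_port]


CLUSTER_MAC = "02:BF:xx:xx:xx:xx"
UPLINK = 48  # hypothetical uplink port toward the router

if __name__ == "__main__":
    switch2 = LearningSwitch("SWITCH2", ports=[1, 2, UPLINK])
    # NODE2 (port 1) and NODE3 (port 2) transmit with their masked source MACs.
    switch2.receive("02:02:xx:xx:xx:xx", in_port=1)
    switch2.receive("02:03:xx:xx:xx:xx", in_port=2)
    print(CLUSTER_MAC in switch2.mac_table)           # False
    print(switch2.egress_ports(CLUSTER_MAC, UPLINK))  # [1, 2]: both nodes see it

    # Without masking, one frame sourced from the cluster MAC pins it to a port.
    switch2.receive(CLUSTER_MAC, in_port=1)
    print(switch2.egress_ports(CLUSTER_MAC, UPLINK))  # [1]: NODE3 is cut off
```

The last step shows the failure the masking is meant to avoid at layer 2. The router problem that follows is the same kind of pinning, just one layer up.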
However, a client on another VLAN reaches RDPCLUSTER through its gateway, a router, and that router needs the MAC of RDPCLUSTER. The router has no flippin' clue where it is, so it floods out an ARP request. The problem is that every node in the cluster receives it, and all 3 nodes reply. The router then caches the location of RDPCLUSTER. This is a BAD thing. Here is an example:
The ARP request hits all the nodes, and NODE2 fires back with the reply. The router caches it, recording that RDPCLUSTER is on SWITCH2. Now another new client comes in and does the same thing, looking for RDPCLUSTER. This time, NODE1 gets the duty of handling the connection. However, because of the initial cache, the router tosses all the packets for RDPCLUSTER at SWITCH2, not SWITCH1. You can see this by taking a network trace: TCP SYN packets leave the client, but no TCP SYN/ACK comes back from the cluster, because the SYN never reached NODE1. And since this only happens when a connection is load balanced to NODE1, you won't always see the problem; it only pops its head up sporadically.
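To see why the failures look sporadic, here is a rough simulation built on two loud assumptions. First, the router pins RDPCLUSTER to the switch of whichever node's ARP reply it cached. Second, NLB's per-client hash is replaced by a CRC32 stand-in, which is NOT the real NLB algorithm. The 10.20.30.x client addresses and all function names are hypothetical.

```python
import zlib

# Topology from the table: which switch each NLB node is cabled to.
NODE_SWITCH = {"NODE1": "SWITCH1", "NODE2": "SWITCH2", "NODE3": "SWITCH2"}
NODE_ORDER = sorted(NODE_SWITCH)


def owning_node(client_ip: str) -> str:
    """Hypothetical stand-in for NLB's per-client hash (NOT the real algorithm)."""
    return NODE_ORDER[zlib.crc32(client_ip.encode()) % len(NODE_ORDER)]


def syn_is_answered(client_ip: str, reachable_switches: set[str]) -> bool:
    """Every node that receives the SYN runs the same hash, but only the owner
    replies. If the owner's switch never gets the frame, nobody sends a SYN/ACK."""
    return NODE_SWITCH[owning_node(client_ip)] in reachable_switches


def failed_clients(clients: list[str], first_arp_reply_from: str) -> list[str]:
    # Port-caching router: RDPCLUSTER is pinned to the switch of the cached reply.
    pinned = {NODE_SWITCH[first_arp_reply_from]}
    return [ip for ip in clients if not syn_is_answered(ip, pinned)]


if __name__ == "__main__":
    clients = [f"10.20.30.{n}" for n in range(1, 21)]  # hypothetical client VLAN
    failed = failed_clients(clients, first_arp_reply_from="NODE2")
    print(f"{len(failed)}/{len(clients)} clients never get a SYN/ACK:", failed)

    # If frames reached both switches, every owner would see its SYN.
    everywhere = set(NODE_SWITCH.values())
    print("failures when both switches get the frame:",
          sum(not syn_is_answered(ip, everywhere) for ip in clients))
```

Exactly the clients whose hash lands on NODE1 fail, and every other client works. That matches the "only some users, only some of the time" symptom.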
If you are using the "Router on a Stick" topology, this is never a problem, because all of your hosts are theoretically in the same "location", so the router can cache its little heart away.
This also has a lot to do with your actual network equipment and topology. In my case, we use Foundry hardware. Foundry equipment caches the PHYSICAL port number in the router, not the Virtual Interface, which is why people with Cisco equipment won't see this behavior. This basically prevents us from putting NLB servers on geographically separate physical segments, because we use a mesh/web routing design for high availability. Maybe this can help someone else, and maybe not.