When performing a restart or failover of our DataMiner system, one of the NICs becomes unusable.
Even sending out a ping over that NIC stops working.
Disabling and re-enabling the NIC solves the issue (until the next restart).
Does someone know the exact set of actions that are performed on the NIC during startup/failover?
Hi Ive, when a failover agent goes online, the only operation it performs at NIC level is adding the virtual IP, which is done using Windows-level calls. Afterwards, an ARP request is sent to notify the network of the new IP. Once the virtual IP has been acquired, we add the SkipAsSource flag to the primary IP of that NIC. This is done so that the server always uses the virtual IP when communicating with other devices on the network.
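To see the result of these steps on a given server, you can list the addresses on the NIC together with their SkipAsSource state. Below is a minimal sketch in Python, assuming the PowerShell cmdlet Get-NetIPAddress is available on the server. The helper names are hypothetical and not part of DataMiner.

```python
import json
import subprocess

# PowerShell command that lists IPv4 addresses with their SkipAsSource state.
PS_COMMAND = (
    "Get-NetIPAddress -AddressFamily IPv4 | "
    "Select-Object IPAddress, InterfaceAlias, SkipAsSource | "
    "ConvertTo-Json"
)


def parse_addresses(json_text):
    """Turn the ConvertTo-Json output into a list of dicts."""
    data = json.loads(json_text)
    # ConvertTo-Json emits a single object (not a list) when there is one result.
    return [data] if isinstance(data, dict) else data


def source_candidates(addresses, interface_alias):
    """Addresses on the NIC that are not flagged with SkipAsSource."""
    return [
        a["IPAddress"]
        for a in addresses
        if a["InterfaceAlias"] == interface_alias and not a["SkipAsSource"]
    ]


def query_addresses():
    """Run the PowerShell command (Windows only) and parse its output."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_addresses(result.stdout)
```

Going by the description above, calling source_candidates(query_addresses(), "<name of the NIC>") on an online failover agent should leave only the virtual IP, since the primary IP carries the flag.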
The main thing that can break the NIC here is an IP conflict on the network, specifically when we try to claim an IP that has already been assigned. This is easy to check in:
- SLFailover.txt: Added Virtual IP. Context = # / Instance = # (Context will be a negative value, and Instance will be 0)
- Event Viewer: the conflict is logged here, usually with event ID 4199
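The SLFailover.txt check can be automated with a small script. Below is a minimal sketch in Python, assuming the log entry looks exactly like the text quoted above. The function names are hypothetical, and the location of the log file is left to the caller.

```python
import re

# Pattern assumed from the log entry quoted above; the real line may carry
# extra fields such as a timestamp, which the search simply ignores.
ENTRY = re.compile(r"Added Virtual IP\. Context = (-?\d+) / Instance = (-?\d+)")


def find_virtual_ip_entries(lines):
    """Return (context, instance) pairs for every 'Added Virtual IP' entry."""
    entries = []
    for line in lines:
        match = ENTRY.search(line)
        if match:
            entries.append((int(match.group(1)), int(match.group(2))))
    return entries


def matches_described_pattern(context, instance):
    """True for the combination described above: negative Context, Instance 0."""
    return context < 0 and instance == 0
```

To use it, open the log file, pass the file object to find_virtual_ip_entries, and run each pair through matches_described_pattern. Any hits are worth comparing with the Event Viewer entries from around the same time.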
Hi Bert, indeed, thank you for the reminder.
The SkipAsSource flag is indeed set on the primary IP of the NIC. With this flag enabled, the primary IP is skipped when communicating with other devices on the network, so the DMA always responds using the virtual IP.
I have added this to the answer.
Thanks Brent, it seems that we are dealing with a duplicate IP on the network.
Hi Brent, isn’t there also something being done with the SkipAsSource flag on the IP addresses?