Hi,
We have two agents, 6001 and 6002, in a Failover pair, with 6001 active. DMA version: 9.6 CU23. The monthly Windows patching was done on 6002. After 6002's server finished rebooting, both 6001 and 6002 had the same Virtual IP. Both agents displayed the message 'connection timeout - the DataMiner agent did not respond in time. The server might be rebooting or the DataMiner software might be restarting'.
Is there a log file or anything else we can check to find out how and why this happened?
PS: 6001's SLDMS logs indicate that it could not connect to 6002 after the reboot button was clicked on 6002. Other than that, I can't find any log that explains this behavior 🙁
Hi Arun
I won't be able to give you a definitive answer here, but perhaps some context will help.
In short, I'd say yes, the Windows updates could probably affect the VIP. Whether that is the case here, I'm not sure.
Now the longer version.
In the past, we had an issue when switching over the Failover pair: the online DMA had sent the action to release the Virtual IP, but Windows never actually released it.
The offline agent was then stuck at "Going Online (Acquiring Virtual IP)" (log file: FailoverStatus.txt), but since the VIP was never released on the online DMA, it could never acquire it.
However, I don't remember whether this was related to any new Windows updates.
In general, when investigating Failover issues, I'd start by digging into these log files: FailoverStatus.txt, SLNet.txt and SLFailover.txt.
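If you want to go through those three files quickly on each agent, a small script can pull out the lines that mention the VIP or the failover state. This is only a minimal sketch: the log folder path is an assumption (point it at wherever your DMA actually writes its logs), the keyword list is a hypothetical starting point, and it does not parse any specific log format, it just does a case-insensitive text search.

```python
from pathlib import Path

# ASSUMPTION: typical DataMiner log folder; adjust to your installation.
LOG_DIR = Path(r"C:\Skyline DataMiner\Logging")

# Log files mentioned above.
LOG_FILES = ["FailoverStatus.txt", "SLNet.txt", "SLFailover.txt"]

# Hypothetical search terms; extend as needed.
KEYWORDS = ["Virtual IP", "Going Online", "heartbeat", "timeout"]


def scan_logs(log_dir, file_names=LOG_FILES, keywords=KEYWORDS):
    """Return (file name, line number, line text) for every line that
    contains one of the keywords (case-insensitive). Missing files are skipped."""
    hits = []
    lowered = [k.lower() for k in keywords]
    for name in file_names:
        path = Path(log_dir) / name
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd encoding.
        with path.open(encoding="utf-8", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                text = line.rstrip("\n")
                if any(k in text.lower() for k in lowered):
                    hits.append((name, number, text))
    return hits


if __name__ == "__main__":
    for name, number, text in scan_logs(LOG_DIR):
        print(f"{name}:{number}: {text}")
```

Running it on both agents and comparing the timestamps around the reboot should show whether one of them got stuck acquiring or releasing the VIP.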
In addition, since both agents came online, you might want to double-check the heartbeat configuration. Remember that the inverted heartbeats should preferably point towards an external device, e.g. a router (see DataMiner Help).
Hi Robin, thank you for the information. According to the FailoverStatus log file, the main agent is online and the backup agent is 'Going Online'. This is probably why both agents took the VIP.
But I am puzzled as to why there was a connection timeout on both DMAs. During that time, we could not ping one DMA from the other. After we stopped DataMiner on both agents and started them one at a time, the pings worked.