I mainly want to know what I can do myself, and what I can provide to Skyline, to help investigate problems or unexpected behavior in a failover setup.
The following files contain information related to the failover status:
-SLFailover.txt: this log file shows what the current DMA believes it should do with regard to failover: come online, take the VIP, etc.
-FailoverStatus.txt: this contains only the current status of the failover, which is one of three values: Offline, Online or Not active
Not really that useful, since the same information is available in SLFailover.txt, but it can be handy for reading out the state in an easy, straightforward way (we might change or remove this file in the future)
-SLNet.txt: likely the most useful log file when issues occur. It contains the checks SLNet performs to determine the status of the other agent. Search for terms such as "Failover", "DRS" (the old naming convention, short for DataMiner Redundancy System) or the IP of the other agent
-DMS.xml: this is the only configuration file involved, so double-check that its settings are OK
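
If SLNet.txt is large, the search described above can be scripted. Below is a minimal Python sketch, not an official Skyline tool: the function name `find_failover_lines` is made up for illustration, the file is assumed to be readable as UTF-8 text, and you have to pass in the log path and the IP of the other agent yourself.

```python
import re


def find_failover_lines(log_path, peer_ip, keywords=("Failover", "DRS")):
    """Return (line_number, text) pairs for lines that mention a keyword or the peer IP.

    Hypothetical helper: keywords are matched case-insensitively as substrings,
    the peer IP only when it is not part of a longer number.
    """
    parts = [re.escape(word) for word in keywords]
    # Guard the IP with digit lookarounds so 10.0.0.2 does not match 10.0.0.25.
    parts.append(r"(?<!\d)" + re.escape(peer_ip) + r"(?!\d)")
    pattern = re.compile("|".join(parts), re.IGNORECASE)

    matches = []
    # Assumption: UTF-8 compatible text; undecodable bytes are replaced, not fatal.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                matches.append((number, line.rstrip("\r\n")))
    return matches
```

For example, `find_failover_lines(r"C:\path\to\SLNet.txt", "10.0.0.2")` (both values are placeholders) returns the matching lines with their line numbers, which makes it easier to line up events with SLFailover.txt.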
Important note: never remove a failover for debugging purposes or to try to fix something. This will very likely break things even further or corrupt the setup. Instead, provide the SLLogCollector packages from both nodes to our tech support team, so that a proper action plan can be put in place to resolve the situation.
I would first check the failover status window via System Center --> Agents --> Failover and then continue the investigation. Usually, the log file SLFailover.txt (in C:\Skyline DataMiner folder) contains all the info related to failovers, such as when a failover occurred. There is another file called FailoverStatus.txt in the same folder that shows the current state of the DMA in the failover setup (whether it is going online, offline, etc.).
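
To read out the state from FailoverStatus.txt in a script, something like the following Python sketch could work. It rests on assumptions: the exact file layout is not documented in this thread, so the code only looks for one of the three status values mentioned above (Offline, Online, Not active) anywhere in the text, and it assumes the file can be read as UTF-8. The name `read_failover_state` is made up for illustration.

```python
from pathlib import Path

# The three status values mentioned in this thread.
KNOWN_STATES = ("Not active", "Offline", "Online")


def read_failover_state(status_path):
    """Return the first known state found in the file, or None if none is found.

    Hypothetical helper: assumes the status appears as plain text in the file.
    """
    text = Path(status_path).read_text(encoding="utf-8", errors="replace").lower()
    for state in KNOWN_STATES:
        if state.lower() in text:
            return state
    return None
```

A return value of `None` means the file did not contain any of the expected values, in which case SLFailover.txt is the place to look.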