- What actions occur in the background while a primary DMA is going offline and the backup DMA is going online?
- Does the number of elements running or the amount of data stored in the Database affect the failover switching time?
Hi Bruno,
When a Failover switch is triggered, it will go through 4 steps:
- Stop the DataMiner software
- Release the virtual IP (VIP). If the server goes offline abruptly, these first two steps are effectively covered by the outage itself.
- The other DMA can take over the VIP
- The DMA can perform a DataMiner start.
To find out where your system is losing most of its time, you will need to go through the logging. The easiest starting point is the reassignment of the VIPs, which you can check in the SLFailover logging on both the main and the Failover (backup) DMA. This will tell you when these steps occurred and how long they took. If most of the time is lost after the VIPs have been reassigned, look into the SLDataMiner logging file. There you will find how long it took to offload data (if any) to the DB, evaluate all the protocols in your system, and start all the elements and services. An easy way to speed up your startup time is to remove unused protocol versions from your system.
Note: Logging can be found on the server itself in the folder C:\Skyline DataMiner\logging.
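To make it easier to spot where the time goes in a log file, something like the following Python sketch could help. It lists the largest time gaps between consecutive timestamped lines. Note that the timestamp layout used here (`TS_PATTERN`, `TS_FORMAT`) is an assumption for illustration only, and the function name `largest_gaps` is made up; check a few lines of your own SLFailover or SLDataMiner logging and adjust the pattern and format accordingly.

```python
import re
from datetime import datetime

# ASSUMPTION: each relevant log line starts with a timestamp such as
# "2024/01/15 10:32:07.123". The real DataMiner layout may differ,
# so adapt both constants to what you see in your own log files.
TS_PATTERN = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def largest_gaps(lines, top=3):
    """Return the `top` largest gaps between consecutive timestamped lines.

    Each result is a tuple: (gap in seconds, line before gap, line after gap).
    Lines without a recognizable timestamp are ignored.
    """
    stamped = []
    for line in lines:
        match = TS_PATTERN.match(line)
        if match:
            timestamp = datetime.strptime(match.group(1), TS_FORMAT)
            stamped.append((timestamp, line.rstrip("\n")))

    gaps = []
    for (prev_ts, prev_line), (ts, line) in zip(stamped, stamped[1:]):
        gaps.append(((ts - prev_ts).total_seconds(), prev_line, line))

    # Largest gap first, so the slowest step shows up at the top.
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]
```

To use it, open one of the log files from the logging folder (for example with `open(path, encoding="utf-8", errors="replace")`) and pass the file object to `largest_gaps`. The lines on either side of the biggest gaps should point you to the step that takes the longest.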
Hi Craig,
Indeed, the elements will only start after the VIP has been assigned.
1. The online DMA (primary DMA) will stop polling the elements, and the offline DMA (backup DMA) will begin polling them. Also, the virtual IP will be removed from the primary DMA and activated on the backup DMA.
Potentially there are other actions happening in the background, but the ones I mentioned are the most noticeable ones.
2. Yes, the number of elements largely defines how long the switchover will take, since the backup DMA needs to start up these elements on its server.
So, from your steps above, am I correct in saying that the VIP becomes active before the elements are started? That’s fine – just wanted to clarify. Thanks.