I'm planning the upgrade of our DMS, which currently has 6 pairs of Failover DMAs.
To reduce downtime, I'm planning to use the "Upgrade backup Agent first, switch over and upgrade main" method.
With this option, what is the expected behavior when there are multiple DMAs? Does a DMA fail over as soon as the upgrade has been completed on that one DMA, or do all DMAs wait until every one of them has been upgraded and then fail over at the same time?
Is there any additional option to upgrade the backup DMA and then require some kind of manual intervention to confirm that we are okay to proceed with the failover, rather than automatically failing over once the backup DMA has been upgraded?
Hi Dave,
The "Upgrade backup Agent first, switch over and upgrade main" option applies to each individual Failover pair. There is no synchronization with other Failover pairs in the cluster. As soon as a backup agent has completed its upgrade, it will start switching over and start the upgrade on the main agent, regardless of other backup agents still working on their upgrade.
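As a rough illustration of that per-pair behavior, here is a small Python sketch. It is not a DataMiner API: the pair names, the durations, and the `simulate` helper are all made up for the example. It only shows that each pair's switchover time depends on its own backup upgrade, not on the other pairs.

```python
from dataclasses import dataclass


@dataclass
class PairTimeline:
    name: str
    switchover_at: int  # minutes after the upgrade was started
    main_done_at: int   # minutes after the upgrade was started


def simulate(pairs, switch_minutes=1):
    """Compute a timeline per Failover pair, each pair independent of the others.

    `pairs` is a list of (name, backup_upgrade_minutes, main_upgrade_minutes).
    All durations are hypothetical values chosen for illustration.
    """
    timelines = []
    for name, backup_minutes, main_minutes in pairs:
        # The pair switches over as soon as its own backup agent is done.
        switchover_at = backup_minutes
        # The main agent upgrade starts right after the switchover.
        main_done_at = backup_minutes + switch_minutes + main_minutes
        timelines.append(PairTimeline(name, switchover_at, main_done_at))
    return timelines


# Hypothetical durations in minutes: (name, backup upgrade, main upgrade)
example_pairs = [("pair-1", 12, 15), ("pair-2", 20, 14), ("pair-3", 9, 18)]

for t in simulate(example_pairs):
    print(f"{t.name}: switchover at +{t.switchover_at} min, "
          f"main upgraded at +{t.main_done_at} min")
```

With these made-up numbers, the three pairs switch over at three different moments, so the switchovers are spread across the upgrade window rather than happening together.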
To my knowledge, there is no option to require manual intervention between the completion of the backup agent upgrade and the start of the main agent upgrade.
Do note that these "Upgrade backup Agent first" options are discouraged for major upgrades (see "Upgrading a DataMiner Agent in System Center").
I believe the main reasons are:
– Especially for major upgrades that change inter-DMA communication, the main and backup agents will (at least temporarily) be running different versions, which can lead to undefined behavior. In the worst case, the switchover after the backup agent upgrade might not work because of the version differences, halting the rest of the upgrade and requiring manual intervention.
– In general, the mechanics of orchestrating the sequence (upgrade backup first, switch over, upgrade main, and optionally switch back) leave more room for error.
Thanks for your answer, Wouter.
It would be good to know the context for discouraging “Upgrade backup agent first”, as it is going to be quite hard to convince our internal and external stakeholders that the expected downtime of our DMS for an upgrade would be around 20-30 minutes rather than the normal failover time.