Hi,
When upgrading the OS on VMs hosting a DataMiner Cluster, one strategy can be to "lift and shift" the platform to a completely new VM cluster.
When doing this, what are the recommended actions to ensure that the MTTR is minimized?
Also, is this strategy recommended? What are the drawbacks of adopting it?
Hi Bruno,
Sharing my recent experience (not a full cluster migration, just a few DMAs migrated to newer VMs), as I'm keen to hear from the forum on this one.
Trimming unnecessary data before the date of the operation is a must.
Not worth keeping 3 years of trend or alarm data if all you need is the last 3 months.
Where possible, I'd advise keeping the new VMs in the same subnet as the legacy ones. Ideally, I'd even find a way to keep the same addresses for the data acquisition interface, or at least the related VIP (this helps in setups where devices send traps to the DMAs).
If that is not an option, testing the reachability from the new environment can be key (our deploy team provided a dedicated staging DMA). In this case, all the devices that use traps may have to be reconfigured.
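As a rough illustration of the reachability test, here is a minimal Python sketch that tries a TCP connection to each device from the new environment. The function name, addresses and ports are placeholders of mine, not anything DataMiner provides. Note that a TCP check only covers part of the picture: SNMP polling and traps typically run over UDP, so those still need a real poll or a test trap to be verified.

```python
import socket


def check_reachability(targets, timeout=2.0):
    """Try a TCP connection to each (host, port) pair.

    Returns a dict mapping (host, port) to True if the connection
    succeeded, False if it was refused, timed out or failed otherwise.
    """
    results = {}
    for host, port in targets:
        try:
            # create_connection resolves the host and connects, raising
            # OSError (including timeouts) on any failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            results[(host, port)] = False
    return results
```

Run from the staging DMA with the list of device management addresses, e.g. `check_reachability([("192.0.2.10", 443)])`, this gives a quick list of devices that need firewall or routing attention before the cut-over.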
If the two network environments are very different (or the gap in the OS versions is big) it could be way easier to get temp licenses for the new environment - this could involve having (at least temporarily) new DMA IDs, so that you can stop the old DMS and activate the new one from the newer VMs. When the operation is completed, the older DMA IDs could be decommissioned.
In this case, the time to switch to the new DMS is just the time needed to load the elements in the new environment. And a rollback, if needed, would simply consist of stopping the new DMS and re-activating the old one.
A DELT export of the elements can help to move them from an old DMS to a new one. Alternatively, you could create the elements from a CSV export and use that to rearrange the capacity allocation (e.g. to redistribute the load across DMAs) - this can work if no history from the previous DB is needed and it's viable to start from scratch with brand new elements.
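To make the CSV idea a bit more concrete, here is a minimal Python sketch that rewrites the hosting DMA of each row so that elements are spread evenly across the new agents. Everything specific here is an assumption: the "DMA ID" column name, the agent IDs, and the idea that element count is a fair proxy for load (in practice some elements are much heavier than others). Check the actual layout of your CSV export before using anything like this.

```python
import csv
import io


def redistribute(rows, dma_ids, dma_column="DMA ID"):
    """Assign each row to the DMA that currently hosts the fewest elements."""
    load = {dma: 0 for dma in dma_ids}
    out = []
    for row in rows:
        # min() returns the first DMA with the lowest count, so ties are
        # broken by the order of dma_ids.
        target = min(dma_ids, key=lambda d: load[d])
        load[target] += 1
        new_row = dict(row)
        new_row[dma_column] = str(target)  # hypothetical column name
        out.append(new_row)
    return out


def rebalance_csv(csv_text, dma_ids, dma_column="DMA ID"):
    """Read CSV text, rebalance the DMA column and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = redistribute(list(reader), dma_ids, dma_column)
    fieldnames = list(reader.fieldnames or [])
    if dma_column not in fieldnames:
        fieldnames.append(dma_column)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

With five elements and two agents, `rebalance_csv(text, [101, 102])` would place three elements on the first agent and two on the second. A weighting per element (e.g. by protocol or parameter count) would be a sensible refinement where the load is uneven.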
Where the DMA has 1+1 redundancy, you could choose to migrate the stby agent, let it sync between main & stby, then fail over to the main. This is not my favourite option, as a few files are not automatically synced across main & backup (e.g. the email config in DataMiner.xml and other bits). Moreover, with this approach you may end up inverting main & stby in the failover GUI (not a major deal, but one of the reasons why I'd prefer to build a new DMS on the new environment and migrate elements, rather than migrating each single server to its new VM).
My considerations can be biased by what would work best in my environment, so I'm curious to hear more from other admins - one thing that can help is to check M&S agreements and the general expectations with the local admins on the customer's side.
E.g. let's say that at the end of the maintenance one server has no stby-agent due to a faulty condition or to an ongoing sync process: would this be a problem in that environment?
If yes, temp licenses can help, so that the actual migration is performed only when the new system meets the "definition of ready" in that specific context.
HTH,
A.
Glad to be of help, Bruno
Thanks Alberto, those are important considerations indeed. I would add licensing as an additional item to take into account, as we would need to regenerate 53 licenses at once.
The reason behind my question is to highlight that the pitfalls of adopting this strategy largely outnumber the potential benefits.
I was hoping to hear some ideas advocating this strategy, to be able to properly advise in this particular context and have a solid comparison playing field.
Again, grateful for your input.