A user is facing operational challenges with a large-scale DataMiner deployment running on Windows. Due to strict security policies, Windows patches are applied regularly, resulting in frequent system reboots. In this environment, the average boot time for the DataMiner System (with 20+ DataMiner Agents and a large number of devices and enhanced services) often exceeds 10 minutes. This extended downtime is problematic for NOC operations, especially when tight SLAs are in place or when critical service/maintenance activities are ongoing and DataMiner insights are needed.
Given that the user's current system architecture was designed some time ago, are there any recent DataMiner features, best practices, or architectural recommendations that can help minimize downtime and improve system availability during or after Windows updates in such large-scale deployments?
DataMiner is gradually evolving towards a highly available architecture, with features such as Swarming and Rolling Updates helping to reduce downtime during maintenance activities.
Swarming allows host-specific functionality to be moved from one DataMiner Agent to another within a cluster. A good example is a DataMiner Element. At any given moment, an Element runs on a single Agent. Without Swarming, taking that Agent offline for Windows updates would also make the Element unavailable for that period.
With Swarming, Elements can be moved to other Agents before maintenance starts. This makes it possible to temporarily empty an Agent and perform Windows updates, reboots, or other maintenance tasks without affecting those Elements. Most Element types can already be swarmed today, although we're still extending the feature to support some of the more specialized scenarios, such as certain advanced Element types and services.
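The idea of emptying an Agent before maintenance can be sketched as a small planning step. This is only an illustration of the concept, not DataMiner's Swarming API: the function name `plan_drain`, the Agent and Element names, and the "least-loaded Agent first" rule are all assumptions made for this example.

```python
def plan_drain(placement, agents, agent_to_drain):
    """Return a mapping {element: target_agent} that empties agent_to_drain.

    placement: dict of element name -> name of the agent currently hosting it.
    agents: list of all agent names in the cluster (hypothetical names).
    """
    targets = [a for a in agents if a != agent_to_drain]
    if not targets:
        raise ValueError("no other agent available to host the elements")

    # Count how many elements each remaining agent hosts right now.
    load = {a: 0 for a in targets}
    for host in placement.values():
        if host in load:
            load[host] += 1

    # Assign each element on the drained agent to the least-loaded target.
    moves = {}
    on_drained = sorted(e for e, host in placement.items() if host == agent_to_drain)
    for element in on_drained:
        target = min(targets, key=lambda a: (load[a], a))
        moves[element] = target
        load[target] += 1
    return moves


placement = {
    "enc-1": "DMA-1",
    "enc-2": "DMA-1",
    "mux-1": "DMA-2",
    "mux-2": "DMA-2",
    "mux-3": "DMA-2",
}
moves = plan_drain(placement, ["DMA-1", "DMA-2", "DMA-3"], "DMA-1")
print(moves)
```

Once every Element in the plan has been moved, the drained Agent hosts nothing and can be patched and rebooted. How the moves are actually triggered depends on the DataMiner version and tooling in use.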
We're also working on Rolling Updates, which applies a similar concept to DataMiner software upgrades. The goal is to allow Agents in a cluster to be upgraded one by one while workloads continue running elsewhere in the cluster. This is a more complex challenge, as it requires different DataMiner software versions to coexist temporarily within the same cluster, but it is an important step in our ongoing work towards higher availability.
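To show why version coexistence is the hard part, here is a toy model of a one-by-one upgrade. It is not how Rolling Updates is implemented. The function name `rolling_upgrade`, the Agent names, and the version labels are hypothetical and only illustrate the sequence described above.

```python
def rolling_upgrade(versions, new_version):
    """Upgrade agents one at a time and record the versions present after each step.

    versions: dict of agent name -> currently installed version label.
    Returns a list of (upgraded_agent, sorted list of versions in the cluster).
    """
    current = dict(versions)
    snapshots = []
    for agent in sorted(current):
        # In a real rollout, workloads would first be moved off this agent.
        current[agent] = new_version
        snapshots.append((agent, sorted(set(current.values()))))
    return snapshots


cluster = {"DMA-1": "v1", "DMA-2": "v1", "DMA-3": "v1"}
steps = rolling_upgrade(cluster, "v2")
for agent, present in steps:
    print(agent, present)
```

In this model, every step except the last leaves two versions running side by side. That intermediate state is what the cluster has to tolerate.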
To better understand the requirements, it would be useful to know which DataMiner version is currently being used and whether DataMiner Services are part of the setup. It would also help to understand whether those services are expected to remain available during maintenance activities, or whether a short interruption is acceptable today.