A user is facing operational challenges with a large-scale DataMiner deployment running on Windows. Because of strict security policies, Windows patches are applied regularly, which results in frequent system reboots. In this environment, the DataMiner System (with 20+ DataMiner Agents and a large number of devices and enhanced services) often takes more than 10 minutes to boot. This extended downtime is a problem for NOC operations, especially when tight SLAs are in place or when critical service or maintenance activities are ongoing and DataMiner insights are needed.
Given that the user's current system architecture was designed some time ago, are there any recent DataMiner features, best practices, or architectural recommendations that can help minimize downtime and improve system availability during or after Windows updates in such large-scale deployments?
If I understand correctly, the issue here is tied more to the OS layer than to DataMiner itself. Any application running on top of the OS would be affected in the same way if the environment relies on single nodes: those nodes are single points of failure (SPOFs), so frequent OS patching alone causes frequent application outages.
In my personal view, whenever uptime is a must-have, these nodes should have some form of architectural resilience.
DataMiner supports failover licenses for on-premises setups, as well as more advanced capabilities when running in the cloud (such as swarming, which Pieter already mentioned). I'd recommend reviewing the hosting layer to see which of the capabilities DataMiner already provides could be embedded, and assessing whether such a transformation is viable.
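To illustrate why redundancy helps with OS patching specifically, here is a minimal, hypothetical sketch of rolling patching across failover pairs: the standby node is patched first, the workload is switched over to it, and only then is the former active node patched, so one node of each pair stays online throughout. The names `FailoverPair`, `rolling_patch_plan`, `DMA-01` and `DMA-02` are illustrative only; the sketch does not model or call any DataMiner API, and it assumes that a controlled switchover is shorter than a full reboot-and-start cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverPair:
    """Hypothetical two-node pair: one node serves, the other is standby."""
    name: str
    active: str
    standby: str


def rolling_patch_plan(pairs):
    """Return ordered patch steps that never take both nodes of a pair down.

    Assumption: a switchover is cheaper than a reboot. Illustrative only,
    not a DataMiner procedure.
    """
    steps = []
    for pair in pairs:
        # 1. Patch and reboot the standby; the active node keeps serving.
        steps.append(f"{pair.name}: patch+reboot {pair.standby}")
        # 2. Move the workload to the freshly patched node.
        steps.append(f"{pair.name}: switch over to {pair.standby}")
        # 3. Patch the former active node, which is now the standby.
        steps.append(f"{pair.name}: patch+reboot {pair.active}")
    return steps


if __name__ == "__main__":
    for step in rolling_patch_plan([FailoverPair("Pair-1", "DMA-01", "DMA-02")]):
        print(step)
```

Pairs are processed one after another here so that only one pair is ever mid-switchover; in practice the order and batching would depend on the actual hosting setup.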
HTH