Safely decommission a stopped DataMiner agent that is part of a failover pair

Solved | 964 views | 25th March 2022 | cassandra cluster, DMS, elastic cluster, Failover
Ciprian Moga [SLC] [DevOps Member], 24th March 2022

Hi,

As a user, I want to reset an agent (part of a failover pair) to factory settings without affecting any external database technologies (Elasticsearch, Cassandra clusters) that the agent might be configured to use, so that the operation of the DMS is not impacted.

Today, the factory reset of an agent can be done with the help of the SLReset tool.

Let’s consider the following example DMS: a 1+1 failover (high availability, HA) setup on site A and another 1+1 on site B, which serves as a disaster recovery (DR) setup for site A. All 4 agents share the same DMAID, and only one site is active at any time. Both sites are connected to the same devices, so this prevents double-polling of those devices. Each agent hosts its own local Cassandra node, configured together with its failover partner as a 2-node cluster. Each site has a dedicated, externally hosted Elasticsearch cluster (2 separate clusters in total). Each site is connected to both Elasticsearch clusters and writes the same data to both, so the same Elasticsearch data is available on both sites in case of a disaster.

Now, let’s assume we want to decommission one of the failover nodes in one of the sites, say site B. Both machines in site B are currently offline (to prevent the double-polling explained above).
Can the SLReset tool safely be used for this without any prior file manipulation (db.xml, dbmaintenancedms.xml, etc.), so that the node is decommissioned with zero impact on the data stored on the clusters (Elasticsearch and the Cassandra failover cluster)? And how will the failover partner of the decommissioned node be notified that failover was forcibly disabled while offline, given that both agents are stopped before SLReset is run on the node to be decommissioned?

Thanks in advance,

Ciprian Moga [SLC] [DevOps Member] Selected answer as best 25th March 2022

1 Answer

Brent [SLC], posted 24th March 2022

Hey Ciprian,

By “the machines are offline”, I assume you mean that DataMiner is completely stopped on them, including SLNet.

First of all, SLReset is a factory reset tool, not a decommissioning tool. It is indeed part of the process that runs when a failover configuration is deleted, but it is not a direct replacement for that process.

Running SLReset on a stopped node will not notify the other agent in the pair of the change, leaving that agent with no knowledge of what happened to its partner. The safest way to decommission a failover pair whose nodes are stopped is therefore to run SLReset on both nodes and, if needed, restore a DMA backup on one of them.

Your second requirement, “to decommission the node with 0 impact on the data that is being stored”, conflicts with the concept of a factory reset. By default, the tool distinguishes between 2 types of storage: local/failover and clustered.

Clustered databases such as Elasticsearch and CassandraCluster contain the data for the entire DMS and thus cannot reasonably be deleted, so SLReset skips cleaning them and their data should remain untouched. (However, given your specific setup, where 1 Elasticsearch cluster is used by 2 DMA systems, it would be safe to remove the Elasticsearch tag from db.xml before running SLReset.)
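As a rough illustration of the db.xml edit mentioned in the parenthetical above, here is a minimal Python sketch that drops Elasticsearch entries from an XML document. The sample content, the `DataBase` element name and the `type` attribute are assumptions made for this example and are not taken from this thread; check them against the actual db.xml on the agent, and work on a backed-up copy.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical db.xml content, for illustration only. The element and
# attribute names are assumptions, not a confirmed schema.
SAMPLE_DB_XML = """<DataBases>
  <DataBase active="true" local="true" type="Cassandra"><DB>SLDMADB</DB></DataBase>
  <DataBase active="true" search="true" type="Elasticsearch"><DBServer>10.0.0.10</DBServer></DataBase>
</DataBases>"""


def _local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{ns}DataBase'."""
    return tag.rsplit("}", 1)[-1]


def remove_database_entries(xml_text: str, db_type: str) -> tuple[str, int]:
    """Return (new_xml, removed_count) with matching entries dropped.

    An entry matches when it is a direct child named 'DataBase' whose
    'type' attribute equals db_type, ignoring case.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    for child in list(root):  # copy the list: we mutate root while looping
        is_database = _local_name(child.tag) == "DataBase"
        if is_database and child.get("type", "").lower() == db_type.lower():
            root.remove(child)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    new_xml, count = remove_database_entries(SAMPLE_DB_XML, "Elasticsearch")
    print(f"removed {count} matching entry/entries")
    print(new_xml)
```

The function works on a string rather than on the file itself, so the result can be reviewed before anything is written back to disk.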

Local/failover storage, however, will be deleted: SLReset removes all local files for that database. In the case of Cassandra, it also resets cassandra.yaml to its defaults. Again, nothing is communicated to the other agent, so nodetool status may still report 2 nodes, one up and one down. Since [ID_29894], we also make sure the reset node is completely removed from the setup.
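To check for the "one up, one down" situation described above on the remaining Cassandra node, the output of `nodetool status` can be parsed. The sketch below is a minimal example: the sample output, addresses and host IDs are made up for illustration, and the parser only relies on the two-letter status/state code that starts each node line.

```python
import re
from typing import List, NamedTuple

# Illustrative `nodetool status` output; addresses and host IDs are invented.
SAMPLE_STATUS = """Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.1   1.2 GiB    256     100.0%            11111111-1111-1111-1111-111111111111  rack1
DN  10.0.0.2   1.1 GiB    256     100.0%            22222222-2222-2222-2222-222222222222  rack1
"""

UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


class Node(NamedTuple):
    up: bool       # True for 'U' (up), False for 'D' (down)
    state: str     # 'N', 'L', 'J' or 'M'
    address: str
    host_id: str   # empty string if no UUID-like token was found


def parse_nodetool_status(text: str) -> List[Node]:
    """Extract node lines (those starting with a code such as UN or DN)."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # headers, separators and blank lines
        code = parts[0]
        if len(code) != 2 or code[0] not in "UD" or code[1] not in "NLJM":
            continue
        # Find the host ID by shape, since the Load column may span 1 or 2 tokens.
        host_id = next((p for p in parts if UUID_RE.match(p)), "")
        nodes.append(Node(code[0] == "U", code[1], parts[1], host_id))
    return nodes


if __name__ == "__main__":
    for node in parse_nodetool_status(SAMPLE_STATUS):
        label = "up" if node.up else "DOWN"
        print(f"{node.address}: {label} (host ID {node.host_id})")
```

On a live node, the text could come from `subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout`. If a reset node is still listed as down, Cassandra's standard `nodetool removenode <host ID>` command is the usual manual way to drop it from the ring; whether that is needed depends on whether the installed DataMiner version already includes the change referenced as [ID_29894].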
