Failover – Inverted heartbeats (Local switch disconnected from active and offline agent)

Solved | 1.39K views | 19th July 2023 | Failover
Miguel Obregon [SLC] [DevOps Catalyst], 25th March 2021

Hi Dojo,

We configured a Failover setup following the best practices described in the DataMiner Help [Link].

We tested the following scenario:
Both the active and the offline agent lose connectivity with the network switch that is used for the inverted heartbeats. The acquisition and corporate networks remain reachable from both agents.

When the network cable to the switch was unplugged from both the active and the offline DMA, we noticed the following:

  • In the Alarm Console, a notice appeared, indicating that the heartbeat path was failing.
  • The active agent went offline.
  • The offline agent stayed offline.

After a couple of minutes, the active agent came online again.

We checked the FAQ related to Failover setups [Link], but could not find this use case. Could you please let us know whether the events described above are expected?

DMA version: 10.1.2.0-9866

Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 19th July 2023

2 Answers

Wouter Demuynck [SLC] [DevOps Advocate], posted 26th March 2021

Hey Miguel,

The scenario, as far as I understood it: only the inverted heartbeats are failing (because the switch is no longer reachable), and they are failing on both machines. The other heartbeats are still successful.

It is expected for the active agent to go offline (because of the inverted heartbeat failure).

It is also expected that the offline agent stays offline (because of the inverted heartbeat failure).

The active agent going back online is somewhat unexpected, but it can most likely be explained by the following logic: when the normal heartbeats detect that both agents are offline at the same time, they can bring the agent that was online most recently back online (“Going online because partner XXXX if offline”). The same happens when Failover agents are restarted (both initially start up as offline). In this case, I would actually expect the agent to start toggling between online and offline, as the inverted heartbeat will take the agent offline again later on.
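To make the expected toggling easier to follow, here is a minimal sketch of the two rules described above as a toy simulation. It is not DataMiner code: the `Agent` class, the `evaluate_round` and `simulate` functions, the agent names and the idea of discrete evaluation rounds are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Agent:
    """Hypothetical model of one agent in a Failover pair."""
    name: str
    online: bool
    inverted_heartbeat_ok: bool  # True if the agent can reach the switch


def evaluate_round(agents: List[Agent], last_online: str) -> str:
    """Apply at most one state change per round and describe it."""
    # Rule 1: an online agent with a failing inverted heartbeat goes offline.
    for agent in agents:
        if agent.online and not agent.inverted_heartbeat_ok:
            agent.online = False
            return f"{agent.name} goes offline: inverted heartbeat failing"
    # Rule 2: if both agents are offline, the agent that was online most
    # recently is brought back online.
    if not any(agent.online for agent in agents):
        for agent in agents:
            if agent.name == last_online:
                agent.online = True
                return f"{agent.name} goes online: partner is offline"
    return "no change"


def simulate(rounds: int) -> List[Tuple[bool, bool]]:
    """Both agents have lost the switch; record (DMA-1, DMA-2) online state."""
    active = Agent("DMA-1", online=True, inverted_heartbeat_ok=False)
    backup = Agent("DMA-2", online=False, inverted_heartbeat_ok=False)
    history = []
    for _ in range(rounds):
        evaluate_round([active, backup], last_online="DMA-1")
        history.append((active.online, backup.online))
    return history


if __name__ == "__main__":
    for state in simulate(4):
        print(state)
```

Under these assumptions, the formerly active agent alternates between offline and online on every round, while its partner never comes online.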

In any case, the exact flow of the actions can be observed in the SLFailover.txt log files of both agents (in earlier versions: SLNet.txt).
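As a rough aid for reading those log files, the sketch below filters a log for lines that mention a state change or a heartbeat. The keyword pattern is an assumption based only on the message quoted above; the actual wording and layout of SLFailover.txt may differ, and the function name `failover_events` is hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Assumed keywords; adjust to whatever the log on your agents actually contains.
KEYWORDS = re.compile(r"going (online|offline)|heartbeat", re.IGNORECASE)


def failover_events(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention a state change or a heartbeat."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failover_events.py <path to SLFailover.txt>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for event in failover_events(log):
            print(event)
```

Running it against the log file of each agent and comparing the timestamps side by side should show the order in which the agents went offline and online.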

A case could be made against bringing that agent back online. It can be argued that this action should not be taken while the inverted heartbeats are failing.

Miguel Obregon [SLC] [DevOps Catalyst] commented 26th March 2021

Hi Wouter,
Thanks for the detailed explanation.

Indeed, the scenario you are describing is the one from my initial question.

I understand that this is expected behavior. As long as neither DMA can reach the network switch (via the inverted heartbeats), both DMAs will stay offline.

I also understand that a DMA (of the Failover pair) is not expected to come online after the connection with the network switch is restored. Does this mean that, in the expected scenario, we need to manually set an agent online?

I will create a task reporting the unexpected scenario.

Ive Herreman [SLC] [DevOps Enabler], posted 26th March 2021

Hi Miguel,

My 2 cents: I would define just one inverted heartbeat.

Miguel Obregon [SLC] [DevOps Catalyst] commented 26th March 2021

Hi Ive,
Thanks! I will test this configuration and let you know the outcome.
