DataMiner DoJo

Not all cluster agents starting after upgrade to 10.3.7.0

Solved | 838 views | 11th July 2023 | Tags: cluster, dma, upgrade
Jan Staelens [SLC] [DevOps Advocate], 10th July 2023, 2 Comments

I have a DMS with 2 agents. Each is on a different physical server.

It is empty: no elements, scripts, views, etc.

Upgraded

from 10.2.0.0-11897-20220611-release
to 10.3.7.0-13107-20230614-release (latest feature release as of this post)

I first attempted the upgrade by selecting 'cluster'. It reported that both agents in the cluster had been upgraded successfully, with a single warning about long path support.

However, both the Cube client and Upgrades/VersionHistory.txt indicated that everything was still on 10.2.0.0-11897-20220611-release.

To work around this, I stopped both agents and performed the upgrade locally on each agent.

When starting the agents, only one agent in the cluster starts correctly. When I tried to open the Cube client on the broken agent, it got stuck on "retrieving initial data". In the logs I found messages about NATS needing a restart and connection issues to Cassandra.

I then restarted the server hosting that agent.

Despite several agent restarts after this, the agent still does not start.

Any ideas on how to get this fixed? Any ideas on how to prevent this from happening to other users?

Details:

I'm seeing the following errors in the logging:

SLAnalytics: 2023/07/10 15:26:57.542|SLAnalytics|SLNetConnection.cpp(169): Skyline::DataMiner::Analytics::SLNetConnection::openConnection)|ERR|0|Exception while opening SLNetConnection:

SLErrors: 2023/07/10 15:26:57.542|SLAnalytics.txt|SLAnalytics|SLNetConnection.cpp(169): Skyline::DataMiner::Analytics::SLNetConnection::openConnection)|ERR|0|Exception while opening SLNetConnection:

SLDataMiner is stuck on: 2023/07/10 15:26:41.915|SLDataMiner.exe 10.3.2321.1738|4108|3608|CRequest::Init|DBG|0|** Initializing SLNetCom
**********

SLNET indicated:

2023-07-10 15:28:48.212|26|ExecutionContext.RunInternal|Destroying connection e58fae35-39e9-448f-82a8-660f9dc22476 (DataMiner Cloud Platform): Authentication took too long.
2023-07-10 15:28:48.228|26|ExecutionContext.RunInternal|Destroying connection 12d50385-fc4b-4362-8999-d801630501c4 (SLNet on qa-dma-test-10): Authentication took too long.
2023-07-10 15:28:48.243|70|Destroy|Connection did not authenticate. Computer: QA-DMA-TEST-14 Application: DataMiner Cloud Platform
2023-07-10 15:28:48.243|71|Destroy|Connection did not authenticate. Computer: QA-DMA-TEST-10 Application: SLNet on qa-dma-test-10
2023-07-10 15:28:48.243|70|GenerateInformationAlarm|Not generating information alarm (no agent up and running): 56/2100000000/64637 [Connection did not authenticate. Computer: QA-DMA-TEST-14 Application: DataMiner Cloud Platform ]

followed by constantly failing:

2023-07-10 15:49:39.626|91|Destroy|Connection did not authenticate. Computer: QA-DMA-TEST-14 Application: SLAnalytics
2023-07-10 15:49:39.626|91|GenerateInformationAlarm|Not generating information alarm (no agent up and running): 56/2100000000/64637 [Connection did not authenticate. Computer: QA-DMA-TEST-14 Application: SLAnalytics ]
2023-07-10 15:49:39.626|91|GenerateInformationAlarm|Not generating information alarm (no agent up and running): 56/2100000000/64505 [SLAnalytics removed because of error: Authentication took too long.]
2023-07-10 15:49:43.349|62|AuthenticationStep|
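
For anyone facing a longer SLNet log than the excerpt above, here is a minimal Python sketch for tallying which computer and application keep failing to authenticate. It assumes the pipe-separated layout seen in the lines quoted in this post; the function name and the regular expression are my own and not part of any DataMiner tooling.

```python
import re
from collections import Counter

# Sample lines copied from the SLNet excerpt quoted in this post.
SAMPLE_LOG = """\
2023-07-10 15:28:48.243|70|Destroy|Connection did not authenticate. Computer: QA-DMA-TEST-14 Application: DataMiner Cloud Platform
2023-07-10 15:28:48.243|71|Destroy|Connection did not authenticate. Computer: QA-DMA-TEST-10 Application: SLNet on qa-dma-test-10
2023-07-10 15:49:39.626|91|Destroy|Connection did not authenticate. Computer: QA-DMA-TEST-14 Application: SLAnalytics
"""

# Only "Destroy" entries are matched, so the GenerateInformationAlarm lines
# that repeat the same failure are not counted a second time.
FAILURE = re.compile(
    r"\|Destroy\|Connection did not authenticate\. "
    r"Computer: (?P<computer>\S+) Application: (?P<application>.+?)\s*$"
)


def count_auth_failures(log_text):
    """Count failed authentications per (computer, application) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE.search(line)
        if match:
            counts[(match["computer"], match["application"])] += 1
    return counts


if __name__ == "__main__":
    for (computer, application), n in count_auth_failures(SAMPLE_LOG).most_common():
        print(f"{computer} / {application}: {n}")
```

In a case like this one, a tally where every local and remote application fails to authenticate would hint at a problem shared by the whole agent rather than at a single client.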

SLWatchdog2

2023-07-10 15:47:27 2196|Failed to generate alarm for "NATS has stopped, restarting...": There's no connection available with this dataminer. (0x800402cdh)
2023-07-10 15:48:27 2196|NATS has stopped, restarting...

Robin Devos [SLC] [DevOps Advocate] commented 10th July 2023

Hi Jan
Not here with a solution, just wondering a couple of things:
1. Does the upgrade.log indicate an issue (an exception, an error, ...)?
This is found in C:/Skyline DataMiner/Upgrades/Packages//upgrade.log
2. Were any .NET packages installed during the upgrade? (Search for “InstallDotNet” during the step “ExecuteUpgradeActions”.)
-> If one of the packages got installed, did the server reboot?

Jan Staelens [SLC] [DevOps Advocate] commented 10th July 2023

There is no upgrade.log file on either agent. Not in that location at least, and also not under DataMiner Logs. I’m seeing backups of the logging from back in 2022, when I installed 10.2. Is it possible that the log file no longer exists or was moved to a different location?

1 Answer

Jan Staelens [SLC] [DevOps Advocate], posted 10th July 2023, 0 Comments

Fixed after some more detective work.

I focused on NATS as the most likely root cause, since it is often mentioned as a component that fails to play nice with DataMiner.

To help with this, I used the following page, which I found linked in a previous Dojo post:

https://docs.dataminer.services/user-guide/Troubleshooting/Procedures/Investigating_NATS_Issues.html

Eventually I discovered that the NATS service was not running, although NAS was. Firewall settings were OK. However, the nats-server.config located in C:\Skyline DataMiner\NATS\nats-streaming-server was not correctly configured.

Both servers need the same address configured in the resolver setting. However, agent 1 of the cluster was set up with 0.0.0.0 and agent 2 was set up with the HTTPS URL of agent 1.

I changed both to point to the HTTPS URL of agent 1 and rebooted both servers.

This allowed everything to start correctly.
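
To catch this kind of mismatch earlier, a rough Python sketch like the one below could compare the resolver entry across copies of nats-server.config collected from each agent. It assumes the setting appears on one line as `resolver: <value>`, which is the form the NATS server configuration accepts. The two sample configs, the host name agent1.example.test, the port, and the path are placeholders I made up for illustration and are not taken from my actual files.

```python
import re

# Matches a "resolver" entry followed by ":", "=" or whitespace, but not
# longer keys such as "resolver_preload".
RESOLVER = re.compile(
    r"^\s*resolver(?:\s*[:=]\s*|\s+)(?P<value>.+?)\s*$", re.MULTILINE
)

# Hypothetical config contents, for illustration only.
AGENT_1 = """\
port: 4222
resolver: URL(http://0.0.0.0:9090/jwt/v1/accounts/)
"""
AGENT_2 = """\
port: 4222
resolver: URL(https://agent1.example.test/jwt/v1/accounts/)
"""


def resolver_value(config_text):
    """Return the resolver value found in the config text, or None."""
    match = RESOLVER.search(config_text)
    return match["value"] if match else None


def resolvers_match(configs):
    """Take {agent name: config text}; return (all_equal, {agent name: value})."""
    values = {name: resolver_value(text) for name, text in configs.items()}
    distinct = set(values.values())
    return len(distinct) == 1 and None not in distinct, values


if __name__ == "__main__":
    ok, values = resolvers_match({"agent 1": AGENT_1, "agent 2": AGENT_2})
    for name, value in values.items():
        print(f"{name}: {value}")
    print("resolver settings match" if ok else "resolver settings differ")
```

This only flags a difference; which value is the right one still has to be decided by hand, as I did by pointing both agents at agent 1.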

Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 11th July 2023