DataMiner DoJo


NATS – error in alarm console

Solved | 3.06K views | 11th May 2021 | Tags: firewall, NATS
Asked by Arunkrishna Shreeder [SLC] [DevOps Advocate] | 6th May 2021

Hi Dojo,

We have a cluster of 4 agents (without Failover) that was recently upgraded to feature release 10.1.3-9963. Since the upgrade, we keep seeing the 'NATS has stopped, restarting...' error, but only on agents 2, 3 and 4, not on agent 1.

We have the firewall rules enabled on all 4 agents (example of one agent below):

On agent 1, the NATS service runs continuously without stopping. On the other 3 agents, it keeps stopping and starting, and I am unable to stop the service there.

I tried to end the process tree of the corresponding process, nats-streaming-server.exe, but the process appears and disappears so quickly that by the time I click End Process Tree it is already gone.

The NAS service is continuously running.

How can I remove these errors from the Alarm Console? Thank you in advance.

Update - May 7: We now have the error only on Agent-2 (.131), after the firewall was updated on all agents. I saw these logs from Agent-1:
[13296] 2021/05/07 16:27:52.737252 [DBG] 172.30.144.131:62934 - cid:3708 - Client connection created
[13296] 2021/05/07 16:27:52.741110 [DBG] Account [ABRBZXY4MLTM2LGLUXSGVA3W6WP7AFSF5ZMIJBG4F2V2HI6NBTX2VMKL] fetch took 1.9951ms
[13296] 2021/05/07 16:27:52.741110 [WRN] Account fetch failed: could not fetch <"http://0.0.0.0:9090/jwt/v1/accounts/ABRBZXY4MLTM2LGLUXSGVA3W6WP7AFSF5ZMIJBG4F2V2HI6NBTX2VMKL">: 500 Internal Server Error
[13296] 2021/05/07 16:27:52.741110 [DBG] 172.30.144.131:62934 - cid:3708 - Account JWT lookup error: could not fetch <"http://0.0.0.0:9090/jwt/v1/accounts/ABRBZXY4MLTM2LGLUXSGVA3W6WP7AFSF5ZMIJBG4F2V2HI6NBTX2VMKL">: 500 Internal Server Error
[13296] 2021/05/07 16:27:52.741110 [ERR] 172.30.144.131:62934 - cid:3708 - authentication error
[13296] 2021/05/07 16:27:52.741110 [DBG] 172.30.144.131:62934 - cid:3708 - Client connection closed
[13296] 2021/05/07 16:27:52.780121 [INF] 172.30.144.131:62926 - rid:3706 - Router connection closed
[13296] 2021/05/07 16:27:53.356184 [ERR] Error trying to connect to route (attempt 225): dial tcp 172.30.144.131:6222: i/o timeout
[13296] 2021/05/07 16:27:54.360141 [DBG] Trying to connect to route on 172.30.144.131:6222
[13296] 2021/05/07 16:27:55.362065 [ERR] Error trying to connect to route (attempt 226): dial tcp 172.30.144.131:6222: i/o timeout

On agent-2 I see that ports 4222, 6222 and 8222 are not in a listening state, yet the person responsible says the ports are open. What am I missing? TIA
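A log like the excerpt above is easier to read once the debug lines are filtered out. The following is a minimal sketch that assumes every line follows the `[pid] date time [LEVEL] message` shape seen in the excerpt; the function names are illustrative and not part of NATS or DataMiner.

```python
import re
from collections import Counter

# Shape assumed from the excerpt: [pid] YYYY/MM/DD HH:MM:SS.ffffff [LEVEL] message
LOG_LINE = re.compile(
    r"^\[(\d+)\] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[(\w+)\] (.*)$"
)

def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    _pid, timestamp, level, message = match.groups()
    return timestamp, level, message

def summarize(lines, levels=("WRN", "ERR")):
    """Count entries per level and collect the entries at the given levels."""
    counts = Counter()
    problems = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a recognizable log line
        timestamp, level, message = parsed
        counts[level] += 1
        if level in levels:
            problems.append((timestamp, level, message))
    return counts, problems

# Three lines copied from the excerpt in the question.
SAMPLE = [
    "[13296] 2021/05/07 16:27:52.741110 [ERR] 172.30.144.131:62934 - cid:3708 - authentication error",
    "[13296] 2021/05/07 16:27:53.356184 [ERR] Error trying to connect to route (attempt 225): dial tcp 172.30.144.131:6222: i/o timeout",
    "[13296] 2021/05/07 16:27:54.360141 [DBG] Trying to connect to route on 172.30.144.131:6222",
]

counts, problems = summarize(SAMPLE)
for timestamp, level, message in problems:
    print(timestamp, level, message)
```

Run against the full log file, this leaves only the account fetch warning, the authentication error and the route timeouts, which are the lines relevant to the issue.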


5 Answers

Mattias Claes [SLC] | Posted 7th May 2021 | 1 Comment

This is a firewall issue. NATS isn't starting because NAS (the account server) can't connect to the primary, which is the agent with the lexicographically lowest IP address (you can find it in nas.config). NAS can start without a connection to the primary, but it won't load any JWTs, which is why you get the 500 Internal Server Error when NATS tries to verify its account claims.
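To illustrate why "lexicographically lowest" is not always the numerically lowest address, here is a small sketch. The addresses are hypothetical, and treating the comparison as a plain string comparison is an assumption based on the wording above; nas.config remains the place to confirm which agent is actually the primary.

```python
import ipaddress

def lexicographic_primary(ips):
    """Pick the lowest address by plain string comparison (assumed behavior)."""
    return min(ips)

def numeric_primary(ips):
    """Pick the lowest address by numeric value, shown for comparison only."""
    return str(min(ipaddress.ip_address(ip) for ip in ips))

# Hypothetical cluster addresses, not taken from this thread.
agents = ["10.0.0.9", "10.0.0.10", "10.0.0.25"]

print("lexicographic:", lexicographic_primary(agents))
print("numeric:      ", numeric_primary(agents))
```

With these example addresses the two orderings disagree, because the string "10.0.0.10" sorts before "10.0.0.9". When all agents share the same number of digits in every octet, both orderings give the same result.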

"On agent-2 I see that ports 4222, 6222 and 8222 are not in a listening state"

This is normal. 4222, 6222, and 8222 are the ports used by NATS, which isn't starting. This issue isn't related to those ports, but to port 9090 (the one used by NAS).
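One way to see what NAS answers on port 9090 is to request the same URL that appears in the NATS log. This is a rough sketch: the `/jwt/v1/accounts/<account>` path is copied from the log excerpt in the question, while the function name, the timeout and the decision to bypass any system proxy are assumptions made for illustration.

```python
import urllib.error
import urllib.request

def account_fetch_status(host, account_id, port=9090, timeout=3.0):
    """Return the HTTP status NAS gives for an account JWT, or None if unreachable."""
    url = f"http://{host}:{port}/jwt/v1/accounts/{account_id}"
    # Empty ProxyHandler: talk to the agent directly, ignoring system proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # NAS answered, but with an error status such as 500.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Nothing answered: refused, filtered by a firewall, or timed out.
        return None

# Example call (hypothetical agent address, account ID taken from the NATS log):
# account_fetch_status("<agent IP>", "<account ID from the log>")
```

A result of 500 would match the situation described above, where NAS is running but has not loaded any JWTs. A result of None points to NAS not being reachable on that port at all.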

Make sure port 9090 is also open between all DMAs. You may also want to try changing the profile of the Windows firewall rules from Domain to All. After adjusting the firewall, you may have to restart the NAS service on all 4 agents.
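A quick way to verify the firewall between agents is a plain TCP connection test, run from each DMA toward every other DMA. The sketch below uses only the ports named in this thread; the role labels are the usual NATS defaults, and the function names are illustrative.

```python
import socket

# Ports mentioned in this thread. The role labels are the usual NATS defaults.
PORTS = {
    9090: "NAS (account server)",
    4222: "NATS clients",
    6222: "NATS cluster routes",
    8222: "NATS monitoring",
}

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered by a firewall, unreachable, or timed out.
        return False

def check_agents(hosts, ports=PORTS, timeout=2.0):
    """Map every (host, port) pair to its reachability."""
    return {
        (host, port): port_is_open(host, port, timeout)
        for host in hosts
        for port in ports
    }

# Example call (replace with the real agent addresses):
# for (host, port), ok in check_agents(["<agent IP>"]).items():
#     print(host, port, PORTS[port], "open" if ok else "closed")
```

Keep in mind that a failed connection on 4222, 6222 or 8222 does not by itself prove a firewall problem: as noted above, nothing listens on those ports while NATS is not running. Port 9090 is the one to check first.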

Arunkrishna Shreeder [SLC] [DevOps Advocate] commented 11th May 2021

Hi Mattias, thanks for your input. We changed the firewall rules from Domain to All and restarted the NAS service too. I can also see that port 9090 is open on all DMAs, but the above-mentioned issue still persists. Please let me know if there is anything else we can try. Right now the error is present only on agent 2.

