Hi Dojo,
Is there any known fix for NATS not restarting as expected?
In DM 10.3.0.0-13184-CU5, our squad has occasionally needed to reinstall NATS after a DMA reboot.
The symptoms we observe are that the service stays stopped, even after starting it manually, with possible references to IPC ports in the logs. The DMA stays disconnected from the cluster, and DataMiner shows as "restarting" when launching Cube locally, while Cassandra nodetool shows U(p) N(ormal).
Some visual reference below: any steer will be helpful
Hi Alberto,
Hmmm, this is a tricky one, as this is evidently not supposed to happen...
The most frequent reason for NATS not to start is authentication failing.
If the NATS service is automatically reconfigured (logged in "C:\Skyline DataMiner\Logging\SLNATSCustodian.txt"), it is possible that NATS no longer has the correct credentials to start.
Alternatively, we have seen antivirus programs interfere and remove or quarantine parts of the NATS service after a reboot, which would also lead to this behavior.
On top of that, DataMiner 10.3.0 CU5 still has a few quirks, addressed in later DataMiner versions, which may contribute to the likelihood of this happening.
In any case, it may be interesting to have a look at the NATS logging at C:\Skyline DataMiner\nats\nats-streaming-server\nats-server.log.
However, since what you are describing is potentially a bug, I would recommend contacting techsupport with a LogCollector package taken during the failure to start.
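If it helps, here is a rough Python sketch for skimming the two log files mentioned above for lines that might point to authentication or port problems. The keyword list, the `find_suspect_lines` and `scan_file` helper names, and the UTF-8 encoding are my own assumptions, not anything DataMiner or NATS prescribes, so treat the output as a starting point only.

```python
from pathlib import Path

# Paths taken from this thread; adjust them if your installation differs.
LOG_FILES = [
    r"C:\Skyline DataMiner\Logging\SLNATSCustodian.txt",
    r"C:\Skyline DataMiner\nats\nats-streaming-server\nats-server.log",
]

# Assumed search terms, not an official list of NATS or DataMiner messages.
KEYWORDS = ("auth", "credential", "port", "error", "fail")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs containing any keyword, case-insensitively."""
    hits = []
    for number, text in enumerate(lines, start=1):
        lowered = text.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, text.rstrip()))
    return hits


def scan_file(path, last_n=200):
    """Scan the last `last_n` lines of a log file; return [] if the file is missing."""
    log = Path(path)
    if not log.is_file():
        return []
    # Assumption: the log is UTF-8 compatible; undecodable bytes are replaced.
    lines = log.read_text(encoding="utf-8", errors="replace").splitlines()
    tail = lines[-last_n:]
    offset = len(lines) - len(tail)
    # Re-base the line numbers so they match the full file.
    return [(offset + n, text) for n, text in find_suspect_lines(tail)]


if __name__ == "__main__":
    for path in LOG_FILES:
        print(f"== {path}")
        for number, text in scan_file(path):
            print(f"{number}: {text}")
```

Running it on the DMA right after a failed start prints the matching lines with their line numbers, which makes it easier to see whether the Custodian reconfigured NATS shortly before the service stopped.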
From there, we can better assess the setup and have a look together.
That way, we can figure out the reason for this specific failure.
Hope this helps!