Hi,
Following an upgrade to DataMiner 10.1.4.0-10077 Feature Release (a message at the end confirmed the upgrade was successful), a few runtime errors appear in the Alarm Console that read "NATS has stopped working, restarting...".
This system was running DataMiner 10 CU5 prior to the upgrade.
RN 28463 seems to describe this issue and suggests the following steps:
"On a DMA with NAS and NATS installed: stop the services (don't delete them) and delete the C:\Skyline DataMiner\NATS folder. Restart your DMA, assert the services are reinstalled and no lingering error alarms indicating restarts of the services are present. If the alarms are present initially, they should be automatically cleared after some time."
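For reference, the release note steps can be sketched as a small script. This is only an illustration under assumptions, not a DataMiner tool: the Windows service names "NAS" and "NATS" are assumed (check them with `sc query` first), and the function name `reset_nats_folder` is hypothetical. It defaults to a dry run that only lists the planned actions, and it refuses to delete the folder if a service does not reach STOPPED, which is what happens when the service is stuck in "stopping".

```python
import shutil
import subprocess
import time

# Assumption: these are the Windows service names; verify with `sc query` first.
SERVICES = ("NAS", "NATS")
# Folder named in RN 28463.
NATS_FOLDER = r"C:\Skyline DataMiner\NATS"


def wait_until_stopped(name, timeout=60.0):
    """Poll `sc query` until the service reports STOPPED or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(["sc", "query", name], capture_output=True, text=True).stdout
        if "STOPPED" in out:
            return True
        time.sleep(2)
    return False


def reset_nats_folder(dry_run=True):
    """Follow the RN 28463 steps; with dry_run=True only the planned actions are returned."""
    actions = []
    for name in SERVICES:
        actions.append(f"sc stop {name}")  # stop only; do NOT delete the service
        if not dry_run:
            subprocess.run(["sc", "stop", name], check=False)
            if not wait_until_stopped(name):
                raise RuntimeError(f"service {name} did not stop; not deleting the folder")
    actions.append(f"delete {NATS_FOLDER}")
    if not dry_run:
        shutil.rmtree(NATS_FOLDER)
    # Restarting the DMA itself is left as a manual step.
    actions.append("restart the DMA and confirm the services were reinstalled")
    return actions
```

Running `reset_nats_folder()` without arguments changes nothing; it only shows what would be done, which leaves the production decision to the operator.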
Can these steps be safely applied in a production DMS?
How much does this error affect the overall health of the DMS?
What should be done to correct it?
Thank you for your help.
UPDATE 23/04/2021:
I am adding an update to this question (since I cannot answer my own questions and need to add pictures) to thank everyone who helped and to share a workaround that resolved the issue (although it is still not clear why it happened in the first place).
In Task Manager, you can see that the NATS service is stuck (in this case its status reads "Stopping"), and there is no way to stop it from there: the options either do nothing or are greyed out.
Since the service still has a PID, the corresponding process can be identified: nats-streaming-server.exe.
Ending the process tree of nats-streaming-server.exe causes the process to restart with a new PID.
Checking the NATS service afterwards shows it as Running with the new PID.
The alarms in the Alarm Console clear after this.
Hi Joao,
Please make sure the following TCP ports are allowed through the firewall on the DMAs: 4222, 6222, 8222 and 9090. Normally DataMiner creates the firewall rules automatically during the upgrade, but group policies may prevent these changes.
This is what the Windows Firewall rules should look like.
As far as I know, NATS is not in use yet, so there shouldn't be any impact on the DMS.
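A quick way to check the ports from another agent is a plain TCP connect test. The sketch below is an assumption-based helper (the names `port_open` and `check_ports` are hypothetical). 4222, 6222 and 8222 are the standard NATS client, cluster and HTTP monitoring ports; 9090 is simply taken from the list above. Run it from a different machine, since a connection to localhost usually does not go through the inbound firewall rules. A failed connection only means nothing answered: either the firewall blocked it or no process is listening.

```python
import socket

# Ports listed in the answer above.
NATS_PORTS = (4222, 6222, 8222, 9090)


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: blocked by a firewall or nothing listening.
        return False


def check_ports(host, ports=NATS_PORTS, timeout=2.0):
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("dma02")` (with a hypothetical hostname) returns a dictionary such as `{4222: True, 6222: True, ...}` that can be compared across agents.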
I just added an update to the question with a workaround that sorted out the errors in the Alarm Console.
Hi guys, I just want to share how I fixed those errors. I think it's similar to what Joao mentioned (ending the processes), but using the command line. Here is the link: https://support.4it.com.au/article/how-to-kill-a-windows-service-which-is-stuck-at-stopping/
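The command-line approach and the Task Manager workaround come down to the same thing: look up the PID of the service stuck in STOP_PENDING with `sc queryex`, then end that process tree with `taskkill /T /F`. A rough sketch of that is below. The service name "NATS" is an assumption (confirm it with `sc query`), and the function names are hypothetical. It only kills the process when the service really reports STOP_PENDING.

```python
import re
import subprocess

# Assumption: Windows service name of NATS on the DMA; confirm with `sc query`.
SERVICE_NAME = "NATS"


def parse_queryex(output):
    """Extract (state, pid) from `sc queryex` output; either is None if not found."""
    state = re.search(r"STATE\s*:\s*\d+\s+(\w+)", output)
    pid = re.search(r"PID\s*:\s*(\d+)", output)
    return (state.group(1) if state else None, int(pid.group(1)) if pid else None)


def kill_command(pid):
    # /T ends the whole process tree (like "End process tree" in Task Manager), /F forces it.
    return ["taskkill", "/PID", str(pid), "/T", "/F"]


def kill_if_stuck(service=SERVICE_NAME):
    """Kill the service process only if it is stuck in STOP_PENDING; return the PID killed."""
    out = subprocess.run(["sc", "queryex", service], capture_output=True, text=True).stdout
    state, pid = parse_queryex(out)
    if state == "STOP_PENDING" and pid:
        subprocess.run(kill_command(pid), check=True)
        return pid
    return None
```

As described in the update to the question, nats-streaming-server.exe came back with a new PID after its process tree was ended, so `sc queryex NATS` can be run again afterwards to confirm the service reports RUNNING.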
Hi Alexander,
On the agent I am currently connected to, those ports are allowed (although I only have a single rule covering both NAS and NATS; I doubt that is an issue).
I'm not sure about the other agents, but that can be checked.