Hi Dojo,
On one of our clusters I'm running the BPAs and I'm getting an error from the NATS BPA:
- NATS client could not communicate with a NATS client from DataMiner with address 10.** etc
I've been through the troubleshooting steps and I've discovered that the agent with the offending IP has <NATSForceManualConfig>true</NATSForceManualConfig> in maintenanceSettings.xml, whereas every other agent in the cluster doesn't have this, implying automatic NATS configuration.
I'm thinking the best thing to resolve this would be to remove that config line, and then try a NATS reset as documented here: https://docs.dataminer.services/dataminer/Troubleshooting/Procedures/Investigating_NATS_Issues.html?q=+investigating+nats+issues&tabs=tabid-1%2Ctabid-4%2Ctabid-8%2Ctabid-10#try-a-nats-reset
I just wanted to check whether anyone else has experience with this, and whether those steps should be sufficient, or if there is anything else I'm missing.
Thanks, Carl
Hi Carl,
When the "NATSForceManualConfig" flag is set to true, DataMiner will not validate or alter the NATS configuration. Any mistakes in that configuration, or any changes in the network (such as a changed IP) or in the DMS (such as adding or removing nodes), can then lead to the problem you are describing, where certain NATS nodes are not reachable.
Removing the "NATSForceManualConfig" flag and running the NATS reset is indeed the correct approach, as long as you do not rely on a special configuration, such as:
- Inter-Node TLS (Securing NATS)
- Gateway/Leaf-nodes
- Custom IPs for NATS-Internode
More info on how to configure this flag can be found in the docs. Be aware that this flag must have the same value on all agents in the cluster; otherwise the system will run into trouble.
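If it helps to verify that the flag is consistent before doing the reset, here is a rough Python sketch that compares the setting across agents. It is only an illustration and not an official tool: the function names are made up, it assumes you have already collected the contents of MaintenanceSettings.xml from each agent yourself, and it treats a missing element as automatic configuration. It looks for the element anywhere in the file rather than assuming a particular parent tag.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def nats_manual_config(xml_text):
    """Return True if the first NATSForceManualConfig element found is 'true'."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "NATSForceManualConfig":
            return (element.text or "").strip().lower() == "true"
    # Assumption: an absent element means automatic NATS configuration.
    return False


def find_mismatches(settings_by_agent):
    """Return {agent: flag} for all agents if they disagree, else an empty dict.

    settings_by_agent maps a (hypothetical) agent label to the XML text of
    that agent's MaintenanceSettings.xml.
    """
    flags = {
        agent: nats_manual_config(xml_text)
        for agent, xml_text in settings_by_agent.items()
    }
    return flags if len(set(flags.values())) > 1 else {}
```

Calling find_mismatches with one entry per agent returns an empty dict when all agents agree. Otherwise it returns the flag value per agent, so you can see which one is the odd one out.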
If you are unsure about these actions, feel free to reach out to techsupport for assistance.
Hope this helps answer the question.