Would welcome your thoughts on this.
We are having difficulty polling devices through a Moxa NPort 5150 com server (currently running DataMiner 9.5CU16).
Every once in a while, DataMiner seems to lose its connection with the device.
The device in use is an Intrac 405 antenna controller, connected via a Moxa NPort 5150.
Protocol used: intrax 405, version 220.127.116.11
Moxa firmware: 3.4 build 11080114
The setup is the same as described on pages 1 to 6 of the DataMiner system administrator training manual; the only difference is that the manual uses a 16-port serial com server, while the ones we use have a 1-port serial interface.
They work the same way:
IP address + port number, where the port number identifies the serial side.
This setup has worked for years in a row without problems. The moxa device works in TCP Server mode.
These communication problems (Moxa/Intrac) are quite frequent.
Most of the time, a soft reset of the Moxa solves the problem; sometimes we have to do a cold reboot of the Moxa.
No firmware changes were done.
In all cases when we lose communication, the communication with the Moxa itself is up and running: we can still control it, reboot it, restart it, change configs, etc.
So network-wise, we are still in control and there are no communication failures.
While this is happening, DataMiner has NO communication with the device; it always resumes polling when we restart the element in DataMiner.
So there are no known network issues with the devices in our network from a management point of view.
We also tested with the latest version of the protocol, v18.104.22.168, and upgraded the Moxa to the latest version, 3.8, as well. Yet the issue persists.
In Streamviewer, there were timeouts related to the polarization motor (image below), but this controller does not have this option installed.
After disabling polarization, those timeouts no longer occurred.
But I see some other timeouts as well:
There are no known steps to reproduce the issue; the timeouts are completely random.
I have attached a Wireshark capture taken at the point where the controller is not being polled by DataMiner and has a communication error.
The issue has been present for the past 2 months and not before that, when DataMiner 9.5CU13 was in use.
How can I find out where the issue is occurring?
Is there a reason why no upgrade to v9.6 was done? v9.5 is currently the oldest version we support, and v10.0 has been out since Q1 2020. I don't suppose proposing another upgrade (to v9.6) is an option?
Are there other devices connected to the Moxa(s)? If so, do those go into timeout as well?
Can you run netstat -a in a command prompt to see if the connection is still established?
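If it helps, the netstat check can be narrowed down to just the Moxa connection. The following is only a rough sketch: the IP address and TCP port are placeholders (not taken from your setup), the helper names are made up for illustration, and it uses netstat -an rather than -a so that addresses are shown numerically. It assumes the Windows column layout (Proto, Local Address, Foreign Address, State).

```python
import subprocess

# Placeholders, NOT from the original setup: replace with your own values.
MOXA_IP = "192.0.2.10"   # hypothetical Moxa address
MOXA_PORT = 4001         # assumed TCP port configured on the NPort


def find_connections(netstat_output, ip, port):
    """Return (local, remote, state) for TCP lines whose remote endpoint is ip:port."""
    target = f"{ip}:{port}"
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected TCP line layout: Proto, Local Address, Foreign Address, State
        if len(fields) >= 4 and fields[0].upper() == "TCP" and fields[2] == target:
            matches.append((fields[1], fields[2], fields[3]))
    return matches


def check_moxa(ip=MOXA_IP, port=MOXA_PORT):
    """Run 'netstat -an' on this machine and print the state of connections to the Moxa."""
    output = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True
    ).stdout
    connections = find_connections(output, ip, port)
    if not connections:
        print(f"No TCP connection to {ip}:{port} found")
    for local, remote, state in connections:
        print(f"{local} -> {remote}: {state}")
```

Calling check_moxa() on the DataMiner server while the element is in timeout should show whether the socket towards the Moxa is still listed as ESTABLISHED or is missing or in another state.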
Did you have a look at SLPort.txt, SLPortSplit.txt, and the element log? Also check with the log level set to 5.
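To get a first overview of those logs, something like the sketch below could list every line that mentions a timeout. It makes no assumption about the log format beyond plain text and a case-insensitive keyword match; the function name and the default keyword are made up for illustration, and you would need to pass in the actual path of the log file on your server.

```python
def find_keyword_lines(log_path, keyword="timeout"):
    """Return (line_number, text) for every line containing keyword, ignoring case."""
    needle = keyword.lower()
    hits = []
    # errors="replace" avoids crashing on bytes that are not valid UTF-8.
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, find_keyword_lines(path_to_log) returns the matching lines with their line numbers, which makes it easier to compare the moments of the timeouts across the different log files.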
As for the Wireshark capture, I don't really see an issue that would cause a timeout.
Worst case, create a logcollector package for investigation, along with memory dumps (SLPort, SLProtocol, SLElement, SLDataminer) taken while the timeouts are occurring.