Hi Dojo,
When errors occur in SLProtocol.exe, DataMiner promptly flags them to admins and users.
What is the best type of capture or memory dump to help troubleshoot the root cause behind these hanging calls? Is it worth setting up some automation or correlation so that the related data is collected automatically every time this type of RTE is listed in the Alarm Console?
Thanks
An RTE in SLProtocol.exe is often linked to calls to one or more of the following processes:
- SLScripting
- SLSNMPManager
- SLPort
- SLElement
Taking dumps automatically on these errors may help when they only appear temporarily, e.g. when the process needs more than 15 minutes to handle a certain request but eventually gets through it. A reason not to do this is when the errors appear frequently: the dumps then risk flooding the disk or interrupting the process too often.
Ideally, full memory dumps are taken. Only when opening the SLProtocol dump can we see whether other processes are involved in this particular RTE, and which ones. Grabbing dumps of all of them as a precaution can speed up the investigation, at the cost of large files that may turn out to be irrelevant.
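To illustrate the trade-off above, here is a minimal Python sketch of a guard that an automation could consult before taking a dump. It is not a DataMiner feature: the class name `DumpThrottle`, the helper `free_bytes_on`, the one-hour interval and the 50 GiB free-space threshold are all assumptions chosen for the example.

```python
import shutil
import time


class DumpThrottle:
    """Hypothetical guard: limits how often automatic dumps are taken."""

    def __init__(self, min_interval_s=3600, min_free_bytes=50 * 1024**3):
        self.min_interval_s = min_interval_s    # assumed: at most one dump per hour per element
        self.min_free_bytes = min_free_bytes    # assumed: keep at least 50 GiB free
        self._last_dump = {}                    # element key -> time of last allowed dump

    def allow(self, element_key, free_bytes, now=None):
        """Return True if a dump for this element may be taken now."""
        now = time.time() if now is None else now
        # Refuse when the disk is already too full for another full dump.
        if free_bytes < self.min_free_bytes:
            return False
        # Refuse when this element was dumped too recently.
        last = self._last_dump.get(element_key)
        if last is not None and now - last < self.min_interval_s:
            return False
        self._last_dump[element_key] = now
        return True


def free_bytes_on(path):
    """Free space in bytes on the volume that holds `path`."""
    return shutil.disk_usage(path).free
```

The automation would call `allow(...)` with the element ID and the result of `free_bytes_on(...)` for the dump folder, and only trigger the dump collection when it returns True.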
Hi Alberto,
I am facing a similar issue for one of my customers. In my case, the issue turned out to be a driver problem, which I found by running Pending Calls in the SLNetClientTest tool.
If you take the elementID from the Edit page of the element showing the RTE and run Pending Calls on the DMA hosting that element, you will probably be able to find which group or QAction is causing the protocol to get stuck.
With this information, you can later have a look at the driver and see what information is being retrieved by that group, and use this to guide a driver investigation afterwards.
You can find more related information and relevant troubleshooting steps in the following page
Kind regards
Hi,
When such an RTE is present, the best bet is to start SLLogCollector and let it gather the needed memory dumps. I don't know whether this can be automated so that it always collects them automatically.
A stuck thread in SLProtocol can have various root causes, which are not necessarily located in SLProtocol itself. For example, it could be waiting for a QAction to finish while that QAction is stuck or taking a long time. In that case a memory dump of SLScripting is also needed, because otherwise the analysis will only show that SLProtocol is waiting on the QAction, not what the QAction is doing at that point. SLProtocol could also be waiting for something external to come in: a serial response, in which case a memory dump of SLPort is needed, or an SNMP response, in which case a memory dump of SLSNMPManager is needed. So the root cause can differ, but in most cases a QAction is involved, so besides SLProtocol make sure to include a memory dump of SLScripting.
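The mapping described above can be written down as a small lookup. This is only an illustrative Python sketch: the function name `processes_to_dump` and the keys `"qaction"`, `"serial"` and `"snmp"` are made up for the example and are not DataMiner identifiers.

```python
# What SLProtocol may be waiting on -> process whose dump is also needed.
# The keys are hypothetical labels; the process names come from the answer above.
COMPANION_PROCESS = {
    "qaction": "SLScripting",
    "serial": "SLPort",
    "snmp": "SLSNMPManager",
}


def processes_to_dump(waiting_on=()):
    """Return the processes to dump for an SLProtocol RTE.

    SLProtocol and SLScripting are always included, since a QAction is
    involved in most cases. Others are added based on `waiting_on`.
    """
    targets = ["SLProtocol", "SLScripting"]
    for kind in waiting_on:
        process = COMPANION_PROCESS.get(kind.lower())
        if process and process not in targets:
            targets.append(process)
    return targets
```

For example, `processes_to_dump(["snmp"])` gives SLProtocol, SLScripting and SLSNMPManager, while calling it without arguments gives only the two default processes.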
Regards,
Thanks for this helpful insight, Laurens – marking this as solved
Thanks for the thorough feedback, Floris