Hi Dojo,
When errors occur in SLProtocol.exe, DataMiner promptly flags them to admins and users.
What is the best type of capture or memory dump to help troubleshoot the root cause behind these hanging calls? Is it worth setting up some automation or correlation so that the related data is collected automatically every time this type of RTE is listed in the Alarm Console?
Thanks
An RTE in SLProtocol.exe is often linked to calls involving one or more of the following processes:
- SLScripting
- SLSNMPManager
- SLPort
- SLElement
Taking dumps automatically on these errors can help when they only appear temporarily, i.e. when the process needs more than 15 minutes to handle a certain request but eventually gets through it. A reason not to do this is when the errors appear frequently: the dumps then risk flooding the disk or interrupting the process too often.
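To illustrate the trade-off, here is a minimal Python sketch of a guard that an automation could consult before taking a dump. It is not a DataMiner feature: the class name, the limits, and the idea of passing in the free disk space are all assumptions made for the example.

```python
class DumpThrottle:
    """Hypothetical guard: allow at most `max_dumps` dumps per sliding
    window, and none at all when free disk space is below a floor."""

    def __init__(self, max_dumps=3, window_s=3600.0, min_free_bytes=20 * 1024**3):
        self.max_dumps = max_dumps
        self.window_s = window_s
        self.min_free_bytes = min_free_bytes
        self._taken = []  # timestamps (seconds) of dumps already allowed

    def allow(self, now, free_bytes):
        # Forget dumps that fell out of the sliding window.
        self._taken = [t for t in self._taken if now - t < self.window_s]
        if free_bytes < self.min_free_bytes:
            return False  # protect the disk from being flooded
        if len(self._taken) >= self.max_dumps:
            return False  # avoid interrupting the process too often
        self._taken.append(now)
        return True
```

In practice `now` could come from `time.time()` and `free_bytes` from `shutil.disk_usage(dump_dir).free`. Suitable limits depend on the dump size and on how often the RTE occurs on the system in question.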
Ideally, full memory dumps are taken. Only when opening the SLProtocol dump can we see whether other processes are involved in this particular RTE, and which ones. Grabbing dumps of all of them as a precaution can speed up the investigation, at the cost of large files that may turn out to be irrelevant.
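As a rough sketch of the "grab all of them" approach, the Python snippet below builds one full-dump command per running instance of the processes listed above. It assumes the Sysinternals ProcDump tool is used (`-ma` writes a full memory dump), which is only one possible way to capture dumps. The function name, the dump folder, and the way the process IDs are obtained are hypothetical.

```python
from pathlib import PureWindowsPath

# Processes named in this answer; SLProtocol is the one reporting the RTE.
RELATED = ["SLProtocol", "SLScripting", "SLSNMPManager", "SLPort", "SLElement"]

def build_dump_commands(pids_by_process, dump_dir, stamp):
    """Return ProcDump argument lists, one per PID, without running them.

    pids_by_process: hypothetical mapping of process name -> list of PIDs,
    gathered by the caller (several instances of a process may be running).
    """
    commands = []
    for name in RELATED:
        for pid in pids_by_process.get(name, []):
            target = PureWindowsPath(dump_dir) / f"{name}_{pid}_{stamp}.dmp"
            # -ma: full memory dump; -accepteula: skip the interactive EULA prompt
            commands.append(["procdump", "-accepteula", "-ma", str(pid), str(target)])
    return commands
```

Each returned list could be passed to `subprocess.run` on the DataMiner server. Targeting PIDs rather than process names avoids ambiguity when more than one instance of a process is running.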
Thanks for the thorough feedback, Floris