Dear,
We have two permanent "run time" alarms on our DMAs and can't find the reason for them. The elements that caused the alarms work fine, and their logs show no obvious cause.
DMA3 - Thread problem in SLProtocol.exe: [Elico ASDU/1.0.0.0] 0206_FM3_DT - ProtocolThread [+ 1 pending] )
DMA1 - Thread problem in SLProtocol.exe: [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread [+ 2 pending] )
-
We restarted the DMS, but after a few minutes the alarms were back with the same root times.
Those elements use a serial protocol for communication, so connection timeouts and run-time errors in QActions are possible. The DMA Watchdog logging also shows these two alarms. One log excerpt is below:
-
2021-03-16 09:58:47 4552|- (2110) Not signaled 1 (since 2021-03-16 09:51:16): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread
2021-03-16 09:58:47 4552|HALFOPEN RTE: - (2110) Not signaled 1 (since 2021-03-16 09:51:16): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread notificationID created: 18751
2021-03-16 10:01:23 4552|- (2706) Not signaled 1 (since 2021-03-16 09:53:52): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread
2021-03-16 10:01:23 4552|HALFOPEN RTE: - (2706) Not signaled 1 (since 2021-03-16 09:53:52): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread notificationID created: 18752
2021-03-16 10:01:26 4552|- (2704) Not signaled 1 (since 2021-03-16 09:53:55): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread
2021-03-16 10:01:26 4552|HALFOPEN RTE: - (2704) Not signaled 1 (since 2021-03-16 09:53:55): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread notificationID created: 18753
2021-03-16 10:06:18 4552|>>>>>>> (2110) THREAD PROBLEM : SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread
2021-03-16 10:06:18 4552|Send alarm for process SLProtocol.exe (bSignaled = FALSE, bStopped = FALSE) for iCookie = 2110 (RTE Count = 1)
2021-03-16 10:06:18 4552|** Making minidump ..
2021-03-16 10:06:26 4552|** Making minidump C:\Skyline DataMiner\Logging\MiniDump\2021_03_16 10_06_18_mini_SLProtocol.exe.zip finished.
2021-03-16 10:06:26 4552|OPEN RTE: Runtime error in process SLProtocol.exe on agent dma-1 in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread with notificationID: 18751
2021-03-16 10:08:54 4552|>>>>>>> (2706) THREAD PROBLEM : SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread
2021-03-16 10:08:57 4552|>>>>>>> (2704) THREAD PROBLEM : SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread
2021-03-16 10:09:50 4552|- (2554) Not signaled 1 (since 2021-03-16 10:02:19): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0411_FM_PA - ProtocolThread
2021-03-16 10:09:50 4552|HALFOPEN RTE: - (2554) Not signaled 1 (since 2021-03-16 10:02:19): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0411_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0411_FM_PA - ProtocolThread notificationID created: 18754
-
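To see how long each thread stayed unsignaled before Watchdog raised the alarm, an excerpt like the one above can be summarized with a small script. This is only a sketch: the regular expressions are inferred from the lines shown here and are not an official description of the Watchdog log format, and the function name `summarize` is made up for this example.

```python
import re
from datetime import datetime

TS = "%Y-%m-%d %H:%M:%S"

# Patterns inferred from the excerpt above (assumption, not an official format).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\|(.*)$")
NOT_SIGNALED_RE = re.compile(
    r"^- \((\d+)\) Not signaled \d+ \(since ([^)]+)\): (.+)$"
)
THREAD_PROBLEM_RE = re.compile(r"^>+ \((\d+)\) THREAD PROBLEM : (.+)$")


def summarize(lines):
    """Return one record per cookie: thread name and key timestamps."""
    threads = {}
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        logged = datetime.strptime(m.group(1), TS)
        msg = m.group(2)

        ns = NOT_SIGNALED_RE.match(msg)
        if ns:
            cookie, since, name = ns.groups()
            # Keep the first "Not signaled" report per cookie.
            threads.setdefault(cookie, {
                "thread": name,
                "since": datetime.strptime(since, TS),
                "first_reported": logged,
                "thread_problem": None,
            })
            continue

        tp = THREAD_PROBLEM_RE.match(msg)
        if tp and tp.group(1) in threads:
            threads[tp.group(1)]["thread_problem"] = logged
    return threads


def seconds_until_alarm(record):
    """Seconds between the last signal and the THREAD PROBLEM line, or None."""
    if record["thread_problem"] is None:
        return None
    return int((record["thread_problem"] - record["since"]).total_seconds())
```

Feeding it the lines for cookie 2110 from the excerpt gives the gap between "since" and the THREAD PROBLEM entry, which makes it easier to compare threads and to spot the ones that were reported as not signaled but never escalated.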
We noticed that in Cube, under System Center > Agents > Agent Alarms, the line "Show agents alarms" says DMA1 (0 errors, 1 timeout, 0 notices), DMA3 (0 errors, 3 timeout, 0 notices). Zero errors? When we click the link to the table, it lists the run-time errors mentioned earlier.
-
What is the mechanism for automatically clearing RTE alarms?
What can we do to remove those alarms?
How can we investigate the possible causes of those alarms more deeply?
-
There are 4 DMAs in the cluster, running version 9.6.0.0-8578.
Thank you!
Hi Jurica,
In addition to the feedback from Miguel, it's also important to understand that an RTE indicates a hanging action (or an action that takes longer than expected).
Since this is a protocol RTE, understanding how actions are handled by the SLProtocol process is key to pinpointing the cause of the issue.
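To make the idea of a "hanging action" more concrete, here is a generic heartbeat-watchdog sketch. It is not how DataMiner implements its Watchdog; the class name `Watchdog`, its methods and the timeout value are all hypothetical and only illustrate the general principle that a thread which stops signaling for too long gets flagged.

```python
import time


class Watchdog:
    """Generic heartbeat watchdog (illustration only, not DataMiner code)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # hypothetical threshold in seconds
        self.clock = clock              # injectable clock, handy for testing
        self.last_signal = {}           # thread name -> time of last heartbeat

    def signal(self, thread_name):
        """Called by a worker thread each time it finishes an action."""
        self.last_signal[thread_name] = self.clock()

    def stalled(self):
        """Return the names of threads that have not signaled within the timeout."""
        now = self.clock()
        return sorted(
            name for name, last in self.last_signal.items()
            if now - last > self.timeout_s
        )
```

In this model, a thread stuck in one long or blocked action never reaches its next `signal` call, so it shows up in `stalled()` even though the element may still look fine from the outside.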
Hi Jurica,
Having checked the log files that you attached, it seems that the issue is related to these drivers:
- Elico ASDU/1.0.0.0
- Elico SCU-06/1.0.0.1
Before going further, could you please let us know whether these drivers were validated using the DIS validator?
Please find more details about the DataMiner Integration Studio (DIS) in DataMiner Help:
Hi Miguel,
Thank you for your response.
The drivers were written before we started using DIS. I checked the driver with DIS and there are a few errors. I will correct them soon, when I have time, but those drivers have been working for years without noticeable problems.
Most of the errors in DIS are duplicated parameter descriptions, and for the QA there are warnings like this: “Unrecommended use of magic number ‘307’, use ‘Parameter’ class instead. QAction ID ‘1’. (line 785 col 3) [3.7.2]”.
–
Can I manually delete this alarm, and if so, how?
Best regards
Hi Jurica,
Could you please first correct the issues reported by DIS and then test the driver again?
As indicated by Ive, keep in mind that although the elements appear to be working fine, there could be a hanging action (or an action that takes longer than expected) in the background. This could be the cause of the RTE.
Alarms cannot be removed. To be sure that the driver will work correctly, all errors and warnings should be resolved.
Hi,
It seems that restarting all elements that use this serial driver, and then restarting the DMA that reported the RTE for this driver, solved the problem.
The errors reported by DIS have also been fixed. I hope this will not happen again.
Thank you for your support.
Best regards, Jurica
Some extra help on resolving Protocol RTEs can be found here:
https://community.dataminer.services/protocol-thread-run-time-error-use-cases/