Dear,
We have two permanent run-time error (RTE) alarms on our DMAs and cannot find the reason for them. The elements that caused the alarms work fine, and their logs show no obvious cause.
DMA3 - Thread problem in SLProtocol.exe: [Elico ASDU/1.0.0.0] 0206_FM3_DT - ProtocolThread [+ 1 pending] )
DMA1 - Thread problem in SLProtocol.exe: [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread [+ 2 pending] )
-
We restarted the DMS, but after a few minutes the alarms were back with the same root times.
These elements communicate over a serial protocol, so connection timeouts and run-time errors in QA are possible. The two alarms also appear in the DMA Watchdog logging. One log excerpt is below:
-
2021-03-16 09:58:47 4552|- (2110) Not signaled 1 (since 2021-03-16 09:51:16): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread
2021-03-16 09:58:47 4552|HALFOPEN RTE: - (2110) Not signaled 1 (since 2021-03-16 09:51:16): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread notificationID created: 18751
2021-03-16 10:01:23 4552|- (2706) Not signaled 1 (since 2021-03-16 09:53:52): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread
2021-03-16 10:01:23 4552|HALFOPEN RTE: - (2706) Not signaled 1 (since 2021-03-16 09:53:52): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread notificationID created: 18752
2021-03-16 10:01:26 4552|- (2704) Not signaled 1 (since 2021-03-16 09:53:55): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread
2021-03-16 10:01:26 4552|HALFOPEN RTE: - (2704) Not signaled 1 (since 2021-03-16 09:53:55): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread notificationID created: 18753
2021-03-16 10:06:18 4552|>>>>>>> (2110) THREAD PROBLEM : SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread
2021-03-16 10:06:18 4552|Send alarm for process SLProtocol.exe (bSignaled = FALSE, bStopped = FALSE) for iCookie = 2110 (RTE Count = 1)
2021-03-16 10:06:18 4552|** Making minidump ..
2021-03-16 10:06:26 4552|** Making minidump C:\Skyline DataMiner\Logging\MiniDump\2021_03_16 10_06_18_mini_SLProtocol.exe.zip finished.
2021-03-16 10:06:26 4552|OPEN RTE: Runtime error in process SLProtocol.exe on agent dma-1 in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread with notificationID: 18751
2021-03-16 10:08:54 4552|>>>>>>> (2706) THREAD PROBLEM : SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0297_FM_PA - ProtocolThread
2021-03-16 10:08:57 4552|>>>>>>> (2704) THREAD PROBLEM : SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0006_FM_PA - ProtocolThread
2021-03-16 10:09:50 4552|- (2554) Not signaled 1 (since 2021-03-16 10:02:19): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0411_FM_PA - ProtocolThread
2021-03-16 10:09:50 4552|HALFOPEN RTE: - (2554) Not signaled 1 (since 2021-03-16 10:02:19): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0411_FM_PA - ProtocolThread in Process: SLProtocol.exe for Thread: [Elico SCU-06/1.0.0.1] 0411_FM_PA - ProtocolThread notificationID created: 18754
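To line up the entries above per thread, a small script can pair each "Not signaled" line with its later "THREAD PROBLEM" line using the cookie in parentheses. This is only a sketch: the regular expressions are assumptions derived from this excerpt, not a documented Watchdog log format, and the function and field names are made up for illustration. The durations it reports are observations from this log only, not official thresholds.

```python
import re
from datetime import datetime

TS = "%Y-%m-%d %H:%M:%S"
STAMP = r"\d{4}-\d\d-\d\d \d\d:\d\d:\d\d"

# Assumed line shapes, taken from the excerpt above (not from any official spec).
NOT_SIGNALED = re.compile(
    rf"^(?P<logged>{STAMP}) \d+\|- \((?P<cookie>\d+)\) "
    rf"Not signaled \d+ \(since (?P<since>{STAMP})\): (?P<thread>.+)$"
)
THREAD_PROBLEM = re.compile(
    rf"^(?P<logged>{STAMP}) \d+\|>+ \((?P<cookie>\d+)\) THREAD PROBLEM : (?P<thread>.+)$"
)


def summarize(log_text):
    """Return {cookie: info} with seconds from 'since' to each Watchdog stage."""
    threads = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        m = NOT_SIGNALED.match(line)
        if m:
            since = datetime.strptime(m["since"], TS)
            logged = datetime.strptime(m["logged"], TS)
            threads[m["cookie"]] = {
                "thread": m["thread"],
                "since": since,
                "half_open_after_s": int((logged - since).total_seconds()),
                "escalated_after_s": None,  # filled in if a THREAD PROBLEM follows
            }
            continue
        m = THREAD_PROBLEM.match(line)
        if m and m["cookie"] in threads:
            entry = threads[m["cookie"]]
            logged = datetime.strptime(m["logged"], TS)
            entry["escalated_after_s"] = int((logged - entry["since"]).total_seconds())
    return threads


# Three lines copied from the excerpt above.
SAMPLE = """\
2021-03-16 09:58:47 4552|- (2110) Not signaled 1 (since 2021-03-16 09:51:16): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread
2021-03-16 10:06:18 4552|>>>>>>> (2110) THREAD PROBLEM : SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0366_FM_PA - ProtocolThread
2021-03-16 10:09:50 4552|- (2554) Not signaled 1 (since 2021-03-16 10:02:19): SLProtocol.exe - [Elico SCU-06/1.0.0.1] 0411_FM_PA - ProtocolThread
"""

if __name__ == "__main__":
    for cookie, info in summarize(SAMPLE).items():
        print(cookie, info["thread"], info["half_open_after_s"], info["escalated_after_s"])
```

Running this against the full Watchdog log (read the file and pass its text to summarize) shows which threads were only flagged as "Not signaled" and which went on to a thread problem, which helps to match the timestamps with the element logs.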
-
We noticed that in Cube, under System Center, Agents, Agent Alarms, the line Show agents alarms reads DMA1 (0 errors, 1 timeout, 0 notices) and DMA3 (0 errors, 3 timeout, 0 notices). Why zero errors? Clicking the link to the table shows the run-time errors mentioned earlier.
-
What is the mechanism for automatically clearing RTE alarms?
What can we do to remove these alarms?
How can we investigate the possible causes of these alarms in more depth?
-
There are 4 DMAs in the cluster, version 9.6.0.0-8578.
Thank you!
Hi Jurica,
In addition to the feedback from Miguel, it's also important to understand that an RTE indicates a hanging action (or an action that takes longer than expected).
Since this is a protocol RTE, understanding how the SLProtocol process handles actions is key to pinpointing the cause of the issue.
Some extra help on resolving Protocol RTEs can be found here:
https://community.dataminer.services/protocol-thread-run-time-error-use-cases/