Hi Dojo,
I have the following notice alarm in my DMS:
Unexpected Exception [Sending SLDataGateway message request:DataRequest<Alarm> with filter: ((Alarm.DataMinerID[Int32] ==xxxxxx) AND (Alarm.RootAlarmID[Int32] ==yyyyyy)) over NATS failed with error: DataMinerMessageBroker.API.Exceptions.ChunkTimeoutException: Timeout Occurred for chunk 0 [BrokerID: 89b9b604-ff69-4cc0-ac64-359425c0d269] ---> NATS.Client.NATSTimeoutException: Timeout occurred.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at NATS.Client.Connection.<RequestAsyncImpl>d__147.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at DataMinerMessageBroker.API.Nats.NatsSession.<RequestAsync>d__50.MoveNext()
--- End of inner exception stack trace ---
at DataMinerMessageBroker.API.Chunking.ChunkingMessageBroker.SendChunkAcked(String subject, Byte[] bytes, Int32 length, Int32 chunkID)
at DataMinerMessageBroker.API.Chunking.ChunkGenerator.SendMessage(String subject, Int64 responseId, Stream data, Boolean bNeedsAck, ChunkHandler handler)
at DataMinerMessageBroker.API.Chunking.ChunkGenerator.Request(String subject, Int64 responseId, Byte[] bytes, Int32 length)
at DataMinerMessageBroker.API.Chunking.ChunkingMessageBroker.<RequestAsync>d__30.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at DataMinerMessageBroker.API.Chunking.ChunkingMessageBroker.<RequestAsync>d__33.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Skyline.DataMiner.Net.SLDataGateway.DataGateway.SendMessage[TReq,TResp](TReq request, String method, Nullable`1 timeout)]: Get Correlation Details for zzzz/wwwwww ( at Skyline.DataMiner.Net.SLDataGateway.DataGateway.SendMessage[TReq,TResp](TReq request, String method, Nullable`1 timeout)
at Skyline.DataMiner.Net.SLDataGateway.DataGateway.ExecuteRequest(BaseRequest request)
at Skyline.DataMiner.Net.Facade.HandleClientRequestMessage(IConnectionInfo connInfo, ClientRequestMessage oneMsg, Boolean canQueue)
at Skyline.DataMiner.Net.Facade.HandleMessageInternal(IConnectionInfo connInfo, DMSMessage oneMsg, Int32 groupID, Int32 groupTotal)
at Skyline.DataMiner.Net.Facade.HandleMessage(IConnectionInfo connInfo, DMSMessage oneMsg, Int32 groupID, Int32 groupTotal)
at Skyline.DataMiner.Net.BaseFacade.HandleMessage(IConnectionInfo connInfo, DMSMessage oneMsg)
at Skyline.DataMiner.Net.BaseFacade.HandleSingleResponseMessage(IConnectionInfo connInfo, DMSMessage oneMsg)
at Skyline.DataMiner.Net.BaseFacade.HandleSingleResponseMessage(DMSMessage oneMsg)
at Skyline.DataMiner.Net.DataMiner.FindAlarms(Int32 hostingAgentId, Int32[] unsortedAlarmIDs)
at Skyline.DataMiner.Net.DataMiner.GetAlarmDetails(IConnectionInfo connInfo, Int32 hostingAgentId, Int32[] alarmIDs, Boolean fullTrees)
at Skyline.DataMiner.Net.DataMiner.ComposeCorrelationDetails(AlarmEventMessage alarmMessage, Int32 depth)
at Skyline.DataMiner.Net.DataMiner.ComposeCorrelationDetails(AlarmEventMessage alarmMessage)
at Skyline.DataMiner.Net.DataMiner.QueuedComposeCorrelationDetails_One(AlarmEventMessage alarmMessage))
From the notice, it appears that the request to retrieve alarm xxxx/yyyyyy from the local Cassandra database timed out. However, the stack trace shows that this request was made in order to retrieve correlation details for zzzz/wwwwww.
The issue is that when I query the alarm table (SELECT * FROM alarm WHERE r='zzzz_wwwwww'), no results are returned, meaning there is no correlation alarm with this ID.
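For anyone wanting to repeat this check on a similar notice, here is a rough Python sketch of how the two ID pairs can be pulled out of the notice text. The function names (extract_alarm_ids, alarm_row_key) are my own and not part of any DataMiner API, and the "dmaid_alarmid" key format is only assumed from the r value I used in the query above.

```python
import re

# Matches the filter part of the notice, e.g.
# ((Alarm.DataMinerID[Int32] ==123) AND (Alarm.RootAlarmID[Int32] ==456))
FILTER_RE = re.compile(
    r"Alarm\.DataMinerID\[Int32\]\s*==\s*(\w+)\)\s*AND\s*"
    r"\(Alarm\.RootAlarmID\[Int32\]\s*==\s*(\w+)\)"
)

# Matches the "Get Correlation Details for <dma>/<alarm>" part of the notice.
CORRELATION_RE = re.compile(r"Get Correlation Details for (\w+)/(\w+)")


def extract_alarm_ids(notice: str) -> dict:
    """Return the (DMA ID, alarm ID) pairs found in the notice text.

    Hypothetical helper: 'requested' is the alarm whose retrieval timed out,
    'correlation' is the alarm the correlation details were requested for.
    """
    result = {}
    match = FILTER_RE.search(notice)
    if match:
        result["requested"] = (match.group(1), match.group(2))
    match = CORRELATION_RE.search(notice)
    if match:
        result["correlation"] = (match.group(1), match.group(2))
    return result


def alarm_row_key(dma_id: str, alarm_id: str) -> str:
    """Build the value used for r in the query (assumed format: dma_alarm)."""
    return f"{dma_id}_{alarm_id}"
```

The returned "correlation" pair is what I turned into the r value for the SELECT statement; the "requested" pair is the alarm from the filter that actually timed out.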
So, my questions are:
- What is the impact of this alarm on the system, and what does it actually mean?
- Why is DataMiner requesting correlation details for an alarm that does not exist?
- Could "Get Correlation Details for zzzz/wwwwww" be referring to a correlation bucket rather than an alarm?
Kind regards,