Hi Dojo,
I have the following notice alarm in my DMS:
Unexpected Exception [Sending SLDataGateway message request:DataRequest<Alarm> with filter: ((Alarm.DataMinerID[Int32] ==xxxxxx) AND (Alarm.RootAlarmID[Int32] ==yyyyyy)) over NATS failed with error: DataMinerMessageBroker.API.Exceptions.ChunkTimeoutException: Timeout Occurred for chunk 0 [BrokerID: 89b9b604-ff69-4cc0-ac64-359425c0d269] ---> NATS.Client.NATSTimeoutException: Timeout occurred.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at NATS.Client.Connection.<RequestAsyncImpl>d__147.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at DataMinerMessageBroker.API.Nats.NatsSession.<RequestAsync>d__50.MoveNext()
--- End of inner exception stack trace ---
at DataMinerMessageBroker.API.Chunking.ChunkingMessageBroker.SendChunkAcked(String subject, Byte[] bytes, Int32 length, Int32 chunkID)
at DataMinerMessageBroker.API.Chunking.ChunkGenerator.SendMessage(String subject, Int64 responseId, Stream data, Boolean bNeedsAck, ChunkHandler handler)
at DataMinerMessageBroker.API.Chunking.ChunkGenerator.Request(String subject, Int64 responseId, Byte[] bytes, Int32 length)
at DataMinerMessageBroker.API.Chunking.ChunkingMessageBroker.<RequestAsync>d__30.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at DataMinerMessageBroker.API.Chunking.ChunkingMessageBroker.<RequestAsync>d__33.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Skyline.DataMiner.Net.SLDataGateway.DataGateway.SendMessage[TReq,TResp](TReq request, String method, Nullable`1 timeout)]: Get Correlation Details for zzzz/wwwwww ( at Skyline.DataMiner.Net.SLDataGateway.DataGateway.SendMessage[TReq,TResp](TReq request, String method, Nullable`1 timeout)
at Skyline.DataMiner.Net.SLDataGateway.DataGateway.ExecuteRequest(BaseRequest request)
at Skyline.DataMiner.Net.Facade.HandleClientRequestMessage(IConnectionInfo connInfo, ClientRequestMessage oneMsg, Boolean canQueue)
at Skyline.DataMiner.Net.Facade.HandleMessageInternal(IConnectionInfo connInfo, DMSMessage oneMsg, Int32 groupID, Int32 groupTotal)
at Skyline.DataMiner.Net.Facade.HandleMessage(IConnectionInfo connInfo, DMSMessage oneMsg, Int32 groupID, Int32 groupTotal)
at Skyline.DataMiner.Net.BaseFacade.HandleMessage(IConnectionInfo connInfo, DMSMessage oneMsg)
at Skyline.DataMiner.Net.BaseFacade.HandleSingleResponseMessage(IConnectionInfo connInfo, DMSMessage oneMsg)
at Skyline.DataMiner.Net.BaseFacade.HandleSingleResponseMessage(DMSMessage oneMsg)
at Skyline.DataMiner.Net.DataMiner.FindAlarms(Int32 hostingAgentId, Int32[] unsortedAlarmIDs)
at Skyline.DataMiner.Net.DataMiner.GetAlarmDetails(IConnectionInfo connInfo, Int32 hostingAgentId, Int32[] alarmIDs, Boolean fullTrees)
at Skyline.DataMiner.Net.DataMiner.ComposeCorrelationDetails(AlarmEventMessage alarmMessage, Int32 depth)
at Skyline.DataMiner.Net.DataMiner.ComposeCorrelationDetails(AlarmEventMessage alarmMessage)
at Skyline.DataMiner.Net.DataMiner.QueuedComposeCorrelationDetails_One(AlarmEventMessage alarmMessage))
From the notice, it appears that the request to retrieve alarm xxxx/yyyyyy from the local Cassandra database timed out. However, based on the stack trace, this alarm request was triggered due to a need to retrieve correlation details for zzzz/wwwwww.
The issue is that when I query the alarm table (SELECT * FROM alarm WHERE r='zzzz_wwwwww'), no results are returned, meaning there is no correlation alarm with this ID.
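For anyone wanting to cross-check the same thing on their own notices, here is a minimal Python sketch that pulls both ID pairs out of the notice text: the alarm that was requested over NATS and the alarm the correlation details were being composed for. The function name `extract_ids` and both regular expressions are assumptions inferred only from the notice quoted above, not from any documented DataMiner format, so they may need adjusting for other notices.

```python
import re

# Hypothetical patterns: inferred only from the notice text quoted in this
# post, not from any documented DataMiner notice format.
FILTER_RE = re.compile(
    r"Alarm\.DataMinerID\[Int32\]\s*==\s*(\w+)\)\s*AND\s*"
    r"\(Alarm\.RootAlarmID\[Int32\]\s*==\s*(\w+)\)"
)
CORRELATION_RE = re.compile(r"Get Correlation Details for (\w+)/(\w+)")


def extract_ids(notice: str) -> dict:
    """Return the requested alarm and the correlation reference found in a notice.

    Each value is a (DataMiner ID, alarm ID) tuple of strings, or None when
    the corresponding pattern does not occur in the text.
    """
    requested = FILTER_RE.search(notice)
    correlation = CORRELATION_RE.search(notice)
    return {
        "requested_alarm": requested.groups() if requested else None,
        "correlation_ref": correlation.groups() if correlation else None,
    }
```

The second pair is the one used in the `r='zzzz_wwwwww'` lookup above, with the two parts joined by an underscore.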
So, my questions are:
- What is the impact of this alarm on the system, and what does it actually mean?
- Why is DataMiner requesting correlation details for an alarm that does not exist?
- Could "Get Correlation Details for zzzz/wwwwww" be referring to a correlation bucket rather than an alarm?
Kind regards,
Hello,
- The impact can be that a correlation alarm contains a sticky base alarm link. If it was recently cleared, that might be visible in the history alarms.
- SLNet keeps track of recently cleared alarms and will attempt to load the alarm from the database to keep the data in sync with the correlation details of the correlated alarm.
- I believe that is an alarm tree ID.
Regards


Well, the exception occurs while trying to fetch it, so there may be a problem with generating it or inserting it. The data in the correlation details is kept in a separate table. Maybe these got out of sync, or an issue occurred where offloading the correlation alarm failed but the correlation details did get saved.
Thank you for your answer. I still have one question about point 3: if it is a correlation alarm, then why can't I find it? It should be in the alarm table, correct?