I have an SNMP trap protocol that receives the status of a device, either UP or DOWN.
I also defined an alarm template in which a DOWN status generates a critical (red) alarm, while an UP status returns the alarm to normal (green).
In my alarm template, I also defined a Hysteresis OFF of 180 s for the critical alarm. The reason is that we generate a SNOW ticket whenever a critical alarm is received, and the SNOW ticket process requires the alarm to stay critical for at least 2 minutes, hence the 180 s setting.
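To make the intended behaviour concrete, here is a toy model of "hysteresis off" as I understand it: once critical, the alarm only clears after the status has stayed UP for the full 180 s without interruption, so even a brief DOWN keeps it critical long enough for the ticket process. This is a simplified sketch, not DataMiner's implementation; `AlarmState`, `on_status` and `evaluate` are hypothetical names, and "hysteresis on" (a delay before raising) is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Value taken from the question; in DataMiner the real setting lives in the alarm template.
HYSTERESIS_OFF_S = 180


@dataclass
class AlarmState:
    """Toy model of 'hysteresis off': a critical alarm only clears after the
    status has stayed UP for HYSTERESIS_OFF_S seconds without interruption."""
    severity: str = "normal"          # "normal" or "critical"
    up_since: Optional[float] = None  # when the current UP period started while critical

    def on_status(self, status: str, t: float) -> None:
        if status == "DOWN":
            # Raised immediately (no 'hysteresis on' modelled); any DOWN restarts the clear timer.
            self.severity = "critical"
            self.up_since = None
        elif status == "UP" and self.severity == "critical" and self.up_since is None:
            # Start counting how long the status stays UP.
            self.up_since = t

    def evaluate(self, t: float) -> str:
        # Clear the alarm only once the UP period has lasted long enough.
        if (self.severity == "critical" and self.up_since is not None
                and t - self.up_since >= HYSTERESIS_OFF_S):
            self.severity = "normal"
            self.up_since = None
        return self.severity


alarm = AlarmState()
alarm.on_status("DOWN", 0)
alarm.on_status("UP", 1)
print(alarm.evaluate(100))  # critical: UP for only 99 s
print(alarm.evaluate(181))  # normal: UP for 180 s
```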
Now, here is the actual question: what happens if I receive trap messages with both status UP and status DOWN for a device at the same time? What will the alarm value be?
Hi Zhing,
At some point your messages will be processed sequentially rather than concurrently. It's already unlikely that they'll arrive at exactly the same time over the wire and through the UDP socket, and I'm fairly sure that SLSNMPManager receives them on a single thread from the UDP socket. So DataMiner will establish a definite order: if not there, then wherever the value needs to be evaluated, to ensure data integrity. Also, the device won't really be sending UP and DOWN at the same time, even if both traps carry the same timestamp because of its resolution (milliseconds).
So the question becomes: can the UP and DOWN messages be received out of order? For example, the known state is UP, and the device sends DOWN followed immediately by UP. We don't want to end up receiving UP and then DOWN, and get 'stuck' in an alarm state while the device is actually UP.
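A tiny illustration of that failure mode, assuming each trap simply overwrites the known state (last write wins) with no ordering check; `final_state` is a hypothetical helper, not DataMiner code:

```python
def final_state(initial: str, received: list[str]) -> str:
    """Last-write-wins: each trap overwrites the known state in arrival order."""
    state = initial
    for status in received:
        state = status
    return state


sent = ["DOWN", "UP"]            # order in which the device sent the traps
received = list(reversed(sent))  # the network swapped the two packets

print(final_state("UP", sent))      # UP: correct
print(final_state("UP", received))  # DOWN: 'stuck' in alarm while the device is UP
```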
Depending on the network complexity between DataMiner and the device, this is possible. When using the mapAlarm feature (as described in Traps | DataMiner Docs), SLDataMiner maps the bindings to a specific alarm. If the traps are processed in the wrong order, then yes, as far as we can tell the alarm state may not correctly reflect the device's state. Even if the timestamps are identical, there should still be a request ID, but I couldn't find any code that uses it to check for out-of-order delivery. Do note that such a check isn't as straightforward as it seems, as these values eventually wrap around, can be skipped, etc.
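For illustration only, this is roughly what such an out-of-order check could look like, using serial-number arithmetic in the style of RFC 1982 so that wraparound and skipped IDs are handled. It assumes the device increments the request ID for every trap it sends, which is not guaranteed, and `is_newer` and `TrapOrderFilter` are hypothetical names. It also shows why this is tricky in practice: a device reboot that resets its counter would make the filter drop valid traps, and a gap of exactly 2**31 is ambiguous.

```python
MOD = 2 ** 32   # request IDs treated as 32-bit values (signed Integer32 wraps onto this range)
HALF = 2 ** 31


def is_newer(candidate: int, last: int) -> bool:
    """RFC 1982-style comparison: True if candidate comes after last, modulo 2**32.
    Python's % maps negative Integer32 values onto 0..2**32-1 automatically."""
    diff = (candidate - last) % MOD
    return 0 < diff < HALF


class TrapOrderFilter:
    """Drops traps whose request ID is older than (or equal to) the last accepted one.
    A real implementation would keep one last_id per device."""

    def __init__(self) -> None:
        self.last_id = None

    def accept(self, request_id: int) -> bool:
        if self.last_id is not None and not is_newer(request_id, self.last_id):
            return False  # stale or duplicate trap: ignore it
        self.last_id = request_id
        return True


f = TrapOrderFilter()
print(f.accept(100), f.accept(99), f.accept(102))  # True False True
```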
So yes, I think the alarm state can end up wrong if you push this case to its limits, but I don't recall having heard of any case where this actually happened. If this device really does toggle states very quickly, and the alarm represents a process-critical state, then I think mapping these traps to a protocol parameter and forcing a validation through polling would be a safer implementation, especially since there is an interval during which the state may go DOWN again without a ticket being created. How to trigger on traps is explained in this chapter of Traps | DataMiner Docs. There is an important remark on how to actually trigger on them near the end of the chapter.
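The idea behind the polling approach, sketched in plain Python: the trap is only used as a trigger, and the alarm follows a fresh read of the device's current status, so the arrival order of the traps no longer matters. In a real protocol this would be a trap parameter triggering a poll, as described in the docs chapter mentioned above; `PolledStatus` and the `poll` callable are hypothetical stand-ins for that mechanism (for example an SNMP GET of the status OID).

```python
from typing import Callable


class PolledStatus:
    """Hypothetical sketch: traps only trigger a poll; the state comes from the poll."""

    def __init__(self, poll: Callable[[], str]) -> None:
        self._poll = poll      # reads the device's current status (assumed, e.g. an SNMP GET)
        self.status = poll()

    def on_trap(self, trap_status: str) -> str:
        # The trap's payload is deliberately not trusted; re-read the authoritative state.
        self.status = self._poll()
        return self.status


device = {"status": "UP"}
monitor = PolledStatus(lambda: device["status"])

# Device went DOWN and back UP quickly; the traps arrive swapped (UP, then DOWN).
monitor.on_trap("UP")
monitor.on_trap("DOWN")
print(monitor.status)  # UP, because the poll reflects the device's real state
```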
Hi,
I see that this question has been inactive for some time. Do you still need help with this? If not, could you select the answer that was the most helpful?