I have an SNMP trap protocol that receives the status of a device, either UP or DOWN.
I also defined an alarm template so that when a DOWN status is received for a device, a critical (red) alarm is generated. Conversely, when UP is received, the alarm state is normal (green).
In my alarm template, I have also defined a Hysteresis OFF of 180s for the critical alarm. The reason is that we generate a SNOW ticket whenever we receive a critical alarm, and for the SNOW ticket to be created, the alarm needs to stay critical for at least 2 minutes, hence the 180s.
Now, here is the real situation: What happens when I receive both trap messages with status UP and DOWN for a device at the same time? What will the alarm value be?
Hi Zhing,
At some point your messages will no longer be processed concurrently. It's already unlikely that they'll arrive at exactly the same time over the wire and the UDP socket, and I'm pretty sure that SLSNMPManager receives them from the UDP socket on a single thread. So a certain order will be established within DataMiner, if not there, then whenever the value needs to be evaluated, to ensure data integrity. Also, the device won't be sending UP and DOWN at the same time, although both may carry the same timestamp because of its limited resolution (milliseconds).
So the question becomes: can the UP and DOWN messages be received out of order? For example, the known state is UP, and the device sends DOWN and then immediately UP again. We don't want to receive UP first and then DOWN, and get 'stuck' in an alarm state while the device is UP.
Depending on the network complexity between DataMiner and the device, this is a possibility. When using the mapAlarm feature (as described in Traps | DataMiner Docs), SLDataMiner maps the bindings to a certain alarm. If the traps arrive in the wrong order, then yes, the alarm state may not correctly reflect the device's state, as far as we can tell. Assuming the timestamp is the same, there should still be a request ID, but I couldn't find any code that checks it for out-of-order delivery. Do note that such a check isn't as straightforward as it seems, as these values eventually wrap around, can be skipped, etc.
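To illustrate why wrap-around makes an ordering check on request IDs less straightforward, here is a minimal Python sketch of a wrap-aware comparison in the style of serial number arithmetic. It is not DataMiner code. The 32-bit width and the function name `is_newer` are assumptions for illustration, and the check only works if the agent really increments the ID per trap, which is not guaranteed.

```python
# Assumption: request IDs behave like a 32-bit counter that wraps around.
BITS = 32
MOD = 1 << BITS
HALF = 1 << (BITS - 1)


def is_newer(candidate: int, reference: int) -> bool:
    """Return True if candidate comes after reference, allowing for wrap-around.

    The forward distance from reference to candidate is taken modulo 2**32.
    A distance in the lower half of the range counts as "newer"; a distance
    of 0 (same ID) or in the upper half counts as "not newer".
    """
    forward_distance = (candidate - reference) % MOD
    return 0 < forward_distance < HALF


# A plain "candidate > reference" test would fail on the wrapped case below.
print(is_newer(11, 10))          # ordinary increment
print(is_newer(2, MOD - 3))      # counter wrapped past its maximum
print(is_newer(10, 11))          # older ID arriving late
```

Skipped IDs are tolerated by this comparison, but two IDs that are exactly half the range apart remain ambiguous, which is part of why such a check is harder than it looks.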
So yes, I think it is possible for the alarm state to be wrong if you push this case to its limits, but I don't recall hearing of cases where this actually happened. If this device is prone to toggling states very quickly, and the alarm is process-critical, then I think mapping these traps to a protocol parameter and forcing a validation through polling would be a safer implementation, especially since you have a certain interval during which the state may go DOWN again without creating a ticket. How to trigger on traps is explained in this chapter of Traps | DataMiner Docs. There is an important remark on how to actually trigger on them near the end of the chapter.
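The idea of validating a trap through polling can be sketched in a few lines of Python. This is only a conceptual illustration, not how a DataMiner connector is written: the class `PolledStateValidator` and the injected `poll_status` function are hypothetical names, and in a real connector the poll would be an SNMP get triggered by the trap.

```python
from typing import Callable, Dict


class PolledStateValidator:
    """Treats a trap as a hint that something changed, then polls to confirm."""

    def __init__(self, poll_status: Callable[[str], str]) -> None:
        # poll_status is a hypothetical function that reads the current
        # status ("UP" or "DOWN") directly from the device.
        self._poll_status = poll_status
        self.confirmed: Dict[str, str] = {}

    def on_trap(self, device: str, trap_status: str) -> str:
        # The status carried by the trap is deliberately not trusted:
        # it may be stale or out of order. The polled value wins.
        polled = self._poll_status(device)
        self.confirmed[device] = polled
        return polled


# Usage: a late DOWN trap arrives while the device is actually UP again.
actual_state = {"switch-1": "UP"}
validator = PolledStateValidator(lambda device: actual_state[device])
print(validator.on_trap("switch-1", "DOWN"))  # the polled state is used
```

The trade-off is one extra request per trap, in exchange for an alarm state that no longer depends on the order in which traps arrive.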
Hi Zhing,
First of all, I am not sure you will receive the two trap messages at exactly the same time. What could happen is that the clear trap (in your case the 'up' status) comes right after the alarm trap (the 'down' status).
Keep in mind that traps are sent over UDP. This means that the SNMP agent (in this case running on the device) does not receive an acknowledgement* that the trap message was received by the SNMP manager (in this case DataMiner). Some devices even send the same trap more than once.
Furthermore, traps could arrive in a different order (first you receive the clear trap and then the alarm trap).
What is important here is the timestamp available in the trap. The device should include in the trap the timestamp at which the event was raised/cleared. That way, the connector (driver) can process the trap correctly.
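As a rough illustration of how a timestamp in the trap lets a connector cope with reordered or duplicated traps, here is a minimal Python sketch. It is not connector code: `TrapEvent`, `DeviceStateTracker`, and the `event_time` field are hypothetical names, and the sketch assumes the device puts its own event time in a trap binding.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TrapEvent:
    device: str
    status: str        # "UP" or "DOWN"
    event_time: float  # assumed: time of the event on the device, in seconds


class DeviceStateTracker:
    """Keeps, per device, only the trap with the most recent event time."""

    def __init__(self) -> None:
        self._latest: Dict[str, TrapEvent] = {}

    def apply(self, trap: TrapEvent) -> bool:
        """Apply a trap; return False if it is stale or a duplicate."""
        current = self._latest.get(trap.device)
        if current is not None and trap.event_time <= current.event_time:
            # Older than (or same time as) what we already know: ignore it.
            return False
        self._latest[trap.device] = trap
        return True

    def status(self, device: str) -> Optional[str]:
        current = self._latest.get(device)
        return current.status if current is not None else None


# Usage: the clear trap (UP) overtakes the alarm trap (DOWN) on the network.
tracker = DeviceStateTracker()
tracker.apply(TrapEvent("switch-1", "UP", event_time=101.0))
tracker.apply(TrapEvent("switch-1", "DOWN", event_time=100.0))  # arrives late
print(tracker.status("switch-1"))
```

Note that two traps with the same timestamp but a different status remain ambiguous in this sketch: the first one received wins.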
For your use case, is it not an option to use hysteresis 'on' instead of 'off'?
For example, if you set the hysteresis 'on' to 180s, an alarm will be generated only if the value that triggered the alarm doesn't change for 180s. However, if you receive the clear trap within those 180s, the value will change and no alarm will be generated.
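The behaviour described above can be modelled with a small Python sketch. This is a simplified model of a hysteresis 'on' delay, not DataMiner's implementation; the function name `alarm_raise_times` and its parameters are made up for illustration.

```python
from typing import List, Optional, Tuple


def alarm_raise_times(
    events: List[Tuple[float, str]],
    hysteresis_on: float,
    end_time: float,
) -> List[float]:
    """Return the times at which a critical alarm would be raised.

    events: (time_in_seconds, status) pairs sorted by time, with status
    "UP" or "DOWN". An alarm is raised only once DOWN has persisted for
    hysteresis_on seconds. end_time is when the observation stops.
    """
    raised: List[float] = []
    down_since: Optional[float] = None
    for time, status in events:
        if status == "DOWN":
            if down_since is None:
                down_since = time  # start of a DOWN period
        else:
            # UP received: the DOWN period ends here.
            if down_since is not None and time - down_since >= hysteresis_on:
                raised.append(down_since + hysteresis_on)
            down_since = None
    # A DOWN period still open at the end of the observation window.
    if down_since is not None and end_time - down_since >= hysteresis_on:
        raised.append(down_since + hysteresis_on)
    return raised


# Clear trap after 60s: no alarm. Clear trap after 300s: alarm at 180s.
print(alarm_raise_times([(0, "DOWN"), (60, "UP")], 180, end_time=600))
print(alarm_raise_times([(0, "DOWN"), (300, "UP")], 180, end_time=600))
```

With this model, a quick DOWN/UP toggle never reaches the critical state, so the order in which the two traps are processed matters less as long as the final value is correct.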
Hope it helps!
*There is an exception when the device sends an INFORM message instead of a trap. In that case, the SNMP agent receives an acknowledgement once the message has been received.
Hi,
I see that this question has been inactive for some time. Do you still need help with this? If not, could you select the answer that was the most helpful?