Hi,
A user is running a DMS consisting of 6 failover-pairs. The DMA version is 9.6.0 CU23.
The DMS is monitoring a few hundred CMTSs, and these devices are configured to send traps to each and every DMA (i.e. all 12 DMAs are trap destinations). Recently it has been discovered that the number of traps being sent is causing congestion on the management network. The user would like to leverage the SNMP trap distribution feature (RN15802) on 9.6.0 and reduce the number of trap destinations per CMTS. The trap destinations need to be the same across all CMTSs.
The current thinking is to reduce the trap destinations from 6 failover-pairs (12 DMAs) to 2 failover-pairs (4 DMAs).
Before moving forward with re-configuring all the CMTSs to send traps to a reduced number of DMAs, it would be prudent to clarify some points:
First, some yes/no questions:
(1) Is the SNMP trap distribution feature enabled by default?
(2) Are traps sent by a device to the standby DMA in a failover-pair simply ignored?
Some more complex questions:
(3) If the element sending the trap is on DMA_failover_pair#2, and the element is configured to send traps to DMA_failover_pair#1 and DMA_failover_pair#6, is the active DMA in DMA_failover_pair#2 expected to receive two re-distributed traps?
(4) On Wireshark, what do these re-distributed traps look like?
(5) Is there a best practice / rule of thumb when using the SNMP trap distribution feature on 9.6.0? What do we need to avoid when using SNMP trap distribution feature on 9.6.0?
Hi Bing,
First, to answer your direct questions:
- Yes, the feature was introduced with RN18640 in FR 9.5.13 and MR 9.6.0 (CU0) and is always active.
- Yes, the standby agent will either not listen on the trap port, or not send out the traps, as this would conflict with the active agent. We don't know off the top of our heads whether the port is being ignored or the messages are dropped in SLSNMPManager.
- Yes. The DMS does not know that it has received the same trap twice, so it will process both. In your example, pair#2 is identified as the one hosting the destination element and will receive the distributed trap from both receiving agents (pair#1 and pair#6). However, agents that do not host an element with a polling IP matching the source of the trap will not receive the distributed trap.
- Trap distribution is part of the internal DataMiner communication and does not generate an SNMP trap message. In Wireshark it will look like any other message between two agents in a cluster. This currently uses .NET Remoting on the (configurable) port 8004 (see the DataMiner Help: General DMA configuration - Configuring the IP network ports).
- There aren't any rules besides knowing the behavior discussed above.
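The behavior described in the third answer can be modeled in a few lines. This is only a minimal sketch, not DataMiner code: the `FailoverPair` class, the `distribute` function, and the IP address are hypothetical names chosen for illustration, and the only rule modeled is the one stated above (a copy goes to every pair hosting an element whose polling IP matches the trap source, with no duplicate detection).

```python
from dataclasses import dataclass, field


@dataclass
class FailoverPair:
    """One failover pair; only its active agent processes traps."""
    name: str
    polling_ips: set = field(default_factory=set)  # polling IPs of hosted elements
    delivered: list = field(default_factory=list)  # traps handed to local elements


def distribute(cluster, trap_source_ip, trap_targets):
    """Deliver one trap event that the device sent to each pair in trap_targets.

    Returns how many copies had to cross the inter-agent link.
    """
    forwarded = 0
    for receiver in trap_targets:
        for pair in cluster:
            # Only pairs hosting an element with a matching polling IP get a copy.
            if trap_source_ip in pair.polling_ips:
                pair.delivered.append((trap_source_ip, receiver.name))
                if pair is not receiver:
                    forwarded += 1
    return forwarded


cluster = [FailoverPair(f"pair#{i}") for i in range(1, 7)]
cluster[1].polling_ips.add("10.0.0.42")  # hypothetical CMTS hosted on pair#2

# Question (3): the CMTS sends its trap to pair#1 and pair#6.
forwarded = distribute(cluster, "10.0.0.42", [cluster[0], cluster[5]])
print(len(cluster[1].delivered), forwarded)  # 2 2
```

In this model, pair#2 ends up with two copies of the same trap, one from each receiving pair, while the other pairs receive nothing.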
Second, some thoughts and suggestions:
- When a failover pair is set up, there's a virtual IP that will redirect to the active agent. When configuring SNMP traps, this IP can be used as a trap target to contact the active agent. This should halve the number of IPs you configure.
- I'm personally not aware of users forwarding traps to multiple agents in the cluster for redundancy purposes. If you do this, be aware that your drivers will receive the same trap multiple times, potentially triggering certain actions twice.
- If you expect to receive a lot of traps in short amounts of time, try to load-balance them between different agents, both element-wise and trap-target-wise.
- Usually, traps are forwarded to the agent hosting the element, since this has the least overhead. The trap distribution feature is what you'll want to rely on when it's hard to track which agent is hosting the element, when all devices need the same configuration, or when that particular agent is not set up to receive traps.
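To put rough numbers on the congestion concern, the arithmetic below compares the trap packets placed on the management network for the setups discussed in this thread. It is a back-of-the-envelope sketch: the CMTS count of 300 is an assumption (the question only says "a few hundred"), and `trap_packets` is a hypothetical helper.

```python
def trap_packets(devices, destinations, traps_per_device=1):
    """UDP trap packets placed on the management network."""
    return devices * destinations * traps_per_device


CMTS_COUNT = 300  # assumption; the thread only says "a few hundred"

scenarios = {
    "all 12 DMAs as destinations": 12,
    "2 failover pairs, both DMAs of each": 4,
    "2 failover pairs, virtual IP of each": 2,
}
for label, destinations in scenarios.items():
    packets = trap_packets(CMTS_COUNT, destinations)
    print(f"{label}: {packets} packets when every CMTS sends one trap")
```

Note that this only counts SNMP packets from the devices. Distributed traps travel as internal inter-agent messages and add their own traffic between the DMAs.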
Hi Carlos,
Point #3 and suggestion #2 apply here. If each pair receives the same trap, they will all forward it to the agent hosting the intended element (i.e. 5 traps are forwarded, since one of the pairs hosts the element itself). The hosting agent will then pass all 6 traps to the intended element.
Traps don't have a unique identifier, and we don't retain recent traps to compare them against incoming ones, so none of these would be dropped.
Since SNMP traps are sent over UDP, there is a chance of them being dropped by the network and never arriving at the destination. I presume this is why users want to configure multiple targets as a form of redundancy. However, the correct way to configure this would be to use SNMP informs with one target agent. Informs must be acknowledged by the receiver (the DMA in this case), or they'll be sent again.
A notable addition to Informs is RN29034 (10.1.0[CU2] / 10.1.4), which will drop duplicate informs sent to the same agent (before it is distributed, so sending it to multiple agents will still make it arrive at the element multiple times).
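The consequence of de-duplicating per agent, before distribution, can be illustrated with a small model. This is a sketch under stated assumptions and not the RN29034 implementation: the `Agent` class is hypothetical, and using (source IP, request ID) as the duplicate key is an assumption made purely for illustration.

```python
class Agent:
    """Hypothetical agent that drops duplicate informs before distributing them."""

    def __init__(self, name):
        self.name = name
        self.seen = set()  # keys of informs this agent has already accepted

    def receive_inform(self, source_ip, request_id, element_inbox):
        key = (source_ip, request_id)  # assumed duplicate key, not confirmed
        if key in self.seen:
            return False  # duplicate on the same agent: dropped
        self.seen.add(key)
        # Distribution to the hosting element happens after the duplicate check.
        element_inbox.append((self.name, source_ip, request_id))
        return True


inbox = []  # what the element ends up processing
a1, a2 = Agent("agent-1"), Agent("agent-2")

a1.receive_inform("10.0.0.42", 7, inbox)  # accepted
a1.receive_inform("10.0.0.42", 7, inbox)  # resent to the same agent: dropped
a2.receive_inform("10.0.0.42", 7, inbox)  # same inform via another agent: accepted
print(len(inbox))  # 2
```

Because each agent only knows what it has seen itself, the element still processes the inform twice when two agents are configured as targets, which is why a single target agent is the suggested setup for informs.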
Hi Floris,
I have a quick question regarding this topic, for this user having the DMS consisting of 6 failover-pairs (12 agents in total) and the DMA version being 9.6.0.
If this user sends the traps to all the DMA pairs, does the SNMP trap distribution feature mean that every DMA searches for the DMA pair hosting the element that sent the trap and, once it is found, that pair ignores the forwarded traps because it has already received the trap itself?
Thanks!