Hi Dojo,
I've encountered a scenario I'd like to get some advice on please.
An SNMPv3 device that had been polling normally goes into timeout, and Stream Viewer shows "discovery failed" error messages. In Wireshark, you can see that DataMiner's discovery packet receives no response from the device.
With a third-party tool (QA Device Simulator), you can see the discovery packet receive a response when manually polling the device.
After restarting the element, the next discovery packet yields a response, even though the SNMP payload is identical to the 'ignored' packet sent before the restart. Polling is then re-established.
Log files don't seem to point to anything obvious.
Does anyone have suggestions as to what could be happening here?
In error state, we’re not getting a response back from the device.
The response packets to the third-party tool's request and to the request sent after the element restart are the same (both are usmStatsUnknownEngineIDs responses), with only the expected differences (engineTime, request ID and binding value).
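For anyone wanting to check this kind of comparison mechanically, here is a minimal sketch that lists the byte offsets at which two captured payloads differ. It assumes the payloads have been copied out of Wireshark as hex strings; the function name `diff_offsets` is made up for illustration and is not part of any DataMiner or Wireshark tooling.

```python
def diff_offsets(hex_a: str, hex_b: str) -> list[int]:
    """Return the byte offsets at which two hex-encoded payloads differ."""
    # bytes.fromhex ignores spaces, so "30 0e 04" and "300e04" both work.
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    # Offsets where both payloads have a byte, but the bytes differ.
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Any trailing bytes present in only one payload also count as differences.
    diffs.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return diffs
```

An empty list means the two payloads are byte-for-byte identical; otherwise the offsets can be mapped back to fields (engineTime, request ID, and so on) in the Wireshark dissection.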
I see Nicolas' response, but I have seen this behaviour before, and it might be worth checking whether the device conforms to the SNMPv3 standard (RFC 3414, Section 3.2): https://datatracker.ietf.org/doc/html/rfc3414#section-3.2
When a timeout occurs, DataMiner tries to rediscover the msgAuthoritativeEngineID. All subsequent discovery payloads (get-requests) are sent without a msgAuthoritativeEngineID, and the device should respond with a usmStatsUnknownEngineIDs PDU so that communication can be re-established.
When an element is restarted, the engineTime and engineBoots are reset to 0, which is essentially a reset in communications, so the discovery payload and the 'ignored' packet will be slightly different in that respect.
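To make the discovery exchange above more concrete, here is a rough sketch of how such a discovery get-request could be encoded by hand. It is based on my reading of the SNMPv3 message layout (noAuthNoPriv with the reportable flag, empty engine ID and user name, engineBoots and engineTime at 0, empty varbind list) and is not taken from DataMiner or SNMP++. The helper names and the msgMaxSize default are assumptions, and real packets from either tool may differ in detail.

```python
def ber_len(n: int) -> bytes:
    """BER definite-length encoding (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def tlv(tag: int, content: bytes = b"") -> bytes:
    """Encode one tag-length-value element."""
    return bytes([tag]) + ber_len(len(content)) + content

def ber_int(value: int) -> bytes:
    """Encode a non-negative INTEGER, adding a leading zero byte when needed."""
    length = max(1, (value.bit_length() + 8) // 8)
    return tlv(0x02, value.to_bytes(length, "big"))

def build_discovery(msg_id: int, request_id: int, max_size: int = 65507) -> bytes:
    """Build an SNMPv3 discovery get-request (hypothetical helper)."""
    # msgGlobalData: msgID, msgMaxSize, msgFlags (0x04 = reportable), USM (3).
    header = tlv(0x30, ber_int(msg_id) + ber_int(max_size)
                 + tlv(0x04, b"\x04") + ber_int(3))
    # USM parameters: empty msgAuthoritativeEngineID, boots = 0, time = 0,
    # empty user name, empty auth and priv parameters.
    usm = tlv(0x30, tlv(0x04) + ber_int(0) + ber_int(0)
              + tlv(0x04) + tlv(0x04) + tlv(0x04))
    # GetRequest PDU (tag 0xA0) with an empty varbind list.
    pdu = tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0) + tlv(0x30))
    # ScopedPDU: empty contextEngineID, empty contextName, then the PDU.
    scoped = tlv(0x30, tlv(0x04) + tlv(0x04) + pdu)
    # msgVersion 3, header, security parameters wrapped in an OCTET STRING.
    return tlv(0x30, ber_int(3) + header + tlv(0x04, usm) + scoped)
```

Sending the resulting bytes to UDP port 161 from a plain socket and watching for a report in Wireshark is one way to check whether the device answers discovery from an independent endpoint.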
Hi Wale, I can see that the engineTime and engineBoots are both at 0 for the ignored packet and the responded packet post element restart. The only change is the UDP source port after element restart.
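As a side note on the source port: a change of port is what you would normally expect whenever a new UDP socket is created without an explicit local port. The sketch below only illustrates that general OS behaviour; it says nothing about how DataMiner or its SNMP library manages sockets internally, and `open_udp` is a made-up helper name.

```python
import socket

def open_udp(host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Open a UDP socket on an OS-assigned (ephemeral) port and return both."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Port 0 asks the operating system to pick a free ephemeral port.
    sock.bind((host, 0))
    return sock, sock.getsockname()[1]
```

Opening a second socket while the first is still open always yields a different port, since two sockets cannot share the same local address and port here. After closing and reopening, the OS typically hands out a new port as well, although that is not guaranteed.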
Hi,
This is a known issue with SNMP communication.
The current suspicion is that the library used for SNMP communication (WinSNMP) has a bug where it swallows packets and does not pass them on to the DataMiner code.
We are still investigating when this occurs and how to prevent this.
Hi Nicolas, I'm not seeing an incoming response from the device in a packet capture before a restart. Surely we would see the swallowed packets if this were the same issue?
Our theory was that, with WinSNMP, Windows is somehow rejecting the response message and preventing it from showing up in Wireshark. Although the communication is over UDP, there is an in-memory WinSNMP object that facilitates that specific endpoint, and restarting the element creates a new object.
That being said, SNMPv3 uses SNMP++, as WinSNMP only supports v1 and v2, so the scenario described in the original question seems to contradict the explanation we’ve built regarding this problem. It requires further investigation, but we also haven’t been able to trigger the problem in a reliable manner, nor have we been able to reproduce it locally in any way.
Are both response packets identical?