We have a system that showed several unusual behaviors for some time: trend data was lost, SNMP set data was constantly being sent to a device, dashboards were blank, and the System Center log files were blank in Cube. In addition, one element threw an InvalidOperationException, "Too many threads are waiting for access", after it was restarted. A QAction in that element runs at startup and sends a DMSMessage to get some general info about the DMA. This process crashed with the following exception:
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.InvalidOperationException: Too many threads are waiting for access
at Skyline.DataMiner.Net.FifoSemaphore.Wait()
at Skyline.DataMiner.Net.Connection.HandleMessages(DMSMessage[] msgs, Int32 timeout)
at Skyline.DataMiner.Scripting.SLNetConnection.SendMessages(DMSMessage[] messages)
at QAction.Run(SLProtocol protocol)
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
at CManagedScript.Run(CManagedScript* , Int32 iCookie, IUnknown* pILog, IUnknown* pProtocol, tagVARIANT* varParameters, tagVARIANT* varRowInfo, tagVARIANT* pvarReturn)
InnerException:
System.InvalidOperationException: Too many threads are waiting for access
at Skyline.DataMiner.Net.FifoSemaphore.Wait()
at Skyline.DataMiner.Net.Connection.HandleMessages(DMSMessage[] msgs, Int32 timeout)
at Skyline.DataMiner.Scripting.SLNetConnection.SendMessages(DMSMessage[] messages)
at QAction.Run(SLProtocol protocol)
Is there a limit to how many threads can access this module?
Is there a way to change this limit, and what determines it?
For debugging purposes, when this happens, is there a way to see what is in the queue for this module?
Hi Miguel,
Some background first:
SLScripting is the server-side process that is responsible for executing QActions. These QActions can send requests to SLNet. The connection used to send these requests is shared between all of the QActions on the system.
There is a throttling mechanism that prevents more than 10 requests from being handled at the same time. Any further request has to wait for a free slot, which becomes available when one of the 10 active calls completes.
The error "Too many threads are waiting for access" indicates that all 10 slots are currently in use and that another 990 threads are already awaiting access.
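To make the numbers concrete, here is a minimal Python sketch of a throttle that behaves the way described above. It is not the actual Skyline.DataMiner.Net.FifoSemaphore code; the class name BoundedFifoSemaphore and its methods are hypothetical, and only the figures (10 slots, 990 waiters) and the error text are taken from this thread.

```python
import threading
from collections import deque


class BoundedFifoSemaphore:
    """Illustrative FIFO semaphore with a cap on queued callers.

    Hypothetical sketch only, not the DataMiner implementation.
    """

    def __init__(self, slots=10, max_waiting=990):
        self._lock = threading.Lock()
        self._free = slots              # slots not currently in use
        self._max_waiting = max_waiting
        self._waiters = deque()         # one Event per blocked caller, oldest first

    def wait(self):
        """Take a slot, queue for one, or fail if the queue is full."""
        with self._lock:
            if self._free > 0 and not self._waiters:
                self._free -= 1
                return
            if len(self._waiters) >= self._max_waiting:
                raise RuntimeError("Too many threads are waiting for access")
            ticket = threading.Event()
            self._waiters.append(ticket)
        # Block outside the lock until release() hands over a slot.
        ticket.wait()

    def release(self):
        """Give the slot to the oldest waiter, or mark it free."""
        with self._lock:
            if self._waiters:
                self._waiters.popleft().set()
            else:
                self._free += 1

    def waiting(self):
        """Number of callers currently queued."""
        with self._lock:
            return len(self._waiters)
```

In this model, if the 10 callers holding a slot never call release(), every later caller queues up until the 990 waiting positions are full, and the next call to wait() fails immediately with the error from the stack trace.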
What is probably happening is that 10 requests from some QAction are in progress and are either hanging or taking extremely long to complete. I would expect the errors in the element that was restarted to be a consequence rather than the root cause.
To see which requests are active, there are a few options:
- If the entries are still in there, the SLNet log files might show the most recent requests coming in from SLManagedScripting ("Incoming (SLManagedScripting ..."). Those would be the requests that are stuck.
- If you're familiar with the SLNetClientTest tool, Diagnostics > SLNet > HangingMessageInfo might provide some further clues.
- A memory dump of SLScripting and SLNet should be able to show in more detail which requests are in progress.
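For the first option, a small script can pull the relevant lines out of a log file. The sketch below assumes only that the lines of interest contain the text "Incoming (SLManagedScripting" quoted above. The function name is hypothetical, the log file path has to be supplied by you, and no particular line format is assumed.

```python
from collections import deque

# Marker text quoted in the answer above; nothing else about the
# log line format is assumed.
MARKER = "Incoming (SLManagedScripting"


def recent_scripting_requests(log_path, limit=20):
    """Return the last `limit` lines in the log that contain MARKER."""
    matches = deque(maxlen=limit)  # keeps only the newest matches
    # errors="replace" avoids failing on unexpected bytes in the log.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if MARKER in line:
                matches.append(line.rstrip("\r\n"))
    return list(matches)
```

Calling recent_scripting_requests with the path to an SLNet log file returns the newest matching lines in file order, so the requests at the end of the list are the most recent ones that came in from SLManagedScripting.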