Hello Dojo,
I am currently working on a driver that needs to individually poll cable modem IPs in a table. It does so using a multithreaded timer with a QAction, and it also performs some mapping using another table. The driver has worked fine with tables of fewer than 10,000 rows, but one element now has 19,000 rows, and after a couple of minutes SLProtocol shoots up to 100% CPU usage and stays there.
The timer is configured to run every 15 minutes, which means it needs to process at least 21 rows a second, and it seems it cannot do that. Is there a way to configure the polling cycle in the driver so that, if the driver is unable to handle that many rows, the cycle can be lengthened to lower the load?
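For reference, the 21 rows per second figure follows directly from the row count and the 15-minute cycle. A minimal sketch of that arithmetic (the function name is hypothetical and only used for illustration):

```python
def required_rows_per_second(row_count: int, cycle_ms: int) -> float:
    """Average rows per second needed to finish one full pass within one cycle."""
    return row_count / (cycle_ms / 1000)


# 19,000 rows in a 900,000 ms (15 min) cycle
print(round(required_rows_per_second(19_000, 900_000), 1))  # 21.1
# 10,000 rows, the size at which the driver still kept up
print(round(required_rows_per_second(10_000, 900_000), 1))  # 11.1
```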
And if not, would decreasing the thread pool be a viable option? Or would that simply postpone the problem as more threads get placed in the queue and eventually hit the max thread size?
The timer is currently set up as follows:
<Timer id="1" options="ip:1600,3;each:900000;pollingrate:15,3,3;threadPool:300,5,301,302,303,304,305,30000;dynamicthreadpool:300;qactionBefore:1600">
<!-- every 15 min [900000]-->
<Name>cmFastTimer</Name>
<Time>1000</Time>
<Interval>75</Interval>
<Content>
<Group>1</Group>
</Content>
</Timer>
Hi,
That "each:900000" is a hardcoded value that cannot be changed at runtime. The [Timer base] setting only affects the <Time> setting as far as I'm aware, so that's not an option for multithreaded timers.
Decreasing the thread pool would not be a good option: if the element doesn't have the resources to do its work, it will add that work to the waiting queue until the maximum is reached.
Since SLProtocol also hits 100% CPU, it can't handle the load, so increasing the thread pool won't be an option either.
That leaves you with three options:
- Improve the code so it executes faster: identify the bottlenecks in the code and look for alternatives to them.
- Split the load over more elements or DMAs: if the CPU is at 100%, consider adding hardware resources (a better CPU or another agent).
- If the options above are not desired, find a way to reduce the timer speed: add a "count" column and poll all the SNMP data through the QAction via NotifyProtocol. Each time the QAction is called, add 1 to the value in the count column. If the value is equal to or above the count limit, continue executing the code and reset the counter; otherwise abort. The count limit depends on the number of rows in the table: below 10,000 rows the count limit is 1 (execute every time), and between 10,000 and 20,000 rows it is 2. That way the timer speed is reduced. There will of course be some overhead, because the QAction still needs to be started every time, but by aborting as soon as possible without polling anything it will hopefully be enough to avoid using up all hardware resources and building up items in the waiting queue.
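The counter logic from the last option can be sketched as follows. This is only a language-neutral illustration in Python, not QAction code: a real QAction would be written in C# and would read and write the count column through the protocol. The function names are hypothetical, and extending the count limit beyond 20,000 rows with the same step is an assumption, since only the first two ranges are given above.

```python
from typing import Tuple


def count_limit(row_count: int) -> int:
    """Count limit per table size: below 10,000 rows -> 1, 10,000 to 20,000 -> 2.

    Continuing with +1 per additional 10,000 rows is an assumption.
    """
    return row_count // 10_000 + 1


def should_poll(counter: int, limit: int) -> Tuple[bool, int]:
    """Handle one timer trigger for one row.

    Returns (poll_now, new_counter). The counter is incremented first; when it
    reaches the limit the row is polled and the counter is reset, otherwise the
    QAction would abort immediately and only store the new counter.
    """
    counter += 1
    if counter >= limit:
        return True, 0
    return False, counter


# Simulate four consecutive timer triggers for a row in a 19,000-row table.
limit = count_limit(19_000)
counter = 0
decisions = []
for _ in range(4):
    poll_now, counter = should_poll(counter, limit)
    decisions.append(poll_now)
print(limit, decisions)  # 2 [False, True, False, True]
```

With a count limit of 2, each row is only polled on every second trigger, so the effective cycle becomes 30 minutes and the required average drops to roughly 19,000 / 1,800 ≈ 10.6 rows per second.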