How many alarms (per second) can a DataMiner system handle?
The DataMiner Help documents the number of concurrent active alarms that a DataMiner Agent and a DataMiner System can handle. It also covers alarm storm protection and how to configure it.
The question here, however, concerns the alarm rate on a DataMiner Agent. Is there any mechanism available to predict how many alarms per second a DMA (DataMiner Agent) can ingest? It is worth mentioning that the system resources might have a direct impact on that capability.
We have an internal tool that generates an increasing number of alarms per second until we see that the queues can no longer handle the load (at this point we only measure on Cassandra, though). Afterwards the results are returned to the user. The generated alarms are "lightweight", i.e. they contain no alarm properties etc., so we get a theoretical "maximum" rate at which alarms can be generated without anything blowing up.
On an i7 2.6 GHz, 32 GB RAM, 64-bit operating system, we saw we could generate around 240 alarms/s this way.
Note that it is perfectly fine to generate alarms at a higher rate, as long as it is temporary. This number only gives the theoretical maximum rate at which alarms can be created continuously.
We use this number to check for regressions in our system, i.e. to verify that the rate at which we can generate and process alarms does not go down over time as changes are made to the code. We do not care about the value itself, only about its evolution over time.
For now this tool is only used within the Dodo squad, but we plan to make it public in the future, so you would be able to check this rate on your own machine.
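The tool itself is not public yet, but the ramp-up idea behind it can be sketched. Below is a minimal Python sketch under stated assumptions: generate_alarm(), queue_depth(), and all the thresholds are hypothetical placeholders for illustration, not the actual tool's API.

```python
import time

def generate_alarm():
    """Create one lightweight alarm (no properties). Stub for illustration."""
    pass

def queue_depth():
    """Return the current database write-queue depth. Stub for illustration."""
    return 0

MAX_QUEUE_DEPTH = 1_000  # assumed threshold beyond which the queue "can't keep up"
STEP = 10                # increase the target rate by 10 alarms/s per round
ROUND_SECONDS = 30       # hold each rate long enough for the queue to react

# Ramp up the alarm rate step by step until the write queue falls behind.
for rate in range(STEP, 1_000, STEP):
    deadline = time.monotonic() + ROUND_SECONDS
    while time.monotonic() < deadline:
        for _ in range(rate):
            generate_alarm()
        time.sleep(1)  # crude pacing: roughly 'rate' alarms per second
    if queue_depth() > MAX_QUEUE_DEPTH:
        print(f"Queues fell behind at ~{rate} alarms/s; "
              f"sustained maximum is ~{rate - STEP} alarms/s")
        break
```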
Looking at the rate, you should consider two limits:
Database speed:
On a MySQL system, the speed at which the queries are parsed will be your bottleneck.
For a Cassandra setup, the speed of your disk will be the limit: SSDs, for example, can handle a much higher volume of alarms than spinning disks.
Database volume:
Depending on where, how, and for how long you store the alarms, this could be a limit as well. On a MySQL system we cap the number of alarms, but for Cassandra we do not, since everything is based on a Time To Live (TTL); see the sketch after this point.
You have to ensure that the maintenance actions performed on Cassandra can still cope with the volume of data.
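As a small illustration of what TTL-based storage means in practice, here is a sketch using the DataStax Python driver; the keyspace, table, and column names are made up for the example and are not DataMiner's actual schema.

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# Hypothetical keyspace/table for illustration only.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("alarms_demo")

ONE_YEAR = 365 * 24 * 3600  # retention, expressed as a TTL in seconds

# Every row written with USING TTL expires automatically, so the stored volume
# is bounded by ingest rate x retention period instead of an explicit row cap.
session.execute(
    f"INSERT INTO alarm_events (alarm_id, created, severity) "
    f"VALUES (%s, toTimestamp(now()), %s) USING TTL {ONE_YEAR}",
    ("alarm-1", "critical"),
)
```

Note that TTL-expired rows still have to be physically reclaimed during compaction, which is why the maintenance point above matters.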
To give some more specifics on this last topic:
The size of an alarm mainly depends on the number of properties you add to it, taking into account element, service, and alarm properties.
Currently an average DMA in the field has about 20 GB of timetrace data (stored for one year). So if you calculate the size of one typical alarm on your specific setup, you can determine what rate would correspond to an average system.
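As a rough, back-of-the-envelope version of that calculation (the 2 KB per-alarm size is purely an assumption; measure a typical alarm on your own setup):

```python
# What sustained alarm rate fills ~20 GB of timetrace data in one year?
RETENTION_BYTES = 20 * 1024**3       # ~20 GB, the average observed in the field
RETENTION_SECONDS = 365 * 24 * 3600  # data is stored for one year
ALARM_SIZE_BYTES = 2 * 1024          # assumed ~2 KB per alarm, properties included

rate = RETENTION_BYTES / (ALARM_SIZE_BYTES * RETENTION_SECONDS)
print(f"Sustained average rate: {rate:.2f} alarms/s")  # ~0.33 alarms/s at 2 KB/alarm
```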
If a significantly larger rate is needed, you can always look into additional external Cassandra nodes, better hardware, etc., but a detailed analysis is necessary to determine the correct solution.