Hi,
I'm looking for an option that sends an alert when a specific metric exceeds its threshold a certain number of times within a given time window.
For example:
The CPU usage threshold is 70%; above that, the severity is critical.
To avoid being flooded with alerts (I'm aware of storm prevention, though), I'd like to configure the alert so that it only kicks in when the metric value exceeds the threshold 5 times within 1 minute. I haven't found such an option in the platform.
I would appreciate your assistance,
Shaddad.
Hi Shaddad,
As Toon mentions, the correlation engine is an excellent option for counting events and triggering or escalating an alarm when the number of occurrences exceeds a predefined threshold.
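To make the counting idea concrete, here is a minimal Python sketch of the behavior asked for above: an alert is signaled only when a value has exceeded its threshold at least 5 times within a sliding 60-second window. This only illustrates the technique; it is not DataMiner code, the correlation engine is set up through its own rule configuration, and the class name `ExceedanceCounter` and its parameters are hypothetical.

```python
from collections import deque


class ExceedanceCounter:
    """Hypothetical helper: signals an alert when a value exceeds
    `threshold` at least `count` times within a sliding window of
    `window_s` seconds."""

    def __init__(self, threshold: float, count: int, window_s: float) -> None:
        self.threshold = threshold
        self.count = count
        self.window_s = window_s
        self._hits = deque()  # timestamps (seconds) of recent exceedances

    def update(self, timestamp: float, value: float) -> bool:
        """Feed one sample (timestamps must be non-decreasing) and
        return True while the alert condition is met."""
        if value > self.threshold:
            self._hits.append(timestamp)
        # Forget exceedances that are window_s seconds old or older.
        while self._hits and timestamp - self._hits[0] >= self.window_s:
            self._hits.popleft()
        return len(self._hits) >= self.count


if __name__ == "__main__":
    # CPU threshold 70%, alert on 5 exceedances within 60 s, one sample every 10 s.
    counter = ExceedanceCounter(threshold=70.0, count=5, window_s=60.0)
    samples = [(0, 75), (10, 60), (20, 80), (30, 72), (40, 50),
               (50, 90), (60, 85), (70, 95)]
    for t, cpu in samples:
        if counter.update(t, cpu):
            print(f"alert at t={t}s")
```

With these samples the alert is only signaled at t = 70 s, the first moment at which five exceedances (at 20, 30, 50, 60 and 70 s) fall inside the last 60 seconds.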
Then again, for this specific use case of alarming on CPU, we typically apply a hysteresis in the alarm template. You might not necessarily want an alarm just because the CPU exceeds a limit 5 times; more likely, you want to be alerted when the CPU stays high for a longer period of time without dropping back. That might indicate a process that is continuously running high.
So another possibility is to apply a hysteresis of, for example, 60 seconds to the CPU parameter. The alarm is then only generated if the CPU keeps running high for a minute or more.
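For comparison, the hysteresis approach can be sketched the same way: the alarm is raised only after the value has stayed above the threshold continuously for the hold time, and any drop restarts the timer. Again, this is a hypothetical Python illustration of the concept (`HysteresisAlarm` and `hold_s` are invented names), not how DataMiner implements hysteresis internally, and it models only the delay before the alarm is raised.

```python
from typing import Optional


class HysteresisAlarm:
    """Hypothetical sketch of a time-based hysteresis: the alarm is only
    raised once the value has stayed above `threshold` for at least
    `hold_s` seconds. Any sample at or below the threshold restarts the
    timer and clears the alarm."""

    def __init__(self, threshold: float, hold_s: float) -> None:
        self.threshold = threshold
        self.hold_s = hold_s
        self._above_since: Optional[float] = None  # start of the current high period
        self.active = False

    def update(self, timestamp: float, value: float) -> bool:
        """Feed one sample and return whether the alarm is active."""
        if value > self.threshold:
            if self._above_since is None:
                self._above_since = timestamp
            if timestamp - self._above_since >= self.hold_s:
                self.active = True
        else:
            # A single drop below the threshold restarts the hysteresis timer.
            self._above_since = None
            self.active = False
        return self.active


if __name__ == "__main__":
    spiky = [(0, 80), (10, 60), (20, 85), (30, 65), (40, 90), (50, 75), (60, 95)]
    sustained = [(0, 80), (10, 82), (20, 85), (30, 88), (40, 90), (50, 91), (60, 92)]
    for name, samples in (("spiky", spiky), ("sustained", sustained)):
        alarm = HysteresisAlarm(threshold=70.0, hold_s=60.0)
        raised = [t for t, cpu in samples if alarm.update(t, cpu)]
        print(name, raised)
```

In the spiky series, five of the seven samples are above 70%, yet no alarm is raised because the CPU never stays high for a full minute. In the sustained series, the alarm appears at t = 60 s.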
More information can be found in the DataMiner Docs: Configuring hysteresis in an alarm template.