Hello,
Usually we have a lot of elements, each with a lot of parameters in their Alarm templates.
For various reasons we can run into a scenario where an alarm storm appears. By "Alarm storm" I mean the SERVER-SIDE event where a very large number of alarms is active. Of course, in this situation DataMiner Cube will display the "Alarm storm" banner, but on the DMA itself we will still have a lot of useless alarms for each parameter on each device.
My task is to reduce the number of alarms in this case. Suppose I can detect when the "Alarm storm" starts and ends. For example, I can run an Automation script periodically and count the active alarms; if the count is higher than a defined threshold, we are in an "Alarm storm" event. After this trigger has fired, I wait for a while, then turn the trigger off again and leave the "Alarm storm" event.
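Something like this is what I have in mind for the periodic check (just a sketch: the threshold of 500 is a made-up number, and I am assuming the GetActiveAlarmsMessage/ActiveAlarmsResponseMessage SLNet call is the right way to fetch the active alarm list):

```csharp
using Skyline.DataMiner.Automation;
using Skyline.DataMiner.Net.Messages;

public class Script
{
    public void Run(Engine engine)
    {
        // Hypothetical threshold: tune to whatever you consider an alarm storm.
        const int stormThreshold = 500;

        // Ask SLNet for the full active alarm list of the DMS.
        var response = engine.SendSLNetSingleResponseMessage(new GetActiveAlarmsMessage())
            as ActiveAlarmsResponseMessage;

        if (response == null)
        {
            engine.GenerateInformation("Could not retrieve the active alarm list.");
            return;
        }

        int activeAlarmCount = response.ActiveAlarms.Length;

        if (activeAlarmCount >= stormThreshold)
        {
            engine.GenerateInformation("Alarm storm detected: " + activeAlarmCount + " active alarms.");
            // From here the script could raise the 'storm' flag and trigger
            // whatever mechanism reduces the monitoring (see below).
        }
    }
}
```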
For this situation, I suggest defining "important" and "not so important" parameters in all Alarm templates. Once I detect an "Alarm storm" event, I want to disable monitoring for the "not so important" parameters, and when the "Alarm storm" situation is over, I enable these parameters again.
To do this, I could have two alarm templates for each protocol, a "Default template" and a "Storm template", and switch between them in an Automation script for every element. But this operation is very long and costly.
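Roughly, the switching would look like the loop below, once per protocol and once per element (again just a sketch: the protocol name and template names are placeholders, and the AssignAlarmTemplate helper is hypothetical, because the actual assignment call is exactly the costly part I would like to avoid):

```csharp
using Skyline.DataMiner.Automation;

public class Script
{
    public void Run(Engine engine)
    {
        bool stormActive = true; // outcome of the detection script above (hypothetical flag)

        // Placeholder protocol name; this loop has to be repeated for every protocol.
        Element[] elements = engine.FindElementsByProtocol("MyProtocol");

        string template = stormActive ? "Storm template" : "Default template";

        foreach (Element element in elements)
        {
            // One assignment per element: with many elements this becomes
            // the long and costly operation described above.
            AssignAlarmTemplate(engine, element, template);
        }
    }

    private static void AssignAlarmTemplate(Engine engine, Element element, string templateName)
    {
        // Hypothetical helper: the real assignment call is not filled in here.
        engine.GenerateInformation("Would assign '" + templateName + "' to " + element.ElementName);
    }
}
```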
However, each alarm template supports "Conditions" per parameter. So if it were possible to use some kind of global variables in these conditions, that could solve my problem.
Is this possible today? What is your opinion about such a feature? I think it would be useful for many customers.
Interesting question.
My main concern about your suggestion is that applying another alarm template has an impact on the number of alarms, and as such on the 'alarm storm' state itself, no matter how you define that state (I guess it will always be based on the number of alarms somehow). You can easily run into cycles of going 'in' and 'out' of alarm storm mode.
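If you do go this way, you would probably need some hysteresis to dampen that cycling: enter storm mode at one threshold, but only leave it again at a lower threshold and after a minimum hold time. Purely as a sketch (the thresholds and hold time are made-up numbers):

```csharp
using System;

public class StormDetector
{
    // Hypothetical thresholds and hold time; tune to your own system.
    private const int EnterThreshold = 500;
    private const int ExitThreshold = 200;
    private static readonly TimeSpan MinimumHold = TimeSpan.FromMinutes(15);

    private bool inStorm;
    private DateTime stormStart;

    // Called from the periodic script with the current active alarm count.
    public bool Update(int activeAlarmCount, DateTime now)
    {
        if (!inStorm && activeAlarmCount >= EnterThreshold)
        {
            inStorm = true;
            stormStart = now;
        }
        else if (inStorm && activeAlarmCount <= ExitThreshold && now - stormStart >= MinimumHold)
        {
            inStorm = false;
        }

        return inStorm;
    }
}
```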
Still, the alarm storm case is an interesting one and looks a bit different in different situations.
May I ask, in your case:
- What does an alarm storm look like in your system? How many alarms would be active during an alarm storm?
- Is there any correlation between the alarms triggering the alarm storm? (e.g. alarms from devices in the same location, the same type of alarm such as a timeout, the same parameter but on different rows/elements, ...)
- Does your alarm storm follow a typical pattern? For example: first a period where a lot of alarms come up but also clear immediately and then come up again (values being measured around the alarm thresholds); then a second period where there are a lot of active alarms, but the number of alarms coming up/being cleared is normal; and finally, when the alarm storm is over, a period where a lot of alarms get cleared but shortly come up again for a short time (again values around the alarm thresholds)?
- What is your main goal with this alarm storm detection? Do you believe DataMiner is suffering from the high number of alarms? Or is your main concern the user experience, i.e. do you want to give the user a good view on the system and somehow mask/consolidate everything related to the storm?