Hello All,
We've received a report that during an alarm storm, users were unable to make configuration changes to their devices via Visio shapes that launch automation scripts. In this case, the Visio button has an "execute" shape data with a tooltip but no reference to any alarm.
Under normal working conditions, when the button is pressed, an interactive window is displayed asking for confirmation for the MUX switch.
Here's a description from an operator of what happened after the alarm storm cleared. He reported that all confirmation prompts for the script came up simultaneously once the alarm storm ended:
“During the lock up in DataMiner last night the program itself looked normal, and I was able to see the Mux switches but as I tried to click on a particular mux switch there would be no reaction from the program. I do recall when it first froze and during the first attempts on switching a pop-up text box did appear. I don't exactly recall what it said with all the commotion last night, but it was something to the effect of "wait for alarms to clear to resume function" I got the understanding that I could not do anything until things cleared up. When alarms did actually clear up, all my switch attempts came up all at once with several pop-up text boxes asking if I want to complete the switch for the particular mux I wanted to switch but it was way after I requested to do so.”
Is this behavior expected?
If so, is there a workaround to prevent alarm storms from affecting automation script execution?
The clients use eventing. I also don't think Cube was unresponsive, as the operator could still click the Visio button, which is what allowed all the script confirmations to stack up and appear once the storm cleared. If the program had been unresponsive, I don't believe Visio would still have been interactive.
From what I understand, the confirmations were not displayed until the storm cleared?
That sounds like a frozen client or similar, perhaps due to the storm.
By the way, it would be interesting to check if you can reproduce the behaviour for your squad to take a closer look at the info that the automation needs to parse after the button has been clicked (Visio is just the graphic part).
Alarm storm mode can be disabled in the client settings, but it should also be noted that the device receiving the command can become slower under certain local conditions (high CPU due to several TS lost).
Generally speaking, with MUX switches, my preference is to handle automatic switching within the device itself and provide the operators with a shortcut via GUI in case the switching needs to be operated manually.
Many TS switch models offer this as built-in functionality based on TR290 error priorities: e.g. for a sync loss or CC errors, it’s a no-brainer, and I wouldn’t wait for an operator to action the switch manually.
HTH
Hi,
Based on the indications and the user story, I assume that the event messages from SLNet to Cube were throttled on the connection (wire). Only after all the alarm events had been handled did the event messages of the script come in, which is why the multiple message boxes were shown at once. I also assume that everything ran normally and smoothly again once the throttling ended.
Note that alarm storm mode is a sort of protection mode against the (huge) flood of alarm event messages that comes in.
More info: https://docs.dataminer.services/user-guide/Basic_Functionality/Alarms/Advanced_alarm_functionality/Alarm_storm_protection.html
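To illustrate the queuing effect I am assuming here, below is a toy model in Python. It is not DataMiner code and does not reflect how SLNet or Cube are actually implemented; the function name, the message names, the queue size and the per-tick budget are all made up for the example. It only shows that if the script prompts share one throttled first-in, first-out stream with the alarm events, they come out together once the alarm backlog has drained.

```python
from collections import deque


def deliver(queue, budget_per_tick):
    """Drain a FIFO queue at a fixed rate; return (tick, message) pairs."""
    delivered = []
    tick = 0
    while queue:
        tick += 1
        # Throttle: at most budget_per_tick messages leave the queue per tick.
        for _ in range(min(budget_per_tick, len(queue))):
            delivered.append((tick, queue.popleft()))
    return delivered


# Hypothetical storm: 1000 alarm events are already waiting on the connection.
queue = deque(f"alarm-{i}" for i in range(1000))

# The operator clicks three MUX buttons; each prompt is queued behind the alarms.
queue.extend(["confirm-mux-A", "confirm-mux-B", "confirm-mux-C"])

log = deliver(queue, budget_per_tick=100)
prompts = [(tick, msg) for tick, msg in log if msg.startswith("confirm")]
print(prompts)
```

In this sketch all three prompts are delivered in the same tick, right after the last alarm event, which matches the operator's description of the pop-ups appearing all at once.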
Hope this will help you all further
Hello Matthias, it sounds like the described behavior could be expected then. Do you know of a workaround to get the interactive automation window to display during an alarm storm? Or can we expect these interactive automations to always land at the back of the queue, regardless of whether they are executed from Visio or elsewhere?
There's no workaround, afaik. But I assume the burst of alarms only lasts for a brief moment.
Hi Thomas,
Can you share if your Cube clients were using polling or eventing mode?
Subscribing to get more insight from other members of the community. I believe the “Storm” mode might be used to prevent this sort of behaviour (by limiting the amount of info ingested by a Cube client), but if the script has to be triggered manually (I understand your GUI had a button on a Visio) and the client has become unresponsive, then operators may have to terminate & relaunch Cube to regain control.
Asking about your set-up as “polling”-based Cube clients seem to be more affected by alarm storms.
HTH