Question

Solved · 3.16K views · 26th October 2022 · alarm history · MaintenanceSettings.xml

4

Michiel Saelen [SLC] [DevOps Enabler] · 5.69K · 21st October 2022 · 2 Comments

With SRM, we get a lot of alarm updates because the view/service impact changes at the start and end of bookings. As a result, we get many notices that the alarm history exceeds 100 alarms. We have a Cassandra and Elastic Cluster DB, so the alarms are now stored in Elastic, and we use the default threshold for alarms per parameter. With Elastic, is it now safe to set the 'recurring' attribute to false to suppress all those notices, or are such large alarm trees still a problem?

Michiel Saelen [SLC] [DevOps Enabler] Selected answer as best 26th October 2022

Alberto De Luca [DevOps Enabler] commented 24th October 2022

If this is normal within the standard cycle of the booking, could it work to log this as an info event, rather than an alarm?

Michiel Saelen [SLC] [DevOps Enabler] commented 24th October 2022

Hi Alberto,
Information events and alarms are very similar; the main difference is that information events mostly don’t have a tree and are just single events. There is a lot of value in having the updates in the alarm tree, because by retrieving the history of the alarm we can track which views/services were impacted at what time. The problem with large alarm trees dates back to when alarms were still stored in Cassandra, where large alarm trees had a negative impact on the performance of the system. Now that alarms are stored in Elastic, I’m wondering whether large alarm trees are still a problem. If they are not, we can disable the notices that warn us about this.

2 Answers

score 1 · Answer 1 · 2022-10-21T14:33:14+00:00

1

Ive Herreman [SLC] [DevOps Enabler] · 13.88K · Posted 21st October 2022 · 1 Comment

Hi Michiel,

Not a direct answer to your question, but have you already had a look at the alarm squashing feature? This feature allows you to group consecutive alarm events without a severity change into a single consolidated event.

Link to docs
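
To make the idea of squashing concrete, here is a minimal Python sketch of grouping consecutive alarm events that share a severity into one consolidated entry. It only illustrates the concept; it is not how DataMiner implements the feature. The `squash` function and the `time`/`severity` fields are hypothetical names chosen for the example.

```python
from itertools import groupby


def squash(events):
    """Collapse each run of consecutive events with the same severity
    into one consolidated entry (hypothetical illustration only)."""
    squashed = []
    # groupby only groups *adjacent* items, which is what we want here:
    # a later return to an earlier severity starts a new group.
    for severity, run in groupby(events, key=lambda e: e["severity"]):
        run = list(run)
        squashed.append({
            "severity": severity,
            "first_time": run[0]["time"],
            "last_time": run[-1]["time"],
            "count": len(run),
        })
    return squashed


# Made-up alarm updates for a single parameter.
events = [
    {"time": "10:00", "severity": "Major"},
    {"time": "10:05", "severity": "Major"},
    {"time": "10:10", "severity": "Major"},
    {"time": "10:30", "severity": "Critical"},
    {"time": "11:00", "severity": "Major"},
]

result = squash(events)
for entry in result:
    print(entry)
```

In this made-up example, five updates collapse into three consolidated entries, because the final return to Major comes after a severity change and therefore starts a new group.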

Michiel Saelen [SLC] [DevOps Enabler] commented 21st October 2022

AFAIK, the alarm squashing feature only reduces the load on SLNet and Cube; all entries are still kept in the DB. So the question remains whether, with or without the squashing feature, we can set the ‘recurring’ attribute to false.

Can the alarm history exceed 100 alarms with Elastic DB?
