Question

Solved · 1.18K views · 19th July 2023 · Automation, correlated alarms, Correlation, Correlation rule

3

Carlos Morales [SLC] [DevOps Member] · 168 · 13th July 2023 · 2 Comments

How can I reduce the 20 to 40 second delay that occurs when correlation rules on critical alarm states across approximately 100 services are used to trigger an automation script? (The execution time of the automation itself is quite acceptable: 1-2 seconds.) Is it possible to prioritize correlation and automation execution, perhaps through parallel processing? I have tried to optimize by duplicating the correlation rules and dividing the services into smaller groups, but the delay can still exceed 10-20 seconds in certain cases. Given how important this automation is and how many alarms can trigger this logic, this remains a critical operational concern. Has anyone encountered a similar issue, and do you have any suggestions for improvement?

Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 19th July 2023

Toon Casteele [SLC] [DevOps Enabler] commented 13th July 2023

Could you add some more info to your question on how the DMS is configured? Is correlation happening across agents, how many agents are there, what’s the detection…?

Carlos Morales [SLC] [DevOps Member] commented 13th July 2023

Thanks, more details below:
It is a standalone DMA with about 150 services, divided into radio and video services. The operation is not very complicated: the goal is to mask alarms that were previously registered as maintenance failures or were anticipated by the operator.

The correlation rule is triggered by each critical alarm (filtered by Service Name + Critical). Using an automation script, it sends the alarm information to a protocol that contains the list of programs, to determine whether it is a real alarm or not. At that point, the alarm is masked.
The critical time is the interval from the moment the alarm is captured by DataMiner until the moment the automation starts.

1 Answer

score 5 · Answer 1 · 2023-07-14T08:20:49+00:00

The first goal here should be to identify where the delay originates. The SLCorrelation.txt log file shows when incoming alarm events are analyzed by the correlation engine and when automation scripts are executed.

(You might need to increase the log level for Correlation to "Log Everything" (5) to see these "Handling Alarm: xxxxx" DBG|5 type log entries.)

From there, you should be able to see at which stage the delay occurs:

Between creating the alarm in DataMiner and having it handled by the correlation engine
Or between handling by the correlation engine and launching the automation script
Or somewhere else
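
To make the comparison between these stages concrete, here is a minimal sketch in Python. It assumes you have read three timestamps for one alarm by hand (alarm creation, handling by the correlation engine, script launch). The function name `locate_delay`, the stage labels, and the example timestamps are all hypothetical, and nothing here parses SLCorrelation.txt or assumes its line format.

```python
from datetime import datetime


def locate_delay(alarm_created, correlation_handled, script_started):
    """Return the delay per stage (in seconds) and the name of the slowest stage.

    All three arguments are datetime objects that you collected manually
    for a single alarm; this helper is illustrative, not a DataMiner API.
    """
    stages = {
        "alarm creation -> correlation handling":
            (correlation_handled - alarm_created).total_seconds(),
        "correlation handling -> script launch":
            (script_started - correlation_handled).total_seconds(),
    }
    slowest = max(stages, key=stages.get)
    return stages, slowest


# Made-up example values, not taken from a real log.
fmt = "%Y-%m-%d %H:%M:%S"
stages, slowest = locate_delay(
    datetime.strptime("2023-07-14 08:00:00", fmt),
    datetime.strptime("2023-07-14 08:00:25", fmt),
    datetime.strptime("2023-07-14 08:00:27", fmt),
)
for name, seconds in stages.items():
    print(f"{name}: {seconds:.0f} s")
print("Slowest stage:", slowest)
```

Repeating this for a handful of alarms should show whether the delay consistently sits in the same stage or varies with load.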

After zooming in on where the delay occurs, the next step would be to find clues as to why the delay is there.

Improve the execution time of a correlation
