How can I reduce the delay of 20 to 40 seconds that occurs when correlations on critical alarm states across approximately 100 services trigger an automation script? (The execution time of the automation script itself is acceptable, at 1 to 2 seconds.) Is it possible to prioritize correlation execution and automation, perhaps by using parallel processing? I have tried to optimize this by duplicating correlations and dividing the services into smaller groups, but the delay can still exceed 10 to 20 seconds in some cases. Given how important this automation is and how many alarms can trigger this logic, it remains a critical operational concern. Has anyone encountered a similar issue, and do you have any ideas or suggestions for improvement?
The first goal here should be identifying where the delay originates. The SLCorrelation.txt log file shows when incoming alarm events are analyzed by the correlation engine and when automation scripts are executed.
(You might need to increase the log level for Correlation to "Log Everything" (level 5) to see these "Handling Alarm: xxxxx" DBG|5 log entries.)
From there, you should be able to see at which stage the delay occurs:
- Between the creation of the alarm in DataMiner and its handling by the correlation engine
- Between the handling by the correlation engine and the launch of the automation script
- Somewhere else
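Once you have the relevant timestamps for an alarm, comparing the stages is simple arithmetic. The sketch below assumes you have already extracted, per alarm, the time the alarm was created in DataMiner, the time its "Handling Alarm" entry appears in SLCorrelation.txt, and the time the automation script was launched. The class and function names (`AlarmTrace`, `stage_delays`, `slowest_stage`) are hypothetical, and the log parsing itself is left out because the exact log line layout depends on your setup.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlarmTrace:
    """Timestamps collected for one alarm (hypothetical structure)."""
    alarm_id: str
    created: datetime                         # alarm created in DataMiner
    handled: Optional[datetime] = None        # "Handling Alarm" entry in SLCorrelation.txt
    script_started: Optional[datetime] = None  # automation script launched


def _seconds_between(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Seconds from start to end, or None if either timestamp is missing."""
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


def stage_delays(trace: AlarmTrace) -> dict:
    """Delay in seconds for each stage; None means the stage was never observed."""
    return {
        "created->handled": _seconds_between(trace.created, trace.handled),
        "handled->script": _seconds_between(trace.handled, trace.script_started),
    }


def slowest_stage(trace: AlarmTrace) -> Optional[str]:
    """Name of the stage with the largest measured delay, or None if nothing was measured."""
    measured = {k: v for k, v in stage_delays(trace).items() if v is not None}
    if not measured:
        return None
    return max(measured, key=measured.get)


if __name__ == "__main__":
    example = AlarmTrace(
        alarm_id="example",
        created=datetime(2024, 1, 1, 10, 0, 0),
        handled=datetime(2024, 1, 1, 10, 0, 25),
        script_started=datetime(2024, 1, 1, 10, 0, 26),
    )
    print(stage_delays(example))
    print(slowest_stage(example))
```

With the example values above, almost all of the delay falls between alarm creation and correlation handling. That would point at the correlation engine's intake rather than at the script launch. If no stage shows a large delay, the missing time is "somewhere else", for example before the alarm is created.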
After narrowing down where the delay occurs, the next step is to look for clues as to why it occurs there.