Hi Dojo,
I'm looking into ways to minimize the load on the correlation engine when multiple correlation rules are required in a big DMS cluster. Any recommendations are welcome.
I also have some specific questions, and I hope the different admins in the community can share some feedback:
1) When the alarms to be correlated are all expected on the same DMA, would it be beneficial to disable the "Correlate across DMAs" option?
And if so, would we automatically have the rule running on each DMA of the cluster (i.e. the related Brain.xml file) without a specific central collection point?
Alternatively, if this is left enabled, is there any increased load on SLNet or any specific usage of NATS ports when the entries to be correlated are still within the same DMA (not necessarily the one hosting the rule)?
2) By including Info Events, would I risk creating too much load?
Or would this still depend on the "Alarm filter" an admin defines below?
3) Are there any examples of when it makes sense to trigger on "Maintain active tree status" rather than on single events?
4) When evaluating "Persisting Events", will the correlation need more memory as we increase the duration in this field?
5) Am I correct that even if the "Update base alarm" option is checked, the root time of the base alarm will not be modified?
Thank you all for the insight
Hi Alberto,
That's a lot of questions 🙂
I'll try to provide some feedback on all of them:
1) If all alarms that need to be combined for a correlation rule come from the same DMA, there's no need to enable "Correlate across DMAs". The load will be roughly the same either way, though: each agent still checks every alarm it generates against the global filter of the rule, but none will need to be forwarded.
Communication between DMAs is purely SLNet traffic (Remoting or gRPC). No NATS communication is involved at the moment. The traffic consists mainly of notifications like "there's a new alarm one of your rules is interested in", internal subscriptions on these, and requests to perform actions on remote elements. If the involved alarm events all come from the handling DMA itself, there is no remote communication (all handling remains in the SLNet process).
Sidenote: correlation rules are not stored in Brain.xml (that's a config file of the legacy correlation engine). The configs are in files inside the c:\Skyline DataMiner\Correlation folder.
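To make the filtering point in 1) a bit more tangible, here is a rough Python sketch. It is only a conceptual model under my own assumptions, not DataMiner code: the `Alarm` class, the severity-based `matches_rule_filter` and the `classify` function are all made up for illustration. It shows that every alarm is checked against the filter regardless, and that only matching alarms raised on another agent than the handling one would involve remote communication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    # Hypothetical, simplified alarm record (not a DataMiner type).
    source_dma: int
    severity: str


def matches_rule_filter(alarm: Alarm) -> bool:
    # Stand-in for the rule's alarm filter; the real filter is whatever
    # the admin configured on the correlation rule.
    return alarm.severity in {"Critical", "Major"}


def classify(alarm: Alarm, handling_dma: int) -> str:
    """Return how a single alarm would be treated in this toy model."""
    # Every agent evaluates the filter for every alarm it generates.
    if not matches_rule_filter(alarm):
        return "no_match"
    # Matching alarm raised on the handling DMA: no remote traffic needed.
    if alarm.source_dma == handling_dma:
        return "local"
    # Matching alarm raised elsewhere: the handling DMA must be notified.
    return "remote"


alarms = [Alarm(1, "Critical"), Alarm(1, "Warning"), Alarm(2, "Major")]
print([classify(a, handling_dma=1) for a in alarms])
```

In this model the filter check is the cost you always pay; the "remote" branch is the only part that would generate inter-DMA traffic.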
2) Including info events means extra alarm events that need to be checked against the alarm filter of the correlation rule. Unless you need information events to trigger a correlation rule, there's no point in enabling that option.
3) "Maintain active tree status" vs "trigger single events": Triggering on single events is beneficial for rules that do not have to combine multiple alarm events. Rules that monitor the active tree status basically keep track of a filtered set of active alarms and decide whether to execute or clear the rule whenever something changes in that set. This is useful for combining/grouping multiple base alarm events.
4) An increased duration should not have an impact on the memory usage for persisting events. Correlation mainly keeps track of an action that it would have to execute, along with a timestamp indicating when it can execute it (if the conditions are still fulfilled by that time).
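As a rough illustration of the difference in 3), the sketch below contrasts the two modes. Again, this is a conceptual model and not how DataMiner implements it: `ActiveSetRule`, its `threshold` condition and `single_event_rule` are hypothetical names chosen for the example.

```python
class ActiveSetRule:
    """Toy model of a rule that tracks a filtered set of active alarms."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # hypothetical condition: N alarms active
        self.active = set()         # IDs of active alarms matching the filter
        self.triggered = False

    def on_alarm_change(self, alarm_id: str, is_active: bool):
        # Keep the tracked set in sync with the incoming change.
        if is_active:
            self.active.add(alarm_id)
        else:
            self.active.discard(alarm_id)
        # Re-evaluate the condition on every change to the set.
        should_trigger = len(self.active) >= self.threshold
        if should_trigger and not self.triggered:
            self.triggered = True
            return "execute"
        if not should_trigger and self.triggered:
            self.triggered = False
            return "clear"
        return None


def single_event_rule(severity: str):
    # Stateless: each event is judged on its own, nothing is remembered.
    return "execute" if severity == "Critical" else None


rule = ActiveSetRule(threshold=2)
print(rule.on_alarm_change("a", True))   # only one alarm active so far
print(rule.on_alarm_change("b", True))   # condition becomes fulfilled
print(rule.on_alarm_change("a", False))  # condition no longer fulfilled
```

The stateful variant is what lets a rule group several base alarms; the stateless one is enough when a single event is all the rule needs.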
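To show why the duration in 4) should not matter for memory, here is a small sketch of the "action plus timestamp" idea. It is an assumption-laden model, not the actual engine: `PendingAction`, `still_valid` and `run_due` are hypothetical names. Each pending item is the same size whether it is due in ten seconds or in an hour.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PendingAction:
    # Hypothetical record: when the action may run, plus what to check and do.
    due_at: float
    still_valid: Callable[[], bool]
    action: Callable[[], None]


def run_due(pending: List[PendingAction], now: float) -> List[PendingAction]:
    """Run actions that are due and still valid; return those not yet due."""
    remaining = []
    for item in pending:
        if now < item.due_at:
            remaining.append(item)   # not due yet, keep waiting
        elif item.still_valid():
            item.action()            # due and conditions still fulfilled
        # due but no longer valid: dropped without executing
    return remaining


fired = []
pending = [
    PendingAction(10.0, lambda: True, lambda: fired.append("short")),
    PendingAction(3600.0, lambda: True, lambda: fired.append("long")),
]
pending = run_due(pending, now=10.0)
print(fired, len(pending))
```

Only the number of pending items affects memory in this model, not how far in the future their timestamps lie.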
5) "Update Base Alarms" will make an update to the correlated alarm tree when the set of included base alarms changes. As it is an update of the same alarm tree, the root time will indeed stay the same.
Hope this helps!
Top notch, Wouter – many thanks for clarifying all this!