I have a correlation rule for iLO/iDRAC issues that is used across multiple sites. We have 8 DMAs in a cluster, and each DMA is responsible for its own site's elements. Each site has its own correlation rules looking for alarms. All sites have the same rules; they just look at different views or elements.
In the alarm filter section of the correlation rule, I filter on a specific View (IS) (all of that site's elements are under that root view) and then only on elements using the (AND) iLO or (OR) iDRAC protocol. In the rule condition, I'm looking for any (IS) critical severity.
The problem is that when this correlation rule is triggered, it is triggered at all 8 sites instead of just the one, so all 8 locations get an email for that site's alarm. It looks like the Alarm Filter section is either being ignored or not evaluated correctly. The rule is triggered on the correct elements and alarms; I just don't understand why it triggers on all 8 DMAs even though the other correlation rules are looking at different views.
Hi Jeff,
So, it sounds like you have multiple correlation rules, one per View (site)? Is that correct? I assume the email in the action is set up to be sent to the appropriate site, right?
One thing that stands out is that in the Alarm Filters you have an AND and an OR statement, but no grouping. I suspect the logic you are trying to accomplish is something like:
(VIEW = SomeValue) AND ([Param = iLO Active Sessions] OR [Param = iDRAC Amperage])
... meaning, you want to respond to issues with the iLO Active sessions or the Amperage Probe, but only for a certain view? Is that right?
If so, you probably have to build the filter more like the one I have provided so you can group the filter conditions better. You can do this with the other grouping options like these:
But, without more explicit grouping, I can see how the CR might trigger on multiple sites as the view could get ignored for the iDRAC issues.
If I misunderstood your question, let me know and I can take another look.
(VIEW = SomeValue) AND ([Param = iLO Active Sessions] OR [Param = iDRAC Amperage])
This appears to have corrected it. When I run a test on the rule I’m no longer getting other sites listed.
Thanks for the help.
Glad to hear it!
Hi Jeff,
I think you have an error in the alarm filter logic and that's why the rule is being triggered in all the DMAs of the cluster for the iDRAC alarms.
You need to use the "AND (" operator and not the "AND".
In pseudo-code, what you have is:
if( (view == SOMEVIEW _and_ alarm == ILO) _or_ alarm == iDRAC ) ...
when you want this:
if( view == SOMEVIEW _and_ (alarm == ILO _or_ alarm == iDRAC) ) ...
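To make the difference between the two forms concrete, here is a minimal Python sketch. It is not DataMiner code: the `Alarm` class, the `Site1` to `Site8` view names, and the two filter functions are hypothetical stand-ins that only model the boolean logic of the pseudo-code above, assuming one copy of the rule per site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    view: str      # hypothetical: root view (site) the element lives under
    protocol: str  # hypothetical: "iLO" or "iDRAC"


def ungrouped_filter(alarm: Alarm, site_view: str) -> bool:
    # Models "view AND iLO OR iDRAC" without grouping:
    # (view == SOMEVIEW and alarm == ILO) or alarm == iDRAC
    return (alarm.view == site_view and alarm.protocol == "iLO") or alarm.protocol == "iDRAC"


def grouped_filter(alarm: Alarm, site_view: str) -> bool:
    # Models the intended logic:
    # view == SOMEVIEW and (alarm == ILO or alarm == iDRAC)
    return alarm.view == site_view and alarm.protocol in ("iLO", "iDRAC")


# One copy of the rule per site, each with its own view.
sites = [f"Site{n}" for n in range(1, 9)]

# A single iDRAC alarm raised on an element in Site3.
idrac_alarm = Alarm(view="Site3", protocol="iDRAC")

ungrouped_hits = [s for s in sites if ungrouped_filter(idrac_alarm, s)]
grouped_hits = [s for s in sites if grouped_filter(idrac_alarm, s)]

print(len(ungrouped_hits))  # every site's rule matches the alarm
print(grouped_hits)         # only the rule for the alarm's own site matches
```

With the ungrouped form, the view check only guards the iLO branch, so an iDRAC alarm from any site satisfies every site's copy of the rule. An iLO alarm would still match only its own site in both forms, which fits the behaviour described in the question.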
One final piece of advice: you should move the Rule Condition up into the Alarm Filter, resulting in something like:
if( view == SOMEVIEW _and_ (alarm == ILO _or_ alarm == iDRAC) _and_
severity == Critical ) ...
This will reduce the number of internal alarm buckets created, thus improving the performance of the correlation engine.
Hope this helps.
>> if( (view == SOMEVIEW _and_ alarm == ILO) _or_ alarm == iDRAC )
Yeah, that would explain why it would pick up the iDRAC alarm even when it is not in the correct view, and why it triggers all the correlation rules to send emails.
Hi Jeff,
I think it is related to this option:
General configuration of Correlation rules
Hope it helps.
Miguel,
That option is not selected on our correlation rules. Each site has a copy of this rule; the only difference is that the View portion is set for that site. I have other correlation rules using the same type of filtering and those work fine; it’s just this rule that’s having issues.
Jamie,
Correct. Multiple correlation rules for the iLO/iDRAC alarm templates, one per View (site). Each rule has an email action sending a notification to a specific distribution list. Our issue is that when we find a critical alarm, ALL sites get emails rather than just the single site where the server that triggered the alarm lives. So the View is definitely getting ignored, yet the rest of the filter is applying, as we are getting the emails from the CRs.
I’ll rebuild the filter portion like you mentioned and see if it corrects it.