Hello Dojo,
(DataMiner 9.6 CU16 running MySQL)
I am looking for clarification on these settings in MaintenanceSettings.xml, described on the help page below:
We have set this option to false, but these entries still seem to be logged to the database regardless. After reading the documentation more closely, it seems this setting only affects the client side, and a database entry is still created every time the alarm history exceeds 100 alarms.
How does this work in the software? Every alarm after the 100th seems to create a new database entry with the value "alarm history exceeded 100 alarms".
- It seems the alarm storm is caused by a configuration issue in the timeout settings for the driver's communication state.
- The side effect of these "alarm history exceeded alarms" entries in the service alarm table is actually worse than the alarm storm itself:
  - We counted 2 million entries per day, which made the service alarm table grow beyond 36 GB.
  - As a result of such large database tables, SLDataMiner threw many RTEs and the DMA froze completely (the database became unresponsive due to many failed queries).
Is there a way to prevent an alarm storm from creating an excessive number of "alarm history exceeded alarms" entries in the database?
It should indeed only generate one entry per alarm.
When the option is set to true, this would be the expected behavior. Also note that on systems installed many years ago, this setting was set to true by default; nowadays it defaults to false because of the major overhead it causes in the database.
So if you see this many notices, it is worth checking that specific setting.
If it is set to false, there are only two possibilities:
1) DataMiner has trouble reading this file or setting, for example because someone changed a setting in the file incorrectly or did not include the correct XML tags.
2) This is a software issue, and the behavior is incorrect in some corner case.
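To rule out option 1 quickly, here is a minimal Python sketch (not a DataMiner tool) that checks whether MaintenanceSettings.xml is well-formed XML and lists every value of a given tag. The tag name is whatever the help page specifies; the names in the example and test are hypothetical placeholders, not the real setting name. Note that well-formed XML only rules out parse errors: a misspelled tag or a tag in the wrong place would still show up as "not found" or "appears N times" rather than as an error.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def check_settings_text(text: str | bytes) -> tuple[bool, str]:
    """Return (True, root tag) if text is well-formed XML, else (False, parser error)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return False, f"not well-formed XML: {exc}"
    return True, root.tag


def find_setting(text: str | bytes, tag: str) -> list[str]:
    """Return the stripped text of every element named `tag`, at any depth.

    If the file declares a default XML namespace, pass the tag as '{namespace}Tag'.
    """
    root = ET.fromstring(text)
    return [(el.text or "").strip() for el in root.iter(tag)]


def check_settings_file(path: str, tag: str) -> None:
    # Read as bytes so the parser honours any encoding declaration or BOM.
    with open(path, "rb") as f:
        data = f.read()
    ok, info = check_settings_text(data)
    if not ok:
        print(f"{path}: {info}")
        return
    values = find_setting(data, tag)
    if not values:
        print(f"{path}: no <{tag}> element found")
    elif len(values) > 1:
        print(f"{path}: <{tag}> appears {len(values)} times: {values}")
    else:
        print(f"{path}: <{tag}> = {values[0]!r}")
```

For example, `check_settings_file("MaintenanceSettings.xml", "SomeSettingTag")` (hypothetical tag name) would print either the parser error or the value DataMiner should be reading.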
Describing the corner case, i.e. how to reproduce this many notices, should be enough to get this fixed.
In the meantime, I would suggest disabling this notice entirely to prevent further impact.
Also note that the parameter IDs (PIDs) in the screenshots are all 64501, which is the "Timeout" parameter. I am not sure whether this is a coincidence or whether all of these notices are being generated on this same parameter.
It might be useful to inspect the alarm trees generated on these parameters, to see whether they really have that many updates and where those updates originate.
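One way to answer the 64501 question is to export the notices and count them per element and parameter. This is a minimal sketch that assumes the rows have already been exported into (element, parameter ID, value) records; the `Notice` type and that export format are hypothetical, not a DataMiner API. It matches on the "alarm history exceeded" text quoted in the question.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Notice(NamedTuple):
    # Hypothetical shape of one exported alarm/notice row.
    element: str
    parameter_id: int
    value: str


MARKER = "alarm history exceeded"


def count_history_notices(notices: Iterable[Notice]) -> Counter:
    """Count 'alarm history exceeded' notices per (element, parameter ID)."""
    counts: Counter = Counter()
    for n in notices:
        if MARKER in n.value.lower():
            counts[(n.element, n.parameter_id)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        Notice("Element A", 64501, "Alarm history exceeded 100 alarms"),
        Notice("Element A", 64501, "Alarm history exceeded 100 alarms"),
        Notice("Element B", 64501, "Alarm history exceeded 100 alarms"),
        Notice("Element A", 1000, "Timeout"),
    ]
    # Largest offenders first.
    for (element, pid), n in count_history_notices(sample).most_common():
        print(f"{element} / PID {pid}: {n}")
```

If nearly all of the counts land on PID 64501, those alarm trees are the ones worth inspecting first.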
Hi Ryan,
One of the questions that comes up when reading the post is why you would need 2 million alarms or alarm updates per day.
If you don't need all of these entries, I would advise reviewing the alarm template to fit the needs of the operators who monitor the system, as I doubt they will look at so many alarms.
Apart from that, it does indeed seem to be an issue that an entry is created every time a new alarm entry is added to the tree.
Therefore, you could create a task for this.