Hello Dojo,
(DataMiner 9.6 CU16 running MySQL)
I am looking for some clarification on these settings in MaintenanceSettings.xml, as described on the help page below:
We have set this to false, but these items still seem to be logged to the database regardless of the setting. After reading more closely, it seems the setting only affects the client side, and a database entry is still created every time the alarm history exceeds 100 alarms.
How does this work in the software? Every alarm after the 100th alarm seems to create a new database entry with value equal to "alarm history exceeded 100 alarms".
- It seems the alarm storm is due to a configuration issue within the timeout settings of the communication state of the driver.
- The side effect of these 'alarm history exceeded alarms' in the service alarm table is actually worse than the alarm storm itself - see below:
- Counts reached 2 million entries per day, causing the service alarm table to grow past 36 GB.
- The oversized database tables caused many SLDataminer RTEs, and the DMA froze completely (the database became unresponsive because of the many failed queries).
Is there a way to avoid an alarm storm event creating excessive "alarm history exceeded alarms" in the database?
Also note that the parameter IDs (pids) in the screenshots are all 64501, which is the "Timeout" parameter. I am not sure whether this is a coincidence or whether all of these notices are being generated on this same parameter.
It might be useful to inspect the alarm trees that get generated on these parameters to see whether these trees really have that many updates and to know where the updates originate from.
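As a rough way to check whether the notices really all come from one parameter, the rows could be exported from the alarm table and counted per parameter ID. The sketch below is only an illustration: the field names `pid` and `value`, the helper `count_notices_by_pid`, and the sample rows are assumptions, not the actual DataMiner schema or real data.

```python
from collections import Counter

# Text of the notice as quoted in the question; matched case-insensitively.
NOTICE_TEXT = "alarm history exceeded 100 alarms"


def count_notices_by_pid(rows):
    """Count 'alarm history exceeded' notices per parameter ID.

    `rows` is an iterable of dicts with hypothetical keys 'pid' and 'value',
    e.g. rows exported from the alarm table to CSV and loaded with csv.DictReader.
    """
    counts = Counter()
    for row in rows:
        if NOTICE_TEXT in str(row.get("value", "")).lower():
            counts[row["pid"]] += 1
    return counts


# Made-up sample rows, for illustration only.
sample_rows = [
    {"pid": 64501, "value": "alarm history exceeded 100 alarms"},
    {"pid": 64501, "value": "alarm history exceeded 100 alarms"},
    {"pid": 64501, "value": "Timeout"},
    {"pid": 120, "value": "alarm history exceeded 100 alarms"},
]

counts = count_notices_by_pid(sample_rows)
print(counts.most_common())
```

If nearly all notices land on pid 64501, that would support the idea that the timeout configuration is the single source of the storm.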