Hi all, my DMS is running slowly and I've noticed that information events are running over 2 hours behind real time. Rather than just restarting (the server was last restarted 6 weeks ago), I'd like to know what's causing the issue. I've noted the following from the Agents/BPA messages (Cassandra DB Size):
'Large partitions detected for table elementdata Partitions: [partitionkey: 31148:26102 - size: 131.817MiB]
Large partitions detected for table infotrace Partitions: [partitionkey: Unknown (reported by Nodetool) - size: 765 MiB]'
This makes sense to me, as the element does have very large tables and the data is changing constantly. I've since deleted the element and re-run the Cassandra BPA tool in System Center, but the message is still there.
Is there anything else I should be checking? Are there any tools in DM to clean up or refresh old files to keep the Agent in better condition? Are there any files I can safely delete manually? Where can I check how much alarm, trend and parameter data is stored, and could reducing it help? Any other best practices?
Thanks!
Ross
Hello Ross,
It sounds like your system generates too many information events. I would try to identify where the flood of information events comes from by inspecting the Information tab in the DataMiner Cube Alarm Console. You may already be able to spot frequent culprits, such as elements, correlation rules or actions, that cause these events, and then try to reduce their frequency.
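If you can get the information events into a CSV file (for example through an export from the alarm console, assuming your Cube version offers one), counting them per source is a quick way to rank the culprits. This is a minimal Python sketch under that assumption: the column name `Element` and the file name `information_events.csv` are hypothetical and should be adjusted to match whatever your export actually contains.

```python
import csv
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_event_sources(
    rows: Iterable[Mapping[str, str]], column: str = "Element", n: int = 10
) -> List[Tuple[str, int]]:
    """Count events per source and return the n most frequent ones."""
    # Rows with no value in the chosen column are grouped under "<unknown>".
    counts = Counter((row.get(column) or "").strip() or "<unknown>" for row in rows)
    return counts.most_common(n)


def top_event_sources_from_csv(
    path: str, column: str = "Element", n: int = 10
) -> List[Tuple[str, int]]:
    """Read a (hypothetical) CSV export of information events and rank the sources."""
    with open(path, newline="", encoding="utf-8") as f:
        # The Counter consumes all rows before the file is closed.
        return top_event_sources(csv.DictReader(f), column, n)
```

For example, `top_event_sources_from_csv("information_events.csv", n=5)` would list the five sources with the most events. Grouping by another column of the export works the same way by passing a different `column` name.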
The BPA shows that infotrace is large; this table is basically the history of the information events.
If your Alarm Console is lagging behind, it means the processes handling the information events can't keep up with the flood.
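To illustrate how a processing shortfall can grow into hours of delay, here is a deliberately simplified queue model in Python. It is not how DataMiner processes events internally; it only assumes events arrive at a steady rate and are handled at a steady rate (both in events per second), and the example rates are hypothetical.

```python
def backlog_delay_seconds(arrival_rate: float, service_rate: float, elapsed: float) -> float:
    """Approximate delay of the newest event after `elapsed` seconds of steady load.

    In this simple model the queue only grows while events arrive faster
    than they are handled; otherwise there is no backlog.
    """
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    # Number of events waiting in the queue after `elapsed` seconds.
    backlog = max(0.0, (arrival_rate - service_rate) * elapsed)
    # Time needed to work through the queue ahead of a newly arrived event.
    return backlog / service_rate
```

With events arriving at 75 per second and handled at 50 per second, `backlog_delay_seconds(75, 50, 4 * 3600)` gives 7200 seconds, a two-hour delay after four hours. When the handling rate is at or above the arrival rate, the model predicts no backlog, which is why reducing the event rate at the source helps.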
Hi Ross,
I'm replying somewhat late to this question, but I would like to clarify that the symptoms you observed (information events arriving up to two hours late) are not likely to be caused by the DB size BPA warnings you mentioned. From experience, even an infotrace partition of 3 GB has little, if any, impact on DMA performance.
The issue with delayed information events is not something we come across often. If it's still present, it would be best to report this issue to Techsupport for a proper investigation.
As for your additional questions:
- The DataMiner Size tool (DataMiner Size tool | DataMiner Docs) can be used to identify elements with very large tables or a large number of trended parameters.
- This documentation page has some general advice on DMA cleanup, some of which might be relevant to your questions: Keeping a DMA from running out of disk space | DataMiner Docs
- In general, manual deletion of any files is not advised without a proper investigation.
Hi, thanks for the response, it's very much appreciated. I'm still getting the infotrace alert in BPA, but my system isn't generating many alarms or information events at all, usually fewer than 10 per hour. Is there any way I can manually delete these large files without affecting operation? Also, it would be good if I could see what is causing it, but there's no link to processes or an element ID, so I'm not sure what the issue is. I had to restart the Agent for another reason last week, but information events are already running 15 minutes behind.