Which parameters should have alarm monitoring and trending enabled by default, so that the information is available when an issue (RTE, memory leak, ...) occurs?
Is there a place where those default templates are stored?
Alarming is a matter of preference and also greatly depends on the capacity and load of a system. Therefore there are no fixed or recommended alarm templates that I'm aware of.
The trending we use for leak and issue detection during quality assurance is:
- Performance page:
  - Commit charge total
  - Free Virtual Memory
  - Total processor load
  - Total threads
- Task Manager:
  - CPU
  - Handles
  - Process Pid
  - Threads
  - VM size
- Filters for the Task Manager items above:
  - SL* for all DataMiner processes
  - mysql* if a MySQL database is present on the system. This could also be a Cassandra system with a MySQL database, e.g. for Asset Manager.
  - prunsrv* for the Cassandra database
  - Elasticsearch* for Elasticsearch
- Additional Task Manager filters if you also want to monitor clients on the system:
  - iexplore*
  - *presentationhost*
- Disk info (for all disks, or filtered on the disks used for DataMiner and the databases):
  - Avg. Disk sec/Transfer
  - Disk Usage
  - Free space
  - Percent busy time
Note that these only monitor the DataMiner-related processes. In the past, we have occasionally seen memory leaks in other software running on the same system as the DataMiner agent. These eventually also caused issues with the DataMiner agent, because insufficient memory was available.
One more Task Manager filter worth adding: w3wp* for WebAPIs and legacy Dashboards.
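To get a feel for which processes the wildcard filters listed above would pick up, here is a minimal Python sketch. It assumes the filters behave like case-insensitive glob patterns, which may not match exactly how DataMiner evaluates them. The function name `matching_filter` and the sample process names are hypothetical and only serve as illustration.

```python
from fnmatch import fnmatchcase

# Filters taken from the lists in this thread.
FILTERS = [
    "SL*",
    "mysql*",
    "prunsrv*",
    "Elasticsearch*",
    "w3wp*",
    "iexplore*",
    "*presentationhost*",
]


def matching_filter(process_name, filters=FILTERS):
    """Return the first filter that matches process_name, or None.

    Assumption: matching is case-insensitive, so both the name and the
    pattern are lowercased before the glob comparison.
    """
    name = process_name.lower()
    for pattern in filters:
        if fnmatchcase(name, pattern.lower()):
            return pattern
    return None


if __name__ == "__main__":
    # Hypothetical process names, not taken from a real system.
    for name in ["SLDataMiner.exe", "mysqld.exe", "prunsrv.exe", "w3wp.exe", "chrome.exe"]:
        print(name, "->", matching_filter(name))
```

Any process for which this returns `None` (such as `chrome.exe` in the sample) would not be trended by these filters, which is the gap the note about other software on the same system points to.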