For which parameters should alarm monitoring and trending be enabled by default, so that the information is available when an issue (RTE, memory leak, ...) occurs?
Is there a place where those default templates are stored?
Hi Jens
As Michiel already mentioned, alarm templates depend on the system and on personal preference. He has also listed the parameters that are typically trended in QA.
In IOC, we also created something similar in the form of a Manual Of Procedure: MOP - Monitoring DataMiner Health
Alarming is a matter of preference and also greatly depends on the capacity and load of a system. Therefore there are no fixed or recommended alarm templates that I'm aware of.
The trending we use for leak and issue detection during quality assurance is:
- Performance page:
  - Commit charge total
  - Free Virtual Memory
  - Total processor load
  - Total threads
- Task Manager:
  - CPU
  - Handles
  - Process Pid
  - Threads
  - VM size
- Filters for the Task Manager items above:
  - SL* for all DataMiner processes
  - mysql* if a MySQL database is present on the system. This could also be a Cassandra system with a MySQL database for e.g. Asset Manager.
  - prunsrv* for the Cassandra database
  - Elasticsearch* for Elasticsearch
- Additional Task Manager filters if you also want to monitor clients on the system:
  - iexplore*
  - *presentationhost*
- Disk info (for all disks, or filtered on the disks used for DataMiner and the databases):
  - Avg. Disk sec/Transfer
  - Disk Usage
  - Free space
  - Percent busy time
Note that these only monitor the DataMiner-related processes. In the past, we have occasionally seen memory leaks in other software running on the same system as the DataMiner Agent, which eventually also caused issues with the DataMiner Agent because insufficient memory was available.
This template is also available as part of the SL_SystemHealthCheck protocol package, which is specifically designed to work with the protocols designed to detect memory leaks.
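To illustrate why trending values such as VM size helps with leak detection, here is a minimal Python sketch that fits a straight line through trended samples and flags steady growth. This is not how DataMiner or the SL_SystemHealthCheck package works; the function names, the threshold of 5 MB per hour and the sample values are all made up for illustration.

```python
def growth_per_hour(samples):
    """Least-squares slope of (hours_since_start, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den


def looks_like_leak(samples, threshold_mb_per_hour=5.0):
    """Hypothetical rule: flag a process whose memory keeps growing."""
    return growth_per_hour(samples) > threshold_mb_per_hour


# Made-up trend data: (hours since start, VM size in MB).
steady = [(0, 800), (1, 812), (2, 795), (3, 805), (4, 801)]
growing = [(0, 800), (1, 830), (2, 861), (3, 889), (4, 920)]

print(looks_like_leak(steady), looks_like_leak(growing))
```

In practice a suitable threshold and observation window depend on the load of the system, which is also why no fixed alarm template is recommended above.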
An additional Task Manager filter to consider is w3wp* for WebAPIs and legacy Dashboards.
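To check which processes the wildcard filters mentioned in this thread would cover, and which would go unmonitored, something like the following Python sketch can help. It uses the standard `fnmatch` module; the case-insensitive matching is an assumption (DataMiner's own filter behavior may differ), and the sample process list, including `backupagent.exe`, is purely illustrative.

```python
from fnmatch import fnmatchcase

# Wildcard filters taken from this thread.
FILTERS = [
    "SL*", "mysql*", "prunsrv*", "Elasticsearch*",
    "w3wp*", "iexplore*", "*presentationhost*",
]


def matching_filter(process_name, filters=FILTERS):
    """Return the first filter matching the process name, or None.

    Matching is done case-insensitively here, which is an assumption.
    """
    name = process_name.lower()
    for pattern in filters:
        if fnmatchcase(name, pattern.lower()):
            return pattern
    return None


# Illustrative process names, not taken from a real system.
sample = [
    "SLDataMiner.exe", "SLNet.exe", "mysqld.exe",
    "prunsrv.exe", "w3wp.exe", "backupagent.exe",
]

covered = {name: matching_filter(name) for name in sample}
uncovered = [name for name, pattern in covered.items() if pattern is None]
print("Not covered by any filter:", uncovered)
```

Processes that end up in `uncovered` are the kind of third-party software mentioned earlier: if one of them leaks memory, the filters above will not show it, so an extra filter would be needed.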