Does anyone have good steps for diagnosing SLDataGateway.exe consuming all virtual memory and crashing the DMA?
Please see some interesting log entries from just before the DMA crashed:
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: SLDataGateway.exe (5944) consumed 46393978880 bytes, prunsrv.exe (1296) consumed 7146168320 bytes, and SLElement.exe (6652) consumed 1616957440 bytes.
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: SLDataGateway.exe (5792) consumed 40352841728 bytes, prunsrv.exe (1224) consumed 7361466368 bytes, and SLNet.exe (2652) consumed 2814517248 bytes.
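For comparing several of these events, it can help to pull the process names and sizes out of the message text. The following is only a minimal sketch: the regular expression is inferred from the two entries quoted above, and the function name `parse_low_memory_event` is made up for illustration, not part of any DataMiner or Windows tooling.

```python
import re

# Matches entries such as "SLDataGateway.exe (5944) consumed 46393978880 bytes".
# The pattern is an assumption based on the two log entries quoted in this post.
ENTRY = re.compile(r"(\S+\.exe) \((\d+)\) consumed (\d+) bytes")


def parse_low_memory_event(message):
    """Return a list of (process, pid, size_in_GiB) tuples found in one event message."""
    return [
        (name, int(pid), int(size) / 2**30)
        for name, pid, size in ENTRY.findall(message)
    ]


message = (
    "The following programs consumed the most virtual memory: "
    "SLDataGateway.exe (5944) consumed 46393978880 bytes, "
    "prunsrv.exe (1296) consumed 7146168320 bytes, "
    "and SLElement.exe (6652) consumed 1616957440 bytes."
)

for name, pid, gib in parse_low_memory_event(message):
    print(f"{name} (PID {pid}): {gib:.1f} GiB")
```

For the first entry this shows SLDataGateway.exe at roughly 43 GiB, far above the other two processes.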
Thank you for looking.
I stumbled upon this question a year later and would like to add our later findings, which may still be relevant to some users.
Two causes of high memory usage were found:
- Setting large volumes of trending data via History Sets (iDirect Platform in History Polling mode).
History sets and related calculations are more resource-intensive than regular trending, which causes queues to build up in SLDataGateway.
This has been resolved by the NewAverageTrending feature, available in DataMiner 10.2.0 and newer.
- Replaying large volumes of data from the Offload file.
When a database is unavailable for some time, new data is written to a temporary file in C:\Skyline DataMiner\Offload\. When the connection to the database is restored, the data is pushed from the offload file to the database (a so-called replay). This operation can be quite resource-intensive and can cause queues to build up in SLDataGateway.
The best practice is to avoid database outages and to monitor the health of the database, e.g., with the Apache Cassandra Cluster Monitor protocol.
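As a complement to database monitoring, the size of the Offload folder can be checked before a replay starts. This is a rough sketch only: the folder path is the one mentioned above, while the 1 GiB limit and the function names are hypothetical and should be adapted to your own system.

```python
from pathlib import Path


def folder_size_bytes(folder):
    """Sum the sizes of all regular files below folder (0 if it does not exist)."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def offload_backlog_warning(folder, limit_bytes):
    """Return a warning string if the folder exceeds limit_bytes, else None."""
    size = folder_size_bytes(folder)
    if size > limit_bytes:
        return f"Offload backlog is {size} bytes (limit {limit_bytes})"
    return None


if __name__ == "__main__":
    OFFLOAD_DIR = r"C:\Skyline DataMiner\Offload"  # path taken from the post
    LIMIT = 1 * 2**30  # hypothetical threshold of 1 GiB, not a DataMiner default
    print(offload_backlog_warning(OFFLOAD_DIR, LIMIT) or "Offload folder within limit")
```

A large backlog here suggests that the upcoming replay will put extra load on SLDataGateway, so it may be worth watching memory usage while the data is pushed to the database.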