Question

Solved · 1.89K views · 19th July 2023 · disk Microsoft Platform monitoring


Miguel Barquet [SLC] [DevOps Enabler] · 1.65K · 10th December 2020 · 1 Comment

Hi,

We use the Microsoft Platform driver to monitor several metrics of our Windows servers. Customers have asked which parameters should be monitored to detect hard disk issues proactively, and what the recommended thresholds are.

Note 1: Keep in mind that the screenshots below only show the parameters related to disks. Other parameters might also need to be monitored, but that is outside the scope of this question.

Note 2: We used to monitor Percent Busy Time but decided to stop, since its value is linked to the RAID configuration and can vary greatly, in some cases going above 100% without indicating any problem.

After some internal discussion, we came up with the following template. I would like to hear the community's thoughts and whether you recommend further tweaking.

Thanks in advance.

Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 19th July 2023

Miguel Barquet [SLC] [DevOps Enabler] commented 10th December 2020

Some information online indicates that Idle Time might also be a good metric to use; however, it is currently not included in the driver. Do you think adding that parameter to the driver would be a good idea, and would it be feasible?

% Idle Time

This counter provides a very precise measurement of how much time the disk remained in an idle state, meaning all requests from the operating system to the disk have been completed and there are zero pending requests.

This is how it's calculated: the system timestamps an event when the disk goes idle, then timestamps another event when the disk receives a new request. At the end of the capture interval, the percentage of time spent idle is calculated. This counter ranges from 100 (meaning always idle) to 0 (meaning always busy).
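To make the calculation described above concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Windows or the Microsoft Platform driver implements the counter. The function name `percent_idle_time` and the input format (a list of non-overlapping `(went_idle, new_request)` timestamp pairs in seconds) are assumptions made for this example.

```python
from typing import Iterable, Tuple


def percent_idle_time(
    idle_periods: Iterable[Tuple[float, float]],
    interval_start: float,
    interval_end: float,
) -> float:
    """Return the share of the capture interval spent idle, from 0 to 100.

    Hypothetical helper: each pair is (timestamp when the disk went idle,
    timestamp when the next request arrived). Pairs are assumed not to overlap.
    """
    interval_length = interval_end - interval_start
    if interval_length <= 0:
        raise ValueError("capture interval must have positive length")

    idle_total = 0.0
    for went_idle, new_request in idle_periods:
        # Only count the part of the idle period inside the capture interval.
        start = max(went_idle, interval_start)
        end = min(new_request, interval_end)
        if end > start:
            idle_total += end - start

    return 100.0 * idle_total / interval_length


# 10 s capture interval, disk idle from 0-4 s and from 6-9 s: 7 s idle in total.
sample = [(0.0, 4.0), (6.0, 9.0)]
print(percent_idle_time(sample, 0.0, 10.0))  # 70.0
```

With these made-up timestamps the disk is idle for 7 of 10 seconds, so the result is 70. A disk with no idle periods in the interval gives 0 (always busy), and one that is idle for the whole interval gives 100.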

3 Answers


score 1 · Answer 1 · 2020-12-15T17:36:44+00:00


Michiel Vanthuyne [SLC] [DevOps Enabler] · 4.25K · Posted 11th December 2020 · 1 Comment

Keep in mind that on systems with very large Cassandra tables, you may need to reserve more free space for the compaction to be able to run, but as a general template I think this is good.


Miguel Barquet [SLC] [DevOps Enabler] commented 15th December 2020

Thanks for your comments.

How to monitor hard disk using the Microsoft Platform driver?
