Dear Community,
We upgraded from MySQL to Cassandra (local on the DMA) and are facing disk performance issues after a few months.
A reboot of the two failover DMAs resolves the issue temporarily, but we would like to fix the performance problem for good.
My investigation identified poor disk performance (observed with Windows Resource Monitor).
From the requirements at DataMiner Compute Requirements - DataMiner Dojo I can see that at least two HDDs or SSDs are required. We currently have only one RAID-1-protected HDD.
So the idea is to add an additional disk (or disks) to each DMA.
Now we have a choice of technology between HDD and SSD; what would you recommend? Should we use the new disk for the commit log or for the data directories?
Should we protect the new disk with RAID-1 again?
(I've found many comments on the web saying that it is not mandatory.)
Any experience report or advice is highly appreciated.
Hi Joerg,
We would recommend SSD. You can let Cassandra know in cassandra.yaml whether you are using spinning disks or SSDs; this influences how Cassandra writes to disk, but SSDs will give you better performance either way. RAID-1 protects you against a disk failure, but you could also leverage the replication factor within Cassandra to ensure you have a copy of your data. You can also find this information on our Compute Requirements page.
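As a minimal sketch, the settings mentioned above live in cassandra.yaml; the directory paths below are purely illustrative and should be adapted to your own disk layout:

```yaml
# cassandra.yaml fragment (example paths, adjust to your setup)

# Tells Cassandra which disk access patterns to optimize for.
# Use "spinning" if you stay on HDDs.
disk_optimization_strategy: ssd

# Putting the commit log on a separate physical disk keeps its
# sequential writes from competing with data-file I/O.
data_file_directories:
    - D:/Cassandra/data
commitlog_directory: E:/Cassandra/commitlog
```

With this split, the new disk would typically host the commit log, while the data directories stay on (or move to) the larger volume.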
Since you are only seeing disk problems after a couple of months, there are some other things you might consider looking into. These are the cases we typically see in which a Cassandra system starts to struggle with the load:
- Too many updates on saved parameters
This will result in many tombstones on the elementdata table and should be visible in the Cassandra logging.
- Too many real-time trending points being inserted
This will result in large partitions (this has been resolved with the new Cassandra data model, in which Elastic is also required; more info on the different architectures here). Large partitions may lead to heavy compaction and repair actions, so you will typically see Cassandra trying to claim a lot of memory to parse these large partitions, and it could even start working on page files. In this case, lowering gc_grace_seconds on the data table could be an option.
- Many alarm updates impacting many views and/or services
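The gc_grace_seconds change mentioned above can be sketched as follows; the keyspace and table names are assumptions, so check your actual schema with DESCRIBE TABLES in cqlsh first, and note that gc_grace_seconds also bounds how long tombstones remain available for repair, so make sure repairs complete well within the new window:

```shell
# Lower gc_grace_seconds (here: 1 day) on the trending data table.
# "SLDMADB" and "data" are example names, not necessarily yours.
cqlsh -e "ALTER TABLE SLDMADB.data WITH gc_grace_seconds = 86400;"

# Inspect partition sizes to confirm whether large partitions are the problem:
nodetool tablehistograms SLDMADB data
```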
I would suggest sending an email to techsupport@skyline.be, so that we can follow up and check whether there is a growing problem in your system.