For the below VM resource spec (1 for Cassandra and 1 for elastic), what will be the maximum SSD disk size that can be loaded for storage? And what will be the usable storage vs actual storage?
CPU: 1x16 core (10K passmark), RAM: 64 GB, OS: Linux Ubuntu LTS
Hi Naveendran,
It's hard to put one number on this, as it has too many dependencies. Below you can find a general number for both databases.
For Cassandra:
Depending on how you configure it, as a standalone Cassandra or as a Cassandra cluster with one node, the data is stored slightly differently. However, for this we follow the general rule from DataStax Cassandra: a maximum of 2-3 TB per node in a cluster.
On the disk you should keep free space equal to the biggest table on your system. For example, if you have a lot of average trending data, say 400 GB, then you need at least 400 GB of free disk space so Cassandra can compact your data.
More information can be found here: https://docs.datastax.com/en/ossplanning/docs/oss-capacity-planning.html
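The compaction rule above can be sketched as a quick calculation. This is just a back-of-envelope illustration; the 2 TB disk and 400 GB largest table are example numbers, not values from the thread:

```python
# Rough sketch of the Cassandra sizing rule (example numbers only).
disk_tb = 2.0            # assumed node disk, within the 2-3 TB/node guideline
largest_table_tb = 0.4   # e.g. 400 GB of average trending data

# Keep free space at least as large as the biggest table so compaction can run.
usable_tb = disk_tb - largest_table_tb
print(f"usable: {usable_tb:.1f} TB of {disk_tb:.1f} TB actual")
```

So with these example numbers, roughly 1.6 TB of a 2 TB disk is usable; the gap grows with your largest table.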
For Elastic:
Here we follow the recommendations from Elasticsearch itself. With this CPU and memory configuration, that would mean a maximum disk of 1.9 TB.
For the usable space, this is almost the whole disk, as Elasticsearch handles its data differently.
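As a side note, the 1.9 TB figure is consistent with a roughly 30:1 disk-to-RAM ratio, which Elastic commonly cites for sizing; that ratio is my assumption here, not something stated in the thread:

```python
# Assumed derivation of the 1.9 TB cap: ~30:1 disk-to-RAM ratio (my assumption).
ram_gb = 64               # RAM from the VM spec in the question
disk_to_ram = 30          # commonly cited Elasticsearch sizing ratio
max_disk_gb = ram_gb * disk_to_ram
print(f"{max_disk_gb} GB ≈ 1.9 TB")
```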
Naveen, I also suggest using https://community.dataminer.services/calculator/ for approximating the storage space needs.
In terms of Elasticsearch, the user needs to consider the scenario of losing a node in the cluster. As Thibault mentioned, ES can use the complete disk in the way it works. However, if 1 out of 3 nodes goes down, then the same ES indexes need to be spread across 2 nodes instead of 3. I couldn't find the relevant Elastic documentation for this, but I suggest planning for about 30% free space.
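The ~30% figure can be sanity-checked with simple arithmetic, assuming a 3-node cluster as in the scenario above:

```python
# Back-of-envelope check of the ~30% free-space suggestion (3-node cluster assumed).
nodes = 3
surviving = nodes - 1

# After one node fails, its data is re-spread over the survivors,
# so per-node data grows by nodes / surviving = 1.5x.
growth = nodes / surviving

# To stay at or under 100% full after the failure, each node may be at most:
max_fill = 1 / growth     # 2/3 full, i.e. keep roughly a third of the disk free
print(f"max safe fill: {max_fill:.0%}")
```

Keeping about a third of each disk free means the cluster can absorb one node loss without running out of space, which lines up with planning for roughly 30% free.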