Is there a way to predict the resulting size of the database when migrating from MySQL to Cassandra? For example, if my MySQL database is 200G in size what should I expect the size be after migrating it to Cassandra? This information is crucial to ensure existing servers have significant SSD drive space to support Cassandra migration.
Thanks
Hi Jeff,
Cassandra will keep all updates, deletes, ... in memory and from the moment the memory exceeds a certain threshold it will push it to a ssTable on disk. The content of a ssTable will never change, the only way to reduce the size all ssTables are using is by creating new ssTables and removing the old one(s). This typically happens during compaction and depending on the configured compaction strategy (configured on table level) it will trigger at certain moments.
In other words, this question can't be answered with a mapping that 200G will map to 250G, it will highly depend on your system load and configuration.
Typically we see that Cassandra needs a lot of disk space if you have Real-Time trending enabled and if your parameters update frequently.
We typically tell users to scale (adding Cassandra nodes) when needed or to reduce/limit the load on the system (e.g. reducing the amount of RT trended cells or reducing the update frequency of those parameters).
There is also a big difference between the architecture you are going to migrate to. Do you intend to migrate to Elastic and Cassandra Cluster (one Cassandra/Elastic instance for your complete DMS) or is this for migration towards Cassandra Single (one Cassandra instance for every DMA)?
Hi Jeff,
Below is for the case of Cassandra Single (one Cassandra instance for every DMA or every failover pair).
Related to trending you will have roughly the same size. If you have a lot of RT trending, you might have very heavy compaction actions and it might take long before the RT trending is cleaned from the avg trending. FYI: This has been improved a lot with Cassandra Cluster.
For alarming you will need to be careful as in Cassandra we are purely working on TTL (there is no longer a hard limit as in MySQL). Would be good to validate what your oldest alarm is in your current MySQL system, then you can configure a TTL for your alarms in Cassandra that is safe in Cassandra. In Cassandra Single (this is moved to Elastic with new architecture), we store every alarm for every view, every service and the element itself (this for fast lookups).
If you want to know exactly how much data it will take in Cassandra I would advise to take a copy of your MySQL data, put it on a test system and migrate it to Cassandra to know the exact size.
Some interesting readings I would suggest:
https://community.dataminer.services/supported-system-data-storage-architectures/
https://community.dataminer.services/question/valid-dataminer-architectures/answer/84952/
Thank you for your response Michiel. None of our customer installations are Clusters, they are all 2 DMA Failover configurations. The current plan is to simply migrate each of them to Cassandra with Cassandra running on each DMA just like with MySQL. There are no current plans to try and change the architecture of any systems by breaking out Cassandra from the DMAs onto separate servers and implementing separate Cassandra nodes. We also currently do not plan to add Elastic into the equation either, just basically migrate from current MySQL backend to Cassandra.
I understand that over time, due to differences in Cassandra there is no way to predict database size but isn’t there a way to estimate what the size will be after the initial migration. For example once again, if my current MySQL database is ~200G (on disk) and i stop all trending and run the migration tool what will be the size (on disk) of the Cassandra database once the migration is complete?
Thanks again