I am reaching out to the DataMiner community to get an idea of what kind of database and table sizes users are seeing out there, along with any performance-related feedback for those different sizes. I'm trying to find out what might be considered a large database/table size and how other systems are handling it. If we can get a large sample of differently sized systems and how they are behaving, it may help the entire community prepare for, and deal with, any database-size-related issues.

As an example, we have a particular protocol/element in all our systems that has a lot of average trending active, which results in these elements' average trend tables reaching file sizes of 25 to 35 GB, and on some systems even >100 GB. On some of those systems the full backup now takes over 5 hours to complete, and I can only guess how long the Optimize process is taking and what kind of issues that is causing. All our systems use enterprise-grade SSDs, but we run DataMiner and MySQL on a single Xeon Gold CPU DMA with 64 GB of RAM.

What is considered a large table? What are other users seeing?
Thanks in advance and I look forward to any and all responses.
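In case it helps with comparing numbers, this is roughly how we pull the per-table sizes out of MySQL (a minimal sketch; the connection details and schema name are placeholders, so adjust them to your own setup):

```python
# Minimal sketch: list the largest tables in the MySQL schema with their on-disk
# and reclaimable (data_free) sizes. Connection details and the schema name are
# placeholders; adjust them to your own setup.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="localhost", user="root", password="***")
cur = conn.cursor()
cur.execute(
    """
    SELECT table_name,
           ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb,
           ROUND(data_free / 1024 / 1024 / 1024, 2)                    AS reclaimable_gb
    FROM information_schema.tables
    WHERE table_schema = %s
    ORDER BY (data_length + index_length) DESC
    LIMIT 20
    """,
    ("dataminer",),  # placeholder schema name
)
for name, size_gb, reclaimable_gb in cur.fetchall():
    print(f"{name}: {size_gb} GB on disk, {reclaimable_gb} GB reclaimable by OPTIMIZE")
conn.close()
```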
Pitching in on the feedback from Alberto
For Cassandra, how "big" you can get indeed depends heavily on the HW infrastructure.
MySQL
Unfortunately, for MySQL the main limit is MySQL itself. We identified that the bottleneck is actually the amount of data you need to DELETE, which in turn also makes the optimize run take a long time. Although with fast enough disk speeds the optimize should not have that much impact.
The delete queries are loaded into the cache, and if you delete more data, faster, than MySQL can handle, it will throttle and lose performance; we eventually had one case with 100 GB of trend data where it simply couldn't delete anymore.
-> suggested size for MySQL is 2-4 GB/element based on your HW
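To illustrate the delete bottleneck described above, here is a small sketch of the batched-delete idea in general. Note that this is not how DataMiner itself purges trend data, and the table/column names are hypothetical:

```python
# Illustration of the batched-delete principle only (NOT DataMiner's own clean-up
# code): purging old rows in small chunks keeps each DELETE statement small enough
# for MySQL to handle, instead of one huge DELETE it has to keep in cache.
# Table and column names are hypothetical placeholders.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root", password="***", database="dataminer")
cur = conn.cursor()
purged = 0
while True:
    cur.execute(
        "DELETE FROM trend_avg_example "
        "WHERE ts < NOW() - INTERVAL 90 DAY "
        "LIMIT 10000"
    )
    conn.commit()
    if cur.rowcount == 0:
        break
    purged += cur.rowcount
print(f"purged {purged} rows in batches of 10000")
conn.close()
```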
Cassandra
This is an entirely different story, since it is linked to your HW infrastructure.
To give some KPIs: we have systems running with up to 300 GB of trend data (all elements together) that are running fine. However, these are on SSDs and without failover.
The problem with failover is that it needs to keep the data in sync, which consumes a lot of CPU & memory. We also have a weekly compaction to clean up all tombstones; this has less impact but still causes a significant load.
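If you want to see what that compaction is up against on your own node, a quick way to inspect per-table size and tombstone counts is via nodetool (a minimal sketch, assuming nodetool is on the PATH; the keyspace name is a placeholder, so check your own configuration):

```python
# Quick check of per-table size and tombstone pressure on a Cassandra node, using
# the standard nodetool tool that ships with Cassandra. Assumes nodetool is on the
# PATH; the keyspace name below is a placeholder.
import subprocess

out = subprocess.run(
    ["nodetool", "tablestats", "SLDMADB"],  # replace with your own keyspace
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    line = line.strip()
    if line.startswith(("Table:", "Space used (live)", "Maximum tombstones per slice")):
        print(line)
```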
The thing you need to pay attention to: the moment prunsrv.exe (the Cassandra process) starts to use up all physical memory, it will in turn push our core processes such as SLNet & SLElement out to the page file on disk (see the sketch further down for a simple way to keep an eye on this).
If, at that point, you don't have SSDs that can cope with the page file toggling, you can encounter noticeable performance issues, up to RTEs/crashes in the software.
-> suggested size for Cassandra is max. 30 GB/table when running on a 64 GB server; this should leave some room for the maintenance actions
More RAM and very fast disks will allow you to increase this number, but testing is advised (watch out for performance issues during compaction & repair).
Running Cassandra on a separate node, you should be able to go up to 300 GB on that single node.
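As referenced above, here is a minimal sketch of how you could keep an eye on the prunsrv.exe memory usage (the warning threshold is just an example, not an official guideline):

```python
# Minimal sketch for the prunsrv.exe warning above: report how much physical memory
# the Cassandra service wrapper is holding and how much the machine still has
# available, so you can spot the point where core DataMiner processes would start
# being pushed to the page file. Requires the psutil package; the 4 GB warning
# threshold is purely illustrative.
import psutil

cassandra_rss = sum(
    p.info["memory_info"].rss
    for p in psutil.process_iter(["name", "memory_info"])
    if p.info["name"] and p.info["name"].lower() == "prunsrv.exe"
)
vm = psutil.virtual_memory()
print(f"prunsrv.exe resident memory: {cassandra_rss / 1024**3:.1f} GB")
print(f"available physical memory:   {vm.available / 1024**3:.1f} GB of {vm.total / 1024**3:.1f} GB")
if vm.available < 4 * 1024**3:
    print("WARNING: little free RAM left; SLNet/SLElement may start hitting the page file")
```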
Cassandra Cluster + Elastic cluster
For bigger systems, where we are talking about significant amounts of alarm & trend data, the Cassandra cluster is the way to go. You can choose the number of nodes, and we now run Cassandra 4.0, which on the Cassandra side contains huge improvements for time-series data (such as our trending), BUT it needs to run on Linux. We have also noticed that Linux handles tombstone cleanup much better than Windows.
-> I don't have any figures from the field yet, but the scalability is really promising here. The big advantage is that you can use the nodes for ALL DMAs, which means we can finally step away from our 1 DB per 1 DMA setup.
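To give an idea of what such a shared cluster looks like from a client's point of view, here is a small sketch using the Python cassandra-driver to list keyspaces and their replication settings (the node addresses are placeholders):

```python
# Sketch: connect to a Cassandra cluster (the same nodes can serve ALL DMAs) and
# list the non-system keyspaces with their replication settings. Contact points
# are placeholders; requires the cassandra-driver package.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.11", "10.0.0.12", "10.0.0.13"])  # your cluster nodes
session = cluster.connect()

rows = session.execute("SELECT keyspace_name, replication FROM system_schema.keyspaces")
for row in rows:
    if not row.keyspace_name.startswith("system"):
        print(row.keyspace_name, row.replication)

cluster.shutdown()
```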
Tuning in to see what comes from the community, but if you have demanding trending requirements, moving to Cassandra is a no-brainer: check whether your SSDs are up to spec and migrate. DataMiner still supports systems on MySQL, but handling larger amounts of data is normally much faster in NoSQL, which is where Cassandra comes in. And yes, you'll need to consider your disk performance too (if your architecture uses a local DB), or the performance of the separate DB (e.g. with a Cassandra cluster). Your TAM will surely be able to evaluate and recommend the best course of action for your system; you may want to consider the introduction of ElasticSearch too.