Hello dojo,
Looking for some more info on the default value for gc_grace_seconds (on the data and elementdata tables, if different) in local Cassandra DBs, and on any trade-offs when changing this value (what's recommended to handle tombstones efficiently and/or to avoid the need for frequent compactions).
I understand this can also vary with the specific DM version - depending on what is configured before upgrading the DMS: https://community.dataminer.services/question/upgrade-behavior-for-gc_grace_seconds/
Thanks
Hi Alberto,
The gc_grace_seconds value is designed to "exclude" a tombstoned (= deleted) record in Cassandra from compaction for the time specified in gc_grace_seconds, so the tombstone can only be purged once that grace period has passed.
This is mainly done to prevent these records from becoming "desynchronized" in case Cassandra nodes go down.
If a node goes down, it does not receive the instruction to write tombstones, but the rest of the Cassandra cluster, which has written the tombstones to disk, may still be up.
If we were to trigger compaction immediately, effectively removing the tombstones from the rest of the cluster, then when the node that was down comes back up, it would still have the records in their non-tombstoned, non-deleted form and could replicate them back to the other nodes. At that point, we have effectively created "zombie data".
The defaults at present are 4 hours for items with a TTL, such as trending, and a full day for items that do not expire by TTL, which covers most of the other data, such as configuration data or alarms.
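If you want to verify which values are actually in effect on a local node, the settings can be read per table from system_schema.tables. Below is a minimal sketch, assuming the DataStax Python driver and a placeholder keyspace name ("sldmadb"); substitute the keyspace your DMA actually uses.

```python
from cassandra.cluster import Cluster

# Connect to the local Cassandra node of the DMA.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# "sldmadb" is a placeholder keyspace name; use the keyspace of your DMA.
rows = session.execute(
    "SELECT table_name, gc_grace_seconds, default_time_to_live "
    "FROM system_schema.tables WHERE keyspace_name = %s",
    ("sldmadb",),
)
for row in rows:
    print(row.table_name, row.gc_grace_seconds, row.default_time_to_live)

cluster.shutdown()
```

The value is expressed in seconds, so the defaults above correspond to 14400 (4 hours) and 86400 (1 day) respectively.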
The defaults are the result of careful consideration. We looked, for example, at the downtimes we typically encounter and at which data benefits most from being fully consistent, and weighed that against the overhead the setting causes on disk (i.e. the additional disk space tombstones take up before they can be purged).
Usually, we do not recommend deviating from these values.
Deviations are typically only made when the system has a very heavy write/delete load, causing a lot of disk space to be taken up in a short time. In that case, lowering gc_grace_seconds can help as a temporary measure until the write/delete load itself can be reduced.
Note: for now, DataMiner sets gc_grace_seconds back to its default upon each startup.
We are planning to allow more flexibility for that though.
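Purely as an illustration of what such a temporary deviation would look like (not a recommendation), lowering the value on a single table is a one-line CQL statement. The keyspace/table names below are placeholders, and, as noted above, DataMiner currently sets the value back to its default on the next startup.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Placeholder keyspace/table names; the value is in seconds (here 4 hours).
# Note that DataMiner currently restores the default on each startup.
session.execute("ALTER TABLE sldmadb.elementdata WITH gc_grace_seconds = 14400")

cluster.shutdown()
```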
Hi Alberto, on Windows, we trigger a manual "major compaction" once per week for all the tables. This does not happen on Linux (i.e. a Cassandra cluster), where minor compactions are sufficient. The gc_grace_seconds value remains the same on single-node Cassandra. As such, when a node has been down for a day or more, you may see some of that "zombie data". On Windows, the major compactions are unavoidable. Minor compactions are triggered automatically by Cassandra, depending on the compaction strategy used and a set of timers/triggers defined by Cassandra: https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/overview.html#types-of-compaction
As such, it is not possible to have strict control over the number of compactions, apart from the major compactions that we schedule.
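If you want to see which compaction strategy your tables are currently using (and therefore which of the minor-compaction triggers from the page above apply), the strategy is stored per table in system_schema.tables. Again a sketch with a placeholder keyspace name:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Placeholder keyspace name; the 'compaction' column is a map that holds
# the strategy class and its options.
rows = session.execute(
    "SELECT table_name, compaction FROM system_schema.tables "
    "WHERE keyspace_name = %s",
    ("sldmadb",),
)
for row in rows:
    print(row.table_name, row.compaction.get("class"))

cluster.shutdown()
```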
Thanks for your feedback, Laurens – we'll review with our squad – this was really helpful
Thanks for the prompt feedback, Laurens
Are there any different considerations for the scenario where each DMA has its own local Cassandra DB?
Asking in the context of agents with and without failover license: our goal would be to minimize the need for Cassandra compactions.