Elementdata tables contain all the values stored in saved parameters. This type of parameter is used to store system configurations that may be critical for a given solution. We have encountered some cases where the Elementdata tables are not fully in sync between the Primary and Backup agents.
I'd like to know if there is anything that can be done in a Cassandra failover cluster to give priority to the synchronization of the elementdata table over other, less critical tables (e.g. alarm, trending). By doing so, we hope to increase the reliability of this information in a failover system.
Hi Miguel,
When DataMiner stores data (e.g. elementdata) in the database, it connects to one of your Cassandra nodes and sends the data to that node (the coordinator). For every 'row' that is inserted into a table, the coordinator evaluates which node(s) the row is intended for, and then stores it locally and/or sends it to the nodes that hold that data. If a row is intended for another node and that node is not available, it is stored in hint files (on the coordinator). When the node comes back up, the hint files are pushed to the appropriate node(s).
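To make the hint mechanism above more concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Cassandra code: the class `ToyCluster`, the node names and the row key are all hypothetical, and real Cassandra picks replicas through its partitioner and replication settings instead of an explicit list.

```python
from collections import defaultdict


class ToyCluster:
    """Toy model of a coordinator with hinted handoff (not real Cassandra code)."""

    def __init__(self, nodes):
        self.up = {n: True for n in nodes}   # availability per node
        self.data = {n: {} for n in nodes}   # rows stored per node
        self.hints = defaultdict(list)       # hints kept on the coordinator

    def write(self, key, value, replicas):
        """Deliver the row to live replicas; keep a hint for each down replica."""
        for node in replicas:
            if self.up[node]:
                self.data[node][key] = value
            else:
                self.hints[node].append((key, value))

    def bring_up(self, node):
        """Mark the node as available again and replay the hints stored for it."""
        self.up[node] = True
        for key, value in self.hints.pop(node, []):
            self.data[node][key] = value


# Hypothetical two-node failover pair with the backup node down.
cluster = ToyCluster(["primary", "backup"])
cluster.up["backup"] = False
cluster.write("element/1/param/5", "saved-value", ["primary", "backup"])
before = dict(cluster.data["backup"])  # still empty while the backup is down
cluster.bring_up("backup")
after = dict(cluster.data["backup"])   # the hint has been replayed
```

In this model the two nodes only end up with the same rows because the hint is still there when the backup returns. If the hints were dropped before that point, the backup would stay out of sync until a repair runs.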
If your data is not in sync, it means either that your nodes were unable to cope with the load, or that one of the nodes was down for longer than the configured time that hint files are kept for unavailable nodes. To make sure all data is in sync between the nodes that are responsible for the same data, you can run repair actions (by default scheduled every week through task scheduler).
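As a rough illustration of the two points above, the sketch below checks whether an outage still falls within the hint window and builds a `nodetool repair` command limited to a single table. The 3-hour default is my understanding of Cassandra's `max_hint_window_in_ms` setting (please verify it in your own cassandra.yaml), the keyspace name `my_keyspace` is a placeholder, and both helper functions are hypothetical.

```python
# Assumed default of max_hint_window_in_ms (3 hours); check cassandra.yaml.
DEFAULT_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def hints_cover_outage(downtime_ms, hint_window_ms=DEFAULT_HINT_WINDOW_MS):
    """Return True if the node was down no longer than the hint window."""
    return downtime_ms <= hint_window_ms


def repair_command(keyspace, table=None):
    """Build (but do not run) a nodetool repair command as an argument list."""
    cmd = ["nodetool", "repair", keyspace]
    if table:
        cmd.append(table)  # limit the repair to this table
    return cmd


one_hour_ok = hints_cover_outage(1 * 60 * 60 * 1000)
four_hours_ok = hints_cover_outage(4 * 60 * 60 * 1000)
cmd = repair_command("my_keyspace", "elementdata")  # placeholder keyspace
```

If the outage was longer than the hint window, hints alone will not bring the node back in sync and a repair is needed. Limiting the repair to the elementdata table could be one way to have the critical data repaired without waiting for the larger alarm and trending tables.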
If you have data that is not in sync even after a repair action, it means that Cassandra was unable to repair the data. This should be visible in the logging or in the repair_history table in the system_distributed keyspace. There are a couple of reasons why the repair could fail. If it is because Cassandra is unable to cope with the load, you will need to check whether there is a memory, CPU or disk problem and take appropriate action (e.g. improve the disk(s), add an extra Cassandra node, or reduce the load by generating less trending or fewer alarms). If a table got corrupted, then you will need to scrub or repair the data. This article could help you with this.
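For checking the repair_history table mentioned above, a small sketch could look like the one below. The column names (`keyspace_name`, `columnfamily_name`, `status`, `exception_message`, `started_at`) and the `FAILED` status value are what I would expect in system_distributed.repair_history, but treat them as assumptions and check the schema of your Cassandra version. The helper `failed_repairs` and the sample rows are made up purely for illustration.

```python
# CQL you could run in cqlsh; the column names are assumptions, verify them first.
REPAIR_HISTORY_QUERY = (
    "SELECT keyspace_name, columnfamily_name, status, "
    "exception_message, started_at "
    "FROM system_distributed.repair_history"
)


def failed_repairs(rows, table):
    """Keep only the rows for the given table that did not finish successfully."""
    return [
        row for row in rows
        if row["columnfamily_name"] == table and row["status"] == "FAILED"
    ]


# Illustrative rows only, not real output from any system.
sample_rows = [
    {"columnfamily_name": "elementdata", "status": "SUCCESS",
     "exception_message": None},
    {"columnfamily_name": "elementdata", "status": "FAILED",
     "exception_message": "example error text"},
    {"columnfamily_name": "alarm", "status": "FAILED",
     "exception_message": "example error text"},
]

problems = failed_repairs(sample_rows, "elementdata")
```

Rows that come back from a filter like this would be the place to start when matching the repair failures against the Cassandra logging.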