We are seeing a warning when running the system checks on the Cassandra cluster:
Category: Repair
Status: Warning
Description: Evaluate if a repair is needed
Suggested Action: The tables 'elementdata_xxxxx_xxx_xxxxx', 'objectreftreeelementtopdown', 'elementdata', 'objectreftransaction', 'analytics_alarmfocus', 'elementdata_xxxxx_xxx_xxxx', 'elementdata_yyyyy_yyy_yyyy', 'elementdata_xxxxx_xxx_xxxx', 'elementdata_yyyyy_yyy_yyyyy', 'analytics_changepoints_v1', 'maskstate', 'elementdata_yyyyy_yyy_yyyy', 'correlationslidingwindow_v2', 'elementdata_xxxxx_xxx_xxxxx', 'elementdata_xxxxx_xxx_xxxx', 'analytics_parameterinfo_v1', 'dveelementinfo', 'spectrum_max_id', 'view_build_status', 'analytics_changepoints_v2', 'datapoints', 'ai_cpalarms', 'elementdata_xxxxx_xxx_xxxx', 'elementlatch', 'elementdata_yyyyy_yyy_yyyy', 'objectreftreeelement', '', 'elementdata_xxxxx_xxx_xxxx', 'analytics_wavestream', 'cmigrationstatus', 'correlationmatchinfo_v2', 'elementdata_xxxxx_xxx_xxxxx' were not repaired within the tombstone removal period. Please increase the gc_grace_seconds or the frequency of the repairs. Repair checks for specific tables can be disabled in the Tables table.
The suggested action is to increase the gc_grace_seconds or the frequency of the repairs.
We are running a repair schedule with a 7-day interval for the keyspaces in CassandraReaper. May I know the recommended repair frequency for these tables?
The tables elementdata_xxxxx_xxx_xxxxx and elementdata_yyyyy_yyy_yyyyy have a gc_grace_seconds of 864000 (10 days), which is higher than the repair interval of 7 days. Since these tables are still listed in the suggested action, may I know whether they require any action?
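For context, this is the comparison I am making, as a small sketch. The table names and gc_grace_seconds values below are illustrative; on a real cluster the actual values would come from system_schema.tables.

```python
# Sketch: flag tables whose gc_grace_seconds is not longer than the
# repair interval. Values here are illustrative, not from our cluster.
REPAIR_INTERVAL_DAYS = 7  # our Reaper schedule

# Hypothetical gc_grace_seconds per table (864000 s = 10 days is the
# Cassandra default).
tables = {
    "elementdata_xxxxx_xxx_xxxxx": 864000,  # 10 days
    "analytics_wavestream": 432000,         # 5 days
}

def needs_action(gc_grace_seconds, repair_interval_days=REPAIR_INTERVAL_DAYS):
    """A table is at risk when a full repair cycle cannot complete
    before its tombstones become eligible for removal."""
    return gc_grace_seconds <= repair_interval_days * 86400

for name, gc in tables.items():
    print(name, "at risk" if needs_action(gc) else "ok")
```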
Hi Dennis,
Nothing major goes wrong if you don't attend to this immediately. The worst case is that you see data reappear that should already have been deleted, so-called 'zombie' data. This can occur when your RF is higher than one (multiple nodes hold a copy) and you run a delete that is not received by all replicas, either directly or through hint files. This is why gc_grace_seconds exists (the time before tombstones may be removed): as long as the tombstone survives, a repair can tell the other nodes that a delete was received after the insert. A high gc_grace_seconds keeps tombstones around longer, which gives you more grace time to run repairs and avoid this situation, but it can hurt performance and disk usage because you may accumulate lots of tombstones.
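To make the zombie-data scenario concrete, here is a toy model of two replicas (RF=2). It is not real Cassandra code and the timestamps are made up; it only shows what happens when a delete reaches one replica and its tombstone is purged before a repair runs:

```python
# Toy model of delete resurrection ("zombie data") with RF=2.
# Each replica stores key -> (value, timestamp); TOMBSTONE marks a delete.
TOMBSTONE = object()

replica_a = {"k": ("v1", 1)}
replica_b = {"k": ("v1", 1)}

# Delete at timestamp 2 reaches only replica A (B was down, no hints).
replica_a["k"] = (TOMBSTONE, 2)

# gc_grace_seconds elapses with no repair: A purges the tombstone,
# losing all memory that the delete ever happened.
del replica_a["k"]

def reconcile(*replicas):
    """Reconcile replicas the way a repair/read would: latest timestamp wins."""
    cells = [r["k"] for r in replicas if "k" in r]
    return max(cells, key=lambda c: c[1], default=None)

value, ts = reconcile(replica_a, replica_b)
print(value)  # the deleted value "v1" comes back: zombie data
```

Had the repair run within gc_grace_seconds, replica A would still have held the timestamp-2 tombstone, which wins over B's timestamp-1 insert and propagates the delete instead.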
If your gc_grace_seconds is set to 10 days and your repair interval is 7 days, there should be no problem. Have a look at your repair history to understand when the repairs were triggered on your nodes and whether they were successful.
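As a sketch of that check: for each node, compare the time since its last successful repair against gc_grace_seconds. The node names and timestamps below are hypothetical; in practice you would read them from Reaper or your repair history.

```python
# Sketch: is any node's last successful repair older than gc_grace_seconds?
# All timestamps here are made-up examples.
from datetime import datetime, timedelta

GC_GRACE = timedelta(seconds=864000)  # 10 days

last_repair = {
    "node1": datetime(2024, 4, 25),  # 15 days ago -> overdue
    "node2": datetime(2024, 5, 6),   # 4 days ago -> fine
}
now = datetime(2024, 5, 10)

for node, t in last_repair.items():
    age = now - t
    print(node, "overdue" if age > GC_GRACE else f"ok, {age.days}d since repair")
```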