Hi all,
We've recently had some issues with the Cassandra\data\hints folder filling up with files and depleting the disk. For now, we've been emptying that folder and restarting the Cassandra service, but we were wondering if there is a way to prevent Cassandra from offloading this much data.
Thanks in advance.
Hi Tiago, Cassandra generates hint files when the connection between agents in a cluster is lost or when a write request times out. These hint files are then used to quickly bring the nodes back up to speed with the requests they missed, so a full repair is not needed. By default, a node keeps 3 hours' worth of hint files, so depending on the amount of data going through a node, those 3 hours can generate a lot of files. To reduce the number of hint files, you can do the following (in order of preference):
- Check why Cassandra keeps losing the connection and fix that issue.
- Reduce “max_hint_window_in_ms” in cassandra.yaml to a shorter value.
- Disable hinted handoff completely by setting “hinted_handoff_enabled” to false. Do note that this could have an impact on data correctness until the repair cycle starts.
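As a rough illustration of the second and third options, here is a small Python sketch that converts a hint window in hours to the millisecond value expected by max_hint_window_in_ms and prints the matching cassandra.yaml lines. The helper names (hint_window_ms, yaml_lines) are made up for this example, and it assumes your Cassandra version still uses the two setting names mentioned above.

```python
# Sketch only: helper names are hypothetical, not part of Cassandra.

DEFAULT_WINDOW_MS = 3 * 60 * 60 * 1000  # the 3-hour default mentioned above


def hint_window_ms(hours: float) -> int:
    """Return the hint window in milliseconds for a given number of hours."""
    if hours < 0:
        raise ValueError("hint window cannot be negative")
    return int(hours * 60 * 60 * 1000)


def yaml_lines(hours: float, enabled: bool = True) -> str:
    """Render the two hint-related settings as cassandra.yaml lines."""
    return (
        f"hinted_handoff_enabled: {str(enabled).lower()}\n"
        f"max_hint_window_in_ms: {hint_window_ms(hours)}"
    )


if __name__ == "__main__":
    # A one-hour window instead of the default three hours.
    print(yaml_lines(1))
```

The printed lines would still have to be put into cassandra.yaml by hand, and a change to that file normally only takes effect after the Cassandra service is restarted.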
On a side note, if the folder is full, you can run “nodetool truncatehints” to try to clear the folder without needing a restart. More information on the inner workings of the hint files can be found here
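If you want to automate that cleanup, something along these lines could work. It is only a sketch: the function names, the folder path and the size limit are assumptions for illustration, and it assumes nodetool can be found on the PATH of the account running the script.

```python
import os
import subprocess


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A hint file may be removed while we are walking the folder.
                pass
    return total


def truncate_if_over(hints_dir: str, limit_bytes: int, run=subprocess.run) -> bool:
    """Run 'nodetool truncatehints' when the hints folder exceeds limit_bytes.

    Returns True if the command was run, False otherwise.
    """
    if dir_size_bytes(hints_dir) <= limit_bytes:
        return False
    run(["nodetool", "truncatehints"], check=True)
    return True


if __name__ == "__main__":
    # Hypothetical path and limit; adjust both to your own installation.
    truncate_if_over(r"C:\Cassandra\data\hints", 5 * 1024**3)
```

Keep in mind that truncating hints throws away the writes they were holding, so the affected nodes will only catch up on those through a repair.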
For completeness, for those wondering what the system.hints table is for:
The system.hints table is a remnant of the pre-3.0 hint mechanism and is only there for backwards compatibility. It is not used in version 3.0 and later. The developers replaced it with a file-based system to avoid a Cassandra antipattern (more info: https://www.datastax.com/blog/whats-coming-cassandra-30-improved-hint-storage-and-delivery). There is no relation between the system.hints table and the hint files on the system.