Hi,
I have a DataMiner environment that uses a Cassandra cluster with replication factor 2.
What is the best way to replace a dead node (e.g. in case of a server crash)?
Hey Ive,
Replacing a dead node is a 3-part process. Note that the following procedure is for the standard 2-node failover setup. Custom clusters might need a different process depending on their configuration.
- Part 1: prepare the new node
- Install Cassandra
- Place the Cassandra files under C:\Program files\cassandra (currently the best way is to copy them from the other server; DO NOT copy the "c:\program files\data" folder)
- Go to c:\program files\cassandra\conf\cassandra.yaml
- Set the following parameters
- cluster_name: name of the DataMiner cluster (must be the same as on the other node!)
- data_file_directories: location to save the data
- seeds: IP of this node and IP of the other node
- disk_optimization_strategy: ssd or spinning
- listen_address: IP of this node
- broadcast_rpc_address: IP of this node
- Save and close the file
- Open an elevated PowerShell prompt
- Navigate to C:\program files\cassandra\bin
- Execute $env:CASSANDRA_HOME = 'C:\progra~1\Cassandra\'
- Execute $env:JAVA_HOME = 'C:\progra~1\Cassandra\Java\'
- Execute .\cassandra.ps1 -install
- Open ports 9042 and 7000 on the firewall
- In the registry under "Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Apache Software Foundation\Procrun 2.0\cassandra\Parameters\Java", add "-Dcassandra.replace_address_first_boot=<dead_node_ip>" to the Options key
- In cassandra.yaml, add or set "auto_bootstrap: true"
- Part 2: update the old node
- In cassandra.yaml, under "- seeds:", replace the IP of the dead node with the IP of the new one.
- Restart the node
- Part 3: Start the new node
- The new node should automatically start bootstrapping the data: this can be followed with nodetool netstats
- If the node was down longer than max_hint_window_in_ms, then you must also run a nodetool repair
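To make the cassandra.yaml changes from Part 1 a bit more concrete, here is a rough sketch of how the relevant entries might look on the new node. The cluster name, data path and IP addresses (10.0.0.11 for the surviving node, 10.0.0.12 for the new node) are placeholders I made up, and I'm assuming the stock seed_provider layout. Everything else in the file stays as it is.

```yaml
# Sketch only: all values below are hypothetical placeholders.
cluster_name: 'MyDataMinerCluster'      # must match the surviving node exactly

data_file_directories:
    - D:\Cassandra\data                 # hypothetical location to save the data

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # IP of the other node and IP of this node, as described in Part 1
          - seeds: "10.0.0.11,10.0.0.12"

disk_optimization_strategy: ssd         # or: spinning

listen_address: 10.0.0.12               # IP of this (new) node
broadcast_rpc_address: 10.0.0.12        # IP of this (new) node

auto_bootstrap: true                    # let the new node stream its data on first start
```

Note that the replace_address_first_boot option is not part of this file; it goes into the registry Options key as described in Part 1.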
Hope this helps. If there are any questions, feel free to leave a comment.
Then it will mainly depend on whether the dead node was a seed node or not.
If the dead node was a seed node, then you will have to execute Part 2 on all the other nodes.
If the dead node was a standard data node, then you can skip Part 2.
It is also good practice to check whether any other YAML settings were changed besides the ones mentioned.
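As an illustration of Part 2 in a larger cluster, this is roughly what the seeds entry on one of the surviving nodes could look like before and after the change. The addresses are made-up placeholders (10.0.0.13 as the dead seed node, 10.0.0.12 as its replacement), not values from a real setup.

```yaml
# Sketch only: hypothetical addresses, stock seed_provider layout assumed.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # before: seeds: "10.0.0.11,10.0.0.13"   (10.0.0.13 = dead seed node)
          - seeds: "10.0.0.11,10.0.0.12"         # 10.0.0.12 = replacement node
```

The same edit would then be repeated on every other node that still lists the dead node as a seed, followed by a restart of that node.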
Thanks Brent,
What differences are there when I want to apply this in a Cassandra cluster containing 3+ nodes (not linked to failover)?