Question



Ive Herreman [SLC] [DevOps Enabler], 23rd February 2021

Hi,

I have a DataMiner environment that uses a Cassandra cluster with replication factor 2.
What is the best way to replace a dead node (e.g. in case of a server crash)?

Marieke Goethals [SLC] [DevOps Catalyst] selected an answer as best on 19th July 2023.

1 Answer

Answer, 24th February 2021

Hey Ive,

Replacing a dead node is a three-part process. Note that the following procedure is for the standard two-node Failover setup. Custom clusters might need a different process, depending on their configuration.

Part 1: Prepare the new node
- Install Cassandra:
  1. Place the Cassandra files under C:\Program Files\Cassandra. Currently the best way is to copy them from the other server, but do NOT copy the data folder (the directory configured under data_file_directories).
  2. Open C:\Program Files\Cassandra\conf\cassandra.yaml.
  3. Set the following parameters:
     - cluster_name: the name of the DataMiner cluster (this must be the same as on the other node!)
     - data_file_directories: the location where the data is saved
     - seeds: the IP of this node and of the other node
     - disk_optimization_strategy: ssd or spinning
     - listen_address: the IP of this node
     - broadcast_rpc_address: the IP of this node
  4. Save and close the file.
  5. Open an elevated PowerShell prompt.
  6. Navigate to C:\Program Files\Cassandra\bin.
  7. Execute $env:CASSANDRA_HOME = 'C:\progra~1\Cassandra\'
  8. Execute $env:JAVA_HOME = 'C:\progra~1\Cassandra\Java\'
  9. Execute .\cassandra.ps1 -install
  10. Open ports 9042 and 7000 in the firewall.
- In the registry, under "Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Apache Software Foundation\Procrun 2.0\cassandra\Parameters\Java", add "-Dcassandra.replace_address_first_boot=<dead_node_ip>" to the Options entry.
- In cassandra.yaml, add or set "auto_bootstrap: true" (note the space after the colon).
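If you prepare several nodes, the cassandra.yaml edits of Part 1 can also be scripted. The sketch below is a minimal illustration, not a DataMiner or Cassandra tool: set_top_level_setting and replace_jvm_option are hypothetical helper names. It assumes each setting is a plain top-level "key: value" line, so it suits cluster_name, listen_address, broadcast_rpc_address, disk_optimization_strategy and auto_bootstrap, but not list or nested settings such as data_file_directories or seeds.

```python
import re


def set_top_level_setting(yaml_text: str, key: str, value: str) -> str:
    """Set a top-level 'key: value' line in the text of cassandra.yaml.

    Assumption: the setting is a simple scalar written at column 0.
    Commented-out lines such as '# auto_bootstrap: false' are not touched.
    If the key is not present, it is appended at the end of the file.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    new_line = f"{key}: {value}"  # YAML needs a space after the colon
    if pattern.search(yaml_text):
        # A function replacement avoids backslash escapes in Windows paths.
        return pattern.sub(lambda _match: new_line, yaml_text, count=1)
    if yaml_text and not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return yaml_text + new_line + "\n"


def replace_jvm_option(dead_node_ip: str) -> str:
    """Build the JVM option to add to the Procrun Options entry in the registry."""
    return f"-Dcassandra.replace_address_first_boot={dead_node_ip}"


if __name__ == "__main__":
    config = "cluster_name: 'Test Cluster'\nlisten_address: localhost\n"
    config = set_top_level_setting(config, "listen_address", "10.0.0.12")
    config = set_top_level_setting(config, "auto_bootstrap", "true")
    print(config, end="")
    print(replace_jvm_option("10.0.0.11"))
```

The IP addresses are placeholders. Values are written as given, so quote string values such as cluster_name yourself when calling the helper.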
Part 2: Update the old node
- In cassandra.yaml, under "- seeds:", replace the IP of the dead node with the IP of the new one.
- Restart the Cassandra service on this node.
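The seeds change in Part 2 is a simple string substitution. The sketch below is a hypothetical helper (replace_seed_ip is an illustrative name) that assumes the seeds value is a double-quoted, comma-separated list, as in the stock SimpleSeedProvider section of cassandra.yaml. It rewrites the list without spaces after the commas and leaves the file unchanged if the dead IP is not a seed.

```python
import re

# Matches the quoted seeds entry of SimpleSeedProvider, e.g.
#           - seeds: "10.0.0.10,10.0.0.11"
SEEDS_LINE = re.compile(r'^([ \t]*-[ \t]*seeds:[ \t]*")([^"]*)(")', re.MULTILINE)


def replace_seed_ip(yaml_text: str, dead_ip: str, new_ip: str) -> str:
    """Replace dead_ip with new_ip in the seeds entry of cassandra.yaml."""

    def swap(match: re.Match) -> str:
        ips = [ip.strip() for ip in match.group(2).split(",")]
        ips = [new_ip if ip == dead_ip else ip for ip in ips]
        return match.group(1) + ",".join(ips) + match.group(3)

    updated, count = SEEDS_LINE.subn(swap, yaml_text)
    if count == 0:
        raise ValueError("no quoted '- seeds:' entry found in cassandra.yaml")
    return updated


if __name__ == "__main__":
    config = (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        '          - seeds: "10.0.0.10,10.0.0.11"\n'
    )
    print(replace_seed_ip(config, dead_ip="10.0.0.11", new_ip="10.0.0.12"))
```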
Part 3: Start the new node
- Start the Cassandra service on the new node. It should automatically start bootstrapping the data; you can follow the progress with nodetool netstats.
- If the node was down for longer than max_hint_window_in_ms, you must also run a nodetool repair.
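The repair rule in Part 3 comes down to comparing the downtime with the hint window. The sketch below uses a hypothetical repair_needed function; it assumes the stock default of max_hint_window_in_ms (10800000 ms, i.e. 3 hours), so pass the value from your own cassandra.yaml if it was changed.

```python
from datetime import datetime, timedelta

# Stock value of max_hint_window_in_ms in cassandra.yaml (3 hours).
# Assumption: override this if your cassandra.yaml uses another value.
DEFAULT_MAX_HINT_WINDOW_MS = 10_800_000


def repair_needed(down_since: datetime, back_up_at: datetime,
                  max_hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> bool:
    """Return True if the node was down longer than the hint window."""
    downtime = back_up_at - down_since
    return downtime > timedelta(milliseconds=max_hint_window_ms)


if __name__ == "__main__":
    crash = datetime(2021, 2, 23, 8, 0)
    print(repair_needed(crash, crash + timedelta(hours=4)))
```

Hints are only stored for the duration of the window, which is why a longer outage leaves gaps that only a repair can fill.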

Hope this helps. If there are any questions, feel free to leave a comment.

Thanks Brent,

What are the differences when I want to apply this to a Cassandra cluster with three or more nodes (not linked to Failover)?

That mainly depends on whether the dead node was a seed node or not.
If the dead node was a seed node, you will have to execute Part 2 on all the other nodes.

If the dead node was a regular data node (not a seed node), you can skip Part 2.
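For larger clusters, the seed/non-seed rule can be expressed as a small decision function. This is a sketch under the assumption that all nodes share the same seeds list; nodes_to_update is a hypothetical name and the IPs are placeholders.

```python
def nodes_to_update(seeds: list[str], all_nodes: list[str], dead_ip: str) -> list[str]:
    """Return the surviving nodes whose seeds list must be edited (Part 2).

    If the dead node is a seed, every remaining node needs the dead IP
    replaced in its seeds list; otherwise no seeds list has to change.
    """
    if dead_ip not in seeds:
        return []
    return [node for node in all_nodes if node != dead_ip]


if __name__ == "__main__":
    cluster = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    print(nodes_to_update(["10.0.0.1", "10.0.0.2"], cluster, dead_ip="10.0.0.2"))
```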

It is also good practice to check whether any other cassandra.yaml settings were changed besides the ones mentioned above.
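One way to do that check is to compare the top-level settings of a surviving node's cassandra.yaml with those of the new node. The sketch below is a rough, stdlib-only comparison (top_level_settings and changed_settings are hypothetical names): it only looks at uncommented "key: value" lines at column 0, ignores nested sections such as seed_provider, and treats trailing comments as part of the value. Node-specific keys such as listen_address can be excluded via the ignore parameter.

```python
import re

# An uncommented top-level setting, e.g. "concurrent_reads: 32".
SETTING = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):[ \t]*(.*?)[ \t]*$")


def top_level_settings(yaml_text: str) -> dict[str, str]:
    """Collect uncommented top-level 'key: value' lines (column 0 only)."""
    settings = {}
    for line in yaml_text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def changed_settings(reference_yaml: str, other_yaml: str,
                     ignore: frozenset[str] = frozenset()) -> dict[str, tuple]:
    """Map each differing key to (reference value, other value); None if absent."""
    ref = top_level_settings(reference_yaml)
    other = top_level_settings(other_yaml)
    diffs = {}
    for key in sorted(ref.keys() | other.keys()):
        if key in ignore:
            continue
        if ref.get(key) != other.get(key):
            diffs[key] = (ref.get(key), other.get(key))
    return diffs


if __name__ == "__main__":
    old_node = "cluster_name: 'DMS'\nlisten_address: 10.0.0.10\nconcurrent_reads: 64\n"
    new_node = "cluster_name: 'DMS'\nlisten_address: 10.0.0.12\nconcurrent_reads: 32\n"
    print(changed_settings(old_node, new_node, frozenset({"listen_address"})))
```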

Replace a dead node in a Cassandra cluster
