Question

Solved · 833 views · 17th March 2023

8

Jorge Sienra [SLC] [DevOps Enabler] 371 · 17th March 2023 · 0 Comments

Are there any potential pitfalls in this configuration? A single cluster contains all DMAs, while each region has its own Cassandra and Elastic clusters. All DMAs are primary agents (no failovers).

Jorge Sienra [SLC] [DevOps Enabler] Selected answer as best 17th March 2023

1 Answer

score 12 · Answer 1 · 2023-03-17T15:29:35+00:00

Jeroen Nietvelt [SLC] [DevOps Advocate] 1.35K · Posted 17th March 2023 · 4 Comments

Hi Jorge, we have been running this exact architecture for the last 2 years without any pitfalls. A small addendum: the Cassandra cluster is actually configured as one cluster containing all 6 nodes, where 3 nodes are grouped into Datacenter X and 3 nodes into Datacenter Y. You can then specify the replication factor at datacenter level, so that both datacenters contain the same overall dataset.
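
As an illustration of setting the replication factor per datacenter, here is a minimal sketch that builds a CQL keyspace definition using Cassandra's NetworkTopologyStrategy. The keyspace name `example_ks`, the datacenter names `DatacenterX` and `DatacenterY`, and the value of 3 replicas per datacenter are assumptions for illustration only, not the configuration actually used in this setup. The datacenter names in the statement have to match the names the cluster itself reports for its nodes.

```python
def keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with one replication factor per datacenter.

    The keyspace name and datacenter names passed in are hypothetical placeholders.
    """
    # One "'<datacenter>': <replication factor>" entry per datacenter.
    dc_parts = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dc_parts}}};"
    )


# Hypothetical names and values: 3 replicas in each of the two datacenters.
statement = keyspace_cql("example_ks", {"DatacenterX": 3, "DatacenterY": 3})
print(statement)
```

Because each datacenter gets its own replica count, every datacenter ends up holding a full copy of the keyspace as long as its count is at least 1.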

You have a thumbs up from me!

Chris Glover [DevOps Advocate] Posted new comment 20th March 2023

Jorge Sienra [SLC] [DevOps Enabler] commented 17th March 2023

Great, thank you Jeroen. If I understand correctly, doing the same with a single Cassandra node in each region would not be an option, because we would then have a two-node Cassandra cluster, and that configuration is not recommended?

Jeroen Nietvelt [SLC] [DevOps Advocate] commented 17th March 2023

Correct, in this kind of scenario we want to offer redundancy at database level, which we do by spinning up 3 nodes and configuring them with a replication factor of at least 2. With a replication factor of 2, every value is always present on 2 of the 3 available nodes, so you can survive the loss of a single node without losing data. You can always spin up extra nodes afterwards if the capacity of the cluster needs to be increased.

This type of redundancy is there to avoid having to take regular backups of your dataset. You basically provision a number of nodes and then tell the cluster how many copies of the data are required. The combination of the number of nodes and the number of copies of the data determines how resilient you are against data loss when nodes fail.

The minimum for a stable production cluster setup that includes data resilience is a 3-node cluster.
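
The relation between the number of nodes, the number of copies, and resilience can be checked with a small model. The sketch below is a simplification under stated assumptions: each node owns exactly one token range, and the copies of a range are placed on consecutive nodes around the ring. Real clusters use virtual nodes and topology-aware placement, so this only illustrates the arithmetic. All function names are hypothetical.

```python
from itertools import combinations


def replica_nodes(token_index: int, node_count: int, rf: int) -> set:
    """Nodes holding a copy of one token range (consecutive placement, simplified)."""
    return {(token_index + i) % node_count for i in range(rf)}


def survives_loss(node_count: int, rf: int, lost_nodes) -> bool:
    """True if every token range still has at least one copy on a live node."""
    lost = set(lost_nodes)
    return all(replica_nodes(t, node_count, rf) - lost for t in range(node_count))


def max_tolerated_losses(node_count: int, rf: int) -> int:
    """Largest k such that losing ANY k nodes never destroys all copies of a range."""
    tolerated = 0
    for k in range(1, node_count + 1):
        if all(survives_loss(node_count, rf, lost)
               for lost in combinations(range(node_count), k)):
            tolerated = k
        else:
            break
    return tolerated


# 3 nodes with 2 copies of the data: any single node may be lost.
print(max_tolerated_losses(3, 2))
```

In this model the number of node losses that can be tolerated without data loss is the replication factor minus one, which matches the statement above that 3 nodes with a replication factor of 2 survive the loss of a single node.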

Jorge Sienra [SLC] [DevOps Enabler] commented 17th March 2023

Thanks!

Chris Glover [DevOps Advocate] commented 20th March 2023

Hi Jeroen, any chance you could share a sample configuration file for Cassandra, please? We are looking at building exactly the scenario described.

Thanks

DMS of geographically distributed DMAs, Cassandra, and Elastic clusters
