Question

Solved · 3.31K views · 19th October 2023 · Cassandra, OpenSearch

7

Philip Argent [DevOps Enabler] 589 · 25th August 2023 · 0 Comments

With Cassandra and OpenSearch clusters split across two sites, is there a way to have the DMAs use the nodes on their own site for the main connection, and the nodes on the other site as a secondary connection?

Is it just the order in the XML tag, or do they cycle through the nodes?

Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 19th October 2023

2 Answers


score 1 · Answer 1 · 2023-09-05T11:54:46+00:00


Matthijs Favorel [SLC] [DevOps Advocate] 2.02K · Posted 5th September 2023 · 4 Comments

Hi Philip,

If I remember correctly, the Cassandra implementation in DM uses the DCAwareRoundRobinPolicy. It will basically detect the closest DC automatically and load balance across all nodes of that DC in a round-robin fashion.
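To illustrate the idea, here is a minimal sketch of DC-aware round robin in plain Python. It is not the driver's or DataMiner's actual code: the class name, the node names, and the explicit `local_dc` argument are assumptions made for the example (the real policy is said above to detect the closest DC by itself).

```python
class LocalDcRoundRobin:
    """Toy DC-aware round robin: only nodes in the local DC are used."""

    def __init__(self, nodes_by_dc, local_dc):
        # nodes_by_dc maps a DC name to its node addresses (hypothetical data).
        self.local_nodes = list(nodes_by_dc.get(local_dc, []))
        self._start = 0

    def query_plan(self):
        """Return the local nodes, starting one position later on each call."""
        n = len(self.local_nodes)
        if n == 0:
            return []
        plan = [self.local_nodes[(self._start + i) % n] for i in range(n)]
        self._start = (self._start + 1) % n
        return plan


# Hypothetical two-site cluster; the agent lives on "site-a".
policy = LocalDcRoundRobin(
    {"site-a": ["a1", "a2", "a3"], "site-b": ["b1", "b2"]},
    local_dc="site-a",
)
first = policy.query_plan()
second = policy.query_plan()
```

Each call rotates the starting node, so consecutive requests are spread over the local nodes, and the nodes of the other site never appear in the plan.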


Philip Argent commented 5th September 2023

Hi Matthijs,

Thanks. Will it automatically use another DC if all the local nodes are unavailable?

Matthijs Favorel [SLC] [DevOps Advocate] commented 5th September 2023

As far as I know, it will not automatically fail over to the other DC. When all Cassandra nodes in a DC are down, you will need to restart DM to have it connect to the other DC. The reasoning behind this is that if all nodes in a DC are down, most likely something is also wrong with the DM node in that DC. Also see this great writeup: https://foundev.medium.com/cassandra-local-quorum-should-stay-local-c174d555cc57
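A rough way to picture this: if the list of candidate nodes only ever contains live nodes from the local DC, it is simply empty once the whole local DC is down. The sketch below is plain Python with made-up node names. The `remote_hosts_per_dc` parameter is hypothetical, loosely modeled on the kind of option some Cassandra drivers offer, and nothing in this thread says DataMiner exposes it.

```python
def build_query_plan(nodes_by_dc, local_dc, is_up, remote_hosts_per_dc=0):
    """Return the candidate nodes: live local nodes first, then at most
    remote_hosts_per_dc live nodes from each other DC (0 = local only)."""
    plan = [n for n in nodes_by_dc.get(local_dc, []) if is_up(n)]
    for dc, nodes in nodes_by_dc.items():
        if dc == local_dc:
            continue
        live_remote = [n for n in nodes if is_up(n)]
        plan.extend(live_remote[:remote_hosts_per_dc])
    return plan


# Hypothetical cluster in which every node on site A is down.
cluster = {"site-a": ["a1", "a2"], "site-b": ["b1", "b2"]}
down = {"a1", "a2"}

def is_up(node):
    return node not in down

local_only = build_query_plan(cluster, "site-a", is_up)
with_remote = build_query_plan(cluster, "site-a", is_up, remote_hosts_per_dc=1)
```

With the local-only setting the plan is empty, so there is nothing to connect to until the connection is rebuilt against the other DC, which matches the restart described above.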

Philip Argent commented 7th September 2023

Between the two main DCs that Cassandra and OpenSearch will be in, there is a big enough WAN link with only 2 ms latency, so most of the issues raised in that article wouldn't apply to us.
We have one outlier, a DMA pair that is not in either of those DCs, which has 10 ms latency.

Simon Declerck [SLC] [DevOps Advocate] commented 13th September 2023

Hi Philip,

In this scenario, it is the driver middleware to Cassandra that does not allow an automatic connection to another DC when all local nodes are unavailable.
The idea is that if all nodes in a DC are unavailable, the DataMiner agents in that DC are likely down as well.

Improvements to how DataMiner connects to Cassandra nodes are on the way and are tentatively planned for release with 10.3.11.
This will allow you to specify, for each DataMiner agent, which DB nodes should be tried first. With this, it will be possible to have a failover pair spanning two data centers in which each agent of the pair only connects to its local DC.
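As a rough illustration of "tried first", the sketch below walks a per-agent list of nodes in order and returns the first reachable one. Everything in it is hypothetical: the agent names, the node names, and the dictionary layout are invented for the example and do not reflect the actual DataMiner configuration format.

```python
def first_reachable(preferred_nodes, is_reachable):
    """Try nodes in the configured order; return the first reachable one, or None."""
    for node in preferred_nodes:
        if is_reachable(node):
            return node
    return None


# Hypothetical per-agent preference lists: each agent lists only its local DC.
PREFERRED = {
    "dma-site-a": ["cass-a1", "cass-a2"],
    "dma-site-b": ["cass-b1", "cass-b2"],
}

unreachable = {"cass-a1"}  # pretend the first node on site A is down

def is_reachable(node):
    return node not in unreachable

node_for_a = first_reachable(PREFERRED["dma-site-a"], is_reachable)
node_for_b = first_reachable(PREFERRED["dma-site-b"], is_reachable)
```

Here the agent on site A skips its unreachable first node and picks the next local one, while the agent on site B uses the first entry of its own list.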

OpenSearch and Cassandra node selection
