With Cassandra and OpenSearch clusters split across two sites, is there a way to have the DMAs talk to the nodes on their own site as the main connection, and to the other site as a secondary connection?
Is it just the order in the XML tag, or do they cycle through the nodes?
Hi Philip,
If I remember correctly, the Cassandra implementation in DM uses the DCAwareRoundRobinPolicy. It automatically detects the closest DC and load balances across all nodes of that DC in a round-robin fashion.
As far as I know, it will not automatically fail over to the other DC. When all Cassandra nodes in a DC are down, you will need to restart DM to have it connect to the other DC. The reasoning is that if all nodes in a DC are down, something is most likely also wrong with the DM node in that DC. Also see this great writeup: https://foundev.medium.com/cassandra-local-quorum-should-stay-local-c174d555cc57
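To make the behaviour described above concrete, here is a minimal Python sketch of a DC-aware round robin that never falls back to a remote DC. It is a toy model only, not DataMiner's or the Cassandra driver's actual code; the `Node` and `LocalDcRoundRobin` names, the addresses, and the `site-a`/`site-b` DC names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    address: str  # hypothetical node address
    dc: str       # hypothetical datacenter name
    up: bool = True


class LocalDcRoundRobin:
    """Toy model: rotate over live nodes in the local DC only."""

    def __init__(self, nodes: List[Node], local_dc: str) -> None:
        # Nodes in other DCs are ignored entirely in this sketch.
        self.local_nodes = [n for n in nodes if n.dc == local_dc]
        self._next = 0

    def pick(self) -> Optional[Node]:
        # Try each local node at most once, starting at the rotation index.
        for _ in range(len(self.local_nodes)):
            node = self.local_nodes[self._next % len(self.local_nodes)]
            self._next += 1
            if node.up:
                return node
        # No live local node: remote DCs are deliberately not tried.
        return None


nodes = [
    Node("10.0.1.1", "site-a"),
    Node("10.0.1.2", "site-a"),
    Node("10.0.2.1", "site-b"),
]
policy = LocalDcRoundRobin(nodes, local_dc="site-a")
```

Calling `policy.pick()` repeatedly alternates between the two `site-a` nodes. If both are marked down, it returns `None` even though the `site-b` node is still up, which mirrors the "no automatic failover to the other DC" behaviour described in this answer.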
Between our two main DCs, where Cassandra and OpenSearch will be hosted, there is a large enough WAN link with only 2 ms latency, so most of the issues raised in that article wouldn't apply to us.
We have one outlier, a DMA pair that's not in either of those DCs, which has 10 ms latency.
Hi Philip,
In this scenario, it's the driver middleware to Cassandra that does not allow an automatic connection to another DC when all local nodes are unavailable.
The idea being that if all nodes in a DC are unavailable then the DataMiner agents in said DC are likely down as well.
There are upcoming improvements to how DataMiner connects to Cassandra nodes, tentatively planned for release with 10.3.11.
This will allow you to specify for each DataMiner agent which DB nodes should be tried first. With this it will be possible to have a failover pair spanning 2 datacenters in which each agent of the pair only connects to its local DC.
Hi Matthijs,
Thanks. Will it automatically use another DC when all the local nodes are unavailable?