With Cassandra and OpenSearch clusters split across two sites, is there a way of having the DMA's talk to the nodes on the same site as them for the main connection, and the other site as a secondary connection?
Is it just the order in the xml tag, or do they cycle through them etc?
Hi Philip,
In High-level and configuration wise what I have seen so far:
Cassandra, each site will be an datacenter and all nodes should share same "cassandra.yaml" , in each agent db.xml you can then configure "local" Cassandra IP Addresses.
OpenSearch, it´s possible that the online DMA always tries to read first from the 'local' Elastic cluster, by configuring the priorityOrder from the DBConfiguration.xml file, this is DMA specific. The lowest priority order will always be picked first. So, just set your local DMA´s with the lowest priority order.
For Cassandra, you should configure the datacenters in Cassandra as in reality. Spreading your nodes over multiple datacenters, but having them configured as one datacenter in Cassandra is not the best practice. Only once you have the two DC in Cassandra, we can make use of DCAwareRoundRobin to execute queries on the DB and keep the queries to one datacenter. In addition, DCAwareRoundRobin is scheduled to be released in DM 10.3.10 ATM.
Thanks Michiel,
The cluster is being configured with the DC they’re in set on each node.
Is the default round robin policy used currently if DCAwareRoundRobin isn’t available until 10.3.10?
Will the UsedHostsPerRemoteDc be used in the implementation, to allow connection to the remote DC if none of the nodes in the local DC are available?
Hi Philip,
The UsedHostsPerRemoteDC is deprecated in the Cassandra driver and won’t be used, Mathijs has linked the blog post of the Cassandra community as to their reasoning on this.
Currently the DCAwareRoundRobinPolicy is used, but it’s not possible to specify the DB nodes to contact first in the DB.xml as this information is synced in the cluster. Tentatively, this will be possible with the 10.3.11 release.
As for OpenSearch, by default each node has the coordinating role assigned to it, allowing it to handle client requests. For a single cluster the DataMiner agent will connect to the address listed in its DB.xml configuration. This address is used as an initial contact point but from that point onwards the queries can be sent to any node with the coordinating role. Currently it’s not supported to limit this traffic to a specific node with a setting in DataMiner.
Hi Sergio,
They’re single Cassandra and OpenSearch clusters across two datacentres that we’ll want to use swarming on.
For resilience on the clusters we’d want the local datacentre nodes for each to be used as default, then the other datacentre if the local nodes go down.
The PriorityOrder on the ElasticSearch page is for multiple clusters, rather than a single cluster spanning two DCs, with the general pages for both not suggesting any option on preferred nodes in either cluster.