Hi Dojo,
I came up with these 2 architectures for a failover pair. Could you please let me know if both are correct, or if either architecture has any flaws? Thank you 🙂
Hi Arunkrishna,
If you want to be fully redundant (on DataMiner and DB level) you will need at least the following architecture (more info on Supported System Data Storage Architectures - DataMiner Dojo):
This is because Elastic needs an odd number of master nodes (1, 3, 5, ...) to avoid split-brain.
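To make the split-brain point a bit more concrete, here is a small sketch of the majority arithmetic behind it. This is only an illustration, not DataMiner or Elastic code, and the function names are made up for the example. The idea is that a group of nodes can only elect a master if it holds a strict majority of the master nodes.

```python
def majority(master_nodes: int) -> int:
    """Smallest number of master nodes that forms a strict majority."""
    return master_nodes // 2 + 1


def tolerated_failures(master_nodes: int) -> int:
    """How many master nodes can be lost while a majority is still reachable."""
    return master_nodes - majority(master_nodes)


# Compare cluster sizes 1 to 5 (hypothetical sizes, for illustration only).
for n in range(1, 6):
    print(f"{n} master node(s): majority={majority(n)}, "
          f"can lose {tolerated_failures(n)}")
```

With 2 master nodes you cannot lose any node, and 4 nodes tolerate no more failures than 3 do, which is why an even count adds cost without adding resilience.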
If you want to be fully HW redundant you will need at least 3 physical servers:
- 1st server: Cassandra, DataMiner and Elastic
- 2nd server: Cassandra, DataMiner and Elastic
- 3rd server: Cassandra and Elastic
This way, if one server goes down, you still have sufficient Cassandra, Elastic and DataMiner nodes available.
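If it helps, the three-server layout above can be checked with a few lines of Python. This is a rough sketch under my own assumptions: the `REQUIRED` minimums (2 of 3 database nodes, i.e. a majority, and at least 1 DataMiner agent) are illustrative thresholds, not official DataMiner requirements, and all names are hypothetical.

```python
from typing import Dict, Set

# The three-server layout described above.
THREE_SERVERS: Dict[str, Set[str]] = {
    "server1": {"Cassandra", "DataMiner", "Elastic"},
    "server2": {"Cassandra", "DataMiner", "Elastic"},
    "server3": {"Cassandra", "Elastic"},
}

# Assumption for this sketch: the databases need a majority (2 of 3)
# and at least one DataMiner agent has to stay up.
REQUIRED = {"Cassandra": 2, "Elastic": 2, "DataMiner": 1}


def survives_loss_of(layout: Dict[str, Set[str]], failed: str) -> bool:
    """True if every service still meets its minimum after `failed` goes down."""
    remaining = [roles for name, roles in layout.items() if name != failed]
    return all(
        sum(service in roles for roles in remaining) >= needed
        for service, needed in REQUIRED.items()
    )


results = {name: survives_loss_of(THREE_SERVERS, name) for name in THREE_SERVERS}
print(results)
```

Under these assumptions, every single-server failure leaves enough nodes of each type running.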
Hi Alberto,
For Cassandra:
The cluster can consist of multiple datacenters, and normally all datacenters will hold all data. Every datacenter has a certain number of nodes over which the data for that datacenter is divided. For every datacenter you can configure a replication factor, which determines how many nodes will hold a copy of the data.
This means that the DMA will connect to a Cassandra node in the cluster to perform its requests and, depending on what data is needed, the request will be forwarded to the nodes that hold the data. If you have a large delay between some nodes, you may need to wait a long time before you get the data (depending on the consistency level of your request and the replication factor).
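As a rough illustration of how the consistency level and the replication factor interact with latency, here is a simplified sketch. It assumes a request completes once the required number of replicas has answered, and it ignores everything else (coordinator hops, retries, datacenter-local levels). The latencies are invented example values, not measurements.

```python
from typing import List


def replicas_required(consistency: str, replication_factor: int) -> int:
    """Number of replicas that must answer for a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency]


def approx_wait_ms(consistency: str, replica_latencies_ms: List[int]) -> int:
    """Simplified model: wait for the k-th fastest replica to answer."""
    k = replicas_required(consistency, len(replica_latencies_ms))
    return sorted(replica_latencies_ms)[k - 1]


# Hypothetical setup: replication factor 3, two nearby replicas and one far away.
latencies = [2, 5, 180]
for level in ("ONE", "QUORUM", "ALL"):
    print(level, approx_wait_ms(level, latencies), "ms")
```

In this model a single distant replica only hurts when the consistency level forces the request to wait for it, which is why both settings matter when nodes are spread over a WAN.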
Elastic:
Multiple datacenters is a paid option with Elastic (also see the ‘Failover setups (with geo-redundancy)’ section at https://community.dataminer.services/supported-system-data-storage-architectures/), but DataMiner can support multiple Elastic clusters. Again, as with Cassandra, large delays between the nodes should be avoided to keep the latency under control for all requests.
If you want to keep cost and complexity as low as possible and you are going to cover multiple regions with high latency between them (e.g. US, Europe, Asia, …), I would advise keeping your Cassandra/Elastic cluster in one location, so that the delays between nodes stay as low as possible and do not cause issues. The DMAs can then be spread over different locations. As soon as you see that the connection to the DB is unacceptable for certain DMAs, you can add a datacenter for Elastic/Cassandra closer to the location of those DMAs.
Hope this answers your question.
That’s a rock-solid design, thanks for sharing!
What about a scenario where a separate Cassandra server (or cluster) isn’t an option (too complex to manage or simply not in the budget) and the environment needs to rely on local DBs?
Also, when it comes to clustering the Cassandra nodes, are there any latency requirements?
I'm thinking of a WAN where DMAs of the same cluster can span different countries.