We have been doing some digging through the documentation trying to find recommended latency values for:
2- Cassandra Node - Cassandra Node
3- Cassandra Seeds/Nodes - DMA/DMS
The only recommendation we have found so far is the one for the Indexing Engine (50 ms). Could someone here indicate if Skyline has any KPIs that could be used as a guide when installing a DataMiner System? This becomes particularly important once we start supporting centralized Cassandra clusters, since a cluster could be composed of DMAs and nodes hosted in different data centers.
Thank you in advance,
In Cassandra, nodes communicate with each other in two important ways:
- Internode communications (gossip)
Gossip lets the nodes in a Cassandra cluster exchange information with each other in a scalable way.
For gossip to work well, it is important that every node has the same seed list in its configuration (cassandra.yaml file).
Making every node a seed node is NOT recommended (the recommendation is three seed nodes per data center). This is hard to achieve with the current Cassandra architecture.
A node can be marked as down, which prevents a coordinator node from sending every query request to it.
The threshold to mark a node as down is the phi_convict_threshold (located in the cassandra.yaml file).
The threshold takes into account network performance, workload, and historical conditions.
In simple terms: a node is marked as down if it does not respond within a certain time. That time is roughly the average time multiplied by phi_convict_threshold.
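To make the simplified rule concrete, here is a minimal Python sketch. It only mirrors the "average time * phi_convict_threshold" approximation given above; it is not Cassandra's actual failure detector, which works with a probabilistic phi value. The function names and the sample intervals are hypothetical, and taking "average time" to mean the average interval between a node's past responses is an assumption.

```python
from statistics import mean


def allowed_silence_ms(past_intervals_ms, phi_convict_threshold=8.0):
    """Simplified rule from the text: average time * phi_convict_threshold.

    past_intervals_ms is assumed to hold the observed times (in ms) between
    earlier responses from the node. 8 is the default value of
    phi_convict_threshold in cassandra.yaml.
    """
    return mean(past_intervals_ms) * phi_convict_threshold


def is_marked_down(past_intervals_ms, silence_ms, phi_convict_threshold=8.0):
    """Return True if the node has been silent longer than the rule allows."""
    return silence_ms > allowed_silence_ms(past_intervals_ms, phi_convict_threshold)


if __name__ == "__main__":
    intervals = [1000, 1000, 1000]  # hypothetical sample: one response per second
    for silence in (5000, 9000):
        print(silence, "ms silent -> marked down:", is_marked_down(intervals, silence))
    # A higher threshold tolerates longer silences, e.g. on a slower link.
    print("threshold 12 ->", is_marked_down(intervals, 9000, phi_convict_threshold=12.0))
```

Under this approximation, higher latency between data centers raises the average itself, so the tolerated silence grows with it. A higher threshold then gives additional headroom for links where the latency fluctuates.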
- Requests (e.g. Read Requests)
A coordinator waits for another node to handle a request for at most a configured timeout in milliseconds (set in the cassandra.yaml file, e.g. read_request_timeout_in_ms). This timeout determines whether a query succeeds or fails.
If you want to make sure you are working with the latest stored data, you need to use the Quorum consistency level on your queries. This means that more than 50% of the nodes that hold your data (based on the replication factor) need to answer. For a Failover (FO) setup, we currently build a Cassandra cluster of two nodes with a replication factor of two, so if we used Quorum, both nodes would need to be online (and not marked as down) all the time. That is why we use consistency level ONE at the moment.
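The quorum arithmetic behind this can be sketched in a few lines of Python. The formula (replication factor divided by two, rounded down, plus one) is the standard definition of "more than 50% of the replicas"; the function names are hypothetical and used only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer at Quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_down_replicas(replication_factor: int, required_replicas: int) -> int:
    """How many replicas may be down while the request can still succeed."""
    return replication_factor - required_replicas


if __name__ == "__main__":
    for rf in (2, 3, 5):
        q = quorum(rf)
        print(f"RF={rf}: quorum={q}, tolerates {tolerated_down_replicas(rf, q)} down")
    # Consistency level ONE needs a single replica, so with RF=2 one node may be down.
    print("RF=2 at ONE tolerates", tolerated_down_replicas(2, 1), "down")
```

With a replication factor of two, the quorum is two, so no replica may be missing. Consistency level ONE still works when one of the two nodes is unavailable.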
So in general, you can have any latency between your Cassandra nodes. Whether that latency causes problems depends on your configuration.