Question

Solved · 1.62K views · 9th October 2023 · Cassandra, Elastic

2

Carolina Costa [SLC] [DevOps Advocate] · 9th October 2023

Hello Dojo,
On a setup with a Cassandra cluster and an Elastic cluster (4 nodes each), a single Cassandra node going down causes DataMiner to lose read and write access to Cassandra (with the currently configured consistency level).

I do not expect any impact on retrieving alarms or trend data when one Cassandra node is down, because that data should be stored in Elastic.

But can anyone please confirm how this ties in with DataMiner raising new alarms and retrieving trend graphs, etc. while the node is down? Will there be any impact?

Thanks in advance for your help!

Carolina Costa [SLC] [DevOps Advocate] Selected answer as best 9th October 2023

1 Answer

score 1 · Answer 1 · 2023-10-09T12:42:24+00:00

Hi Carolina,

Assuming you have the default consistency level “Quorum” applied in the db.xml file, the rule is as follows:
The consistency level QUORUM requires acknowledgment from a majority (more than 50%) of the replica nodes across all datacenters.

Now, if you are running the default replication factor of 2 on the defined keyspaces, only 2 nodes will hold any given piece of data. A majority of 2 replicas is 2, so if one of them goes down, the remaining replica can never form a majority, and new read or write requests for that data towards your Cassandra cluster will fail.

So in this case, you will need to increase the replication factor to a minimum of 3 on all DataMiner keyspaces when combining it with a read/write consistency level of Quorum.
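To make the arithmetic concrete, here is a minimal sketch of the usual majority rule for QUORUM (floor(RF / 2) + 1). The function names are made up for illustration and are not part of DataMiner or any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write (a strict majority)."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated failures={tolerated_replica_failures(rf)}")
# RF=2: quorum=2, tolerated failures=0
# RF=3: quorum=2, tolerated failures=1
```

With RF = 2 the quorum is both replicas, so no replica may be down. With RF = 3 the quorum is still 2, which leaves room for one replica to fail.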

I also don’t recommend using a consistency level that offers lower consistency than Quorum, as I’ve already noticed some undesired behavior during element restarts and DataMiner software restarts.

So increasing the replication factor to 3 on all DataMiner keyspaces is what I would recommend in this case.

Hope this helps!

Hello Jeroen, thanks for your answer!
I noticed that I didn’t specify this in my question, but in our case we have 2 datacenters (2 nodes each) and we are using LocalQuorum (which means it needs acknowledgement from a majority of the replica nodes within the same datacenter). In this case, when one of the nodes goes down, it will cause DataMiner to lose read and write access to the Cassandra cluster.
My question now is whether this will affect the retrieval of new alarms and trend graphs.
It will definitely affect the retrieval of new alarms and trend graphs. Because you need to be georedundant, you typically need to account for 2 main scenarios in which DataMiner is expected to remain fully functional:
1. When a single node within a datacenter goes down.
2. When a complete datacenter is lost.

With those scenarios in mind, you can only achieve this at Cassandra level if you have a minimum of 3 nodes in each datacenter with RF = 3 and operate with consistency level LocalQuorum.

With those settings applied, each node in your 6-node Cassandra cluster (in total) will hold ALL the data.

Only from that point onwards will both requirements (surviving the loss of a complete datacenter and the loss of a node within a datacenter) be met.
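As a rough illustration of the two scenarios, the sketch below checks whether LocalQuorum can still be met in the datacenter that DataMiner talks to. It assumes the replication factor per datacenter equals the node count per datacenter (so every node is a replica), and it assumes the current 2-node datacenters run with RF = 2, which the thread does not state explicitly. The function names and layout labels are hypothetical.

```python
def local_quorum(rf_per_dc: int) -> int:
    """Replicas in the local datacenter that must acknowledge a request."""
    return rf_per_dc // 2 + 1


def local_quorum_ok(rf_per_dc: int, local_replicas_down: int) -> bool:
    """True if enough local replicas remain to satisfy LOCAL_QUORUM."""
    return rf_per_dc - local_replicas_down >= local_quorum(rf_per_dc)


# Hypothetical layouts: label -> replication factor per datacenter
# (assumed equal to the number of nodes per datacenter).
layouts = {"2 nodes, RF=2": 2, "3 nodes, RF=3": 3}

for name, rf in layouts.items():
    # Scenario 1: one node in the local datacenter is down, so one replica is missing.
    node_down = local_quorum_ok(rf, local_replicas_down=1)
    # Scenario 2: the other datacenter is lost entirely. LOCAL_QUORUM only
    # counts local replicas, so no local replica is missing.
    dc_lost = local_quorum_ok(rf, local_replicas_down=0)
    print(f"{name}: node down OK={node_down}, datacenter lost OK={dc_lost}")
# 2 nodes, RF=2: node down OK=False, datacenter lost OK=True
# 3 nodes, RF=3: node down OK=True, datacenter lost OK=True
```

Only the 3-node, RF = 3 layout passes both checks, which matches the recommendation above.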

Cassandra node down impact on new alarms and real time trending data
