Question

Solved · 889 views · 6th July 2023 · elastic cluster

7

Bernard Pichot [SLC] [DevOps Advocate] · 1.12K · 24th March 2023 · 2 Comments

I have a cluster with N DataMiner Agents and M Elastic nodes.

Is it possible to know on which Elastic nodes the data for a given DMA is stored?

Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 6th July 2023

Srikanth Mandava [SLC] [DevOps Advocate] commented 6th April 2023

Bernard, I think it’s possible to find out which shard contains the data. The algorithm that assigns data to shards is very complex, but there is an Elasticsearch API that helps you identify the shards related to each index and the nodes they are on.

https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html
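As a minimal sketch of how the cat shards API could be used for this, the Python below groups shard copies by the node that holds them. It assumes an unauthenticated cluster reachable at `http://localhost:9200` (a placeholder address), and the function names are illustrative only. Which indices belong to a particular DMA depends on the index naming in your setup, which is not covered here.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_shards(base_url="http://localhost:9200"):
    """Return the rows of GET _cat/shards as a list of dicts.

    base_url is a placeholder; adapt it (and add authentication) for your cluster.
    """
    url = (base_url.rstrip("/")
           + "/_cat/shards?format=json&h=index,shard,prirep,state,node")
    with urlopen(url) as response:
        return json.load(response)


def shards_by_node(rows):
    """Map node name -> sorted list of (index, shard number, 'p' or 'r')."""
    result = defaultdict(list)
    for row in rows:
        node = row.get("node")
        if node is None:  # unassigned shard copies are not on any node
            continue
        result[node].append((row["index"], int(row["shard"]), row["prirep"]))
    return {node: sorted(copies) for node, copies in result.items()}
```

Calling `shards_by_node(fetch_shards())` would then list, per node, every primary (`p`) and replica (`r`) shard copy it currently holds.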

But the key question is why. Why do you want to know which nodes hold the data?

Bernard Pichot [SLC] [DevOps Advocate] commented 6th April 2023

Hi Srikanth,
Thanks a lot for your answer.
I have two use cases:
1) When several nodes fail, it can be useful to know which data you lose: is it only a replica that will be recovered automatically, or not?
2) To test failure/recovery of the cluster, you need to know which nodes to shut down for your tests.
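The first use case can be sketched in Python as follows. The function takes rows in the shape returned by `GET _cat/shards?format=json` (fields `index`, `shard`, `state`, `node`) plus a set of node names assumed to be down, and reports which shards would have no surviving copy and which would only lose some copies. The function name and the two result labels are illustrative, not part of any Elasticsearch or DataMiner API.

```python
from collections import defaultdict


def failure_impact(rows, failed_nodes):
    """Classify shards by what survives if failed_nodes go down.

    Returns a dict with two sorted lists of (index, shard number):
      'lost'     - every started copy was on a failed node
      'degraded' - some copies were on failed nodes, but at least one survives
    """
    failed = set(failed_nodes)
    surviving = defaultdict(int)  # (index, shard) -> copies on healthy nodes
    affected = set()              # shards with at least one copy on a failed node
    for row in rows:
        if row.get("state") != "STARTED" or row.get("node") is None:
            continue  # only count copies that are currently usable
        key = (row["index"], int(row["shard"]))
        if row["node"] in failed:
            affected.add(key)
        else:
            surviving[key] += 1
    return {
        "lost": sorted(k for k in affected if surviving[k] == 0),
        "degraded": sorted(k for k in affected if surviving[k] > 0),
    }
```

Shards listed under `degraded` are the ones Elasticsearch can normally rebuild from a remaining copy; shards under `lost` would have no copy left in the cluster until a failed node returns.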

1 Answer

score 0 · Answer 1 · 2023-04-26T07:28:26+00:00

Hi Bernard,

To check the replication health of the Elastic cluster, I use two things: the Elastic ElasticSearch Cluster Monitor driver in combination with Elasticvue.

However, the idea behind a replication factor, for both Cassandra and Elastic, is that you can have X nodes down, where X equals your replication factor, no matter which nodes they are.

It’s not the intention to track down which data is stored where. You can do it, but it is a very time-consuming process.

So your test should be based on the replication factor: choose “random” servers to take down, equal to or fewer than your replication factor.
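A random selection like this could be sketched in Python as below. It assumes the "replication factor" corresponds to the Elasticsearch index setting `number_of_replicas`, which is an interpretation and not something stated in this thread; the function name and node names are hypothetical.

```python
import random


def pick_nodes_to_stop(node_names, number_of_replicas, seed=None):
    """Pick distinct nodes at random, at most number_of_replicas of them.

    Always leaves at least one node running. Pass a seed to make the
    selection reproducible between test runs.
    """
    nodes = sorted(set(node_names))
    count = max(0, min(number_of_replicas, len(nodes) - 1))
    return random.Random(seed).sample(nodes, count)
```

For example, `pick_nodes_to_stop(["es1", "es2", "es3", "es4"], 2)` returns two of the four names, chosen at random.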

If you want to test what happens when more nodes go down, you can do that as well, but then another mechanism comes into play. We buffer up to 60 GB of data for Elastic in “offload files” and push that data once the cluster becomes available again, as explained here:

https://docs.dataminer.services/user-guide/Advanced_Functionality/Databases/Elasticsearch_database/Configuring_multiple_Elasticsearch_clusters.htm