I have a cluster with N DataMiner Agents and M Elastic nodes.
Is it possible to know on which Elastic nodes the data for a given DMA is stored?
Hi Srikanth,
thanks a lot for your answer.
I have two use cases:
1) When several nodes fail, it is useful to know which data you lose: is it only a replica that will be recovered automatically, or not?
2) To test failure/recovery of the cluster, you need to know which nodes you should shut down for your tests.
Hi Bernard,
To look at the replication health of the Elastic cluster, I use two things in combination: the Elastic ElasticSearch Cluster Monitor driver and Elasticvue.
However, the idea behind the replication factor, both for Cassandra and Elastic, is that you can have as many nodes down as your replication factor, no matter which ones.
The intention is not really to search for which data is where. You can do it, but it is a very time-consuming process.
So your test should be based on the replication factor: choose "random" servers to take down, up to the number given by your replication factor.
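As a rough illustration of that test approach, here is a minimal Python sketch that picks random nodes to stop, never more than the replication factor. The function name pick_nodes_to_stop and the node names are hypothetical, and the limit simply follows the rule described above. It is not part of DataMiner or Elastic.

```python
import random


def pick_nodes_to_stop(nodes, replication_factor, count=None, seed=None):
    """Pick random nodes to shut down for a failure test.

    Assumption (taken from the rule above): the cluster tolerates up to
    `replication_factor` nodes being down, no matter which ones.
    """
    if count is None:
        count = replication_factor
    if count > replication_factor:
        raise ValueError("count exceeds the replication factor")
    if count > len(nodes):
        raise ValueError("count exceeds the number of nodes")
    rng = random.Random(seed)  # pass a seed to make a test run repeatable
    return rng.sample(list(nodes), count)


# Hypothetical node names, for illustration only:
# pick_nodes_to_stop(["es1", "es2", "es3", "es4"], replication_factor=2)
```

Passing a seed lets you repeat the same selection when you want to rerun a test that failed.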
If you want to test what happens when more nodes go down, you can do that as well, but then another mechanism comes into play. We will buffer up to 60 GB of data for Elastic in "offload files" and will push that data once the cluster becomes available again, which is explained here:
https://docs.dataminer.services/user-guide/Advanced_Functionality/Databases/Elasticsearch_database/Configuring_multiple_Elasticsearch_clusters.htm
Bernard, I think it's possible to find out which shards contain the data. The algorithm behind it is quite complex, but there is an Elasticsearch API that helps you identify the shards that belong to given indices.
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html
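To give an idea of how the cat shards API could be used for this, here is a minimal Python sketch. It assumes an Elasticsearch node reachable over plain HTTP without authentication (adapt as needed). The helper names fetch_shards, shards_by_node and shards_without_copy are hypothetical. Which indices hold the data of a given DMA is not covered here, so the index pattern is left as a parameter.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_shards(base_url, index_pattern="*"):
    """Call GET /_cat/shards/<pattern> and return the rows as a list of dicts."""
    url = (f"{base_url}/_cat/shards/{index_pattern}"
           "?format=json&h=index,shard,prirep,state,node")
    with urlopen(url) as resp:
        return json.load(resp)


def shards_by_node(rows):
    """Group shard copies per node: node -> [(index, shard, 'p' or 'r'), ...]."""
    placement = defaultdict(list)
    for row in rows:
        node = row.get("node") or "UNASSIGNED"
        placement[node].append((row["index"], row["shard"], row["prirep"]))
    return dict(placement)


def shards_without_copy(rows, down_nodes):
    """Return the (index, shard) pairs with no started copy left
    if the nodes in `down_nodes` are stopped."""
    down = set(down_nodes)
    all_shards, surviving = set(), set()
    for row in rows:
        key = (row["index"], row["shard"])
        all_shards.add(key)
        if row.get("state") == "STARTED" and row.get("node") not in down:
            surviving.add(key)
    return sorted(all_shards - surviving)


# Example usage (URL and node names are assumptions):
# rows = fetch_shards("http://localhost:9200")
# print(shards_by_node(rows))
# print(shards_without_copy(rows, ["es1", "es2"]))
```

If shards_without_copy returns an empty list for the nodes you plan to stop, every shard still has at least one started copy elsewhere, so only replicas would need to be rebuilt.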
But the key question is why? Why do you want to know on which nodes the data resides?