During testing I noticed several occasions where Elastic goes into offline mode, although it still seems to be online.
The Elastic logs also seem to be fine, but I noticed the Cassandra Health log reports the following issue: "Failed to retrieve shard information".
Are there any tweaks I can make to avoid this issue?
Marieke Goethals [SLC] [DevOps Catalyst] Selected answer as best 18th July 2023
Hello Peter,
I'm unsure what is actually happening. Could you clarify the following:
- Where do you see Elastic is going offline?
- Where do you see it still appears online?
- What are your current heap size settings in regedit, i.e. JvmMs and JvmMx under Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Apache Software Foundation\Procrun 2.0\elasticsearch-service-x64\Parameters\Java?
- Are there any operational issues?
- Does SLDBConnection/SLSearch mention any issues around the time of the event?
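As a rough sanity check on the two heap values, here is a minimal sketch. It assumes both values are in MB (which is how Procrun stores JvmMs and JvmMx) and applies Elastic's general guidance: equal minimum and maximum heap, at most about half of the physical RAM, and a heap below roughly 32 GB. The function name and the RAM figure you pass in are hypothetical, so adapt them to your server.

```python
def check_heap_settings(jvm_ms_mb: int, jvm_mx_mb: int, total_ram_mb: int) -> list:
    """Return a list of warnings for Elasticsearch heap settings (all values in MB).

    Hypothetical helper: the thresholds follow Elastic's general heap guidance,
    not anything specific to this setup.
    """
    warnings = []
    # Elastic recommends setting the minimum and maximum heap to the same value.
    if jvm_ms_mb != jvm_mx_mb:
        warnings.append("JvmMs and JvmMx differ")
    # The heap should leave at least half of the RAM to the OS file system cache.
    if jvm_mx_mb > total_ram_mb // 2:
        warnings.append("JvmMx exceeds 50% of physical RAM")
    # At 32 GB and above, the JVM can no longer use compressed object pointers.
    if jvm_mx_mb >= 32 * 1024:
        warnings.append("JvmMx is at or above 32 GB")
    return warnings
```

For example, `check_heap_settings(8192, 8192, 32768)` returns an empty list, while a server with only 8 GB of RAM and the same 8192 MB heap would get the 50% warning.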
Kind regards,
Hi Thomas
– In the Alarm console I notice an “All nodes in the Indexing cluster are down.” alarm, which causes a second alarm mentioning some storages going into file offload mode. The first alarm clears after a few minutes, but the one about the offload remains.
– Performing a basic API call works fine and the cluster health seems OK.
– Both are set to 8192.
– Some RTEs appear, probably because calls are stuck.
– SLSearch mentions a rollover and pause for a certain index. Shortly after the connection recovered, a rollover started for the same index.
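For reference, the basic API check from the second point looks roughly like the sketch below. It calls Elasticsearch's `_cluster/health` endpoint and summarizes the response. The base URL `http://localhost:9200` without authentication or TLS is an assumption, and both function names are hypothetical.

```python
import json
import urllib.request


def fetch_cluster_health(base_url: str = "http://localhost:9200", timeout: float = 5.0) -> dict:
    """Call GET /_cluster/health and return the parsed JSON body.

    Assumption: the node is reachable at base_url without authentication or TLS.
    """
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def describe_health(health: dict) -> str:
    """Turn a _cluster/health response into a one-line summary."""
    # "timed_out" is set when the cluster could not answer within its own timeout.
    if health.get("timed_out"):
        return "health request timed out on the cluster side"
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)
    if status == "green" and unassigned == 0:
        return f"green: {nodes} node(s), all shards assigned"
    return f"{status}: {nodes} node(s), {unassigned} unassigned shard(s)"


# Usage against a live node (not executed here):
#   print(describe_health(fetch_cluster_health()))
```

Running this at the moment the alarm is raised, rather than afterwards, would show whether the cluster itself reports a problem or whether only the connection from DataMiner is affected.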
Kind regards