I'm having an issue with some of my Elasticsearch indices in the cluster:
I have 5 regular shards for an example index logs-2021.08, so when I run the _cat/shards Elasticsearch API I get good results (example):
logs-2021.08 2 r STARTED 25008173 11.9gb 0.0.0.0 instance-0000000128
logs-2021.08 2 p STARTED 25008173 11.8gb 0.0.0.0 instance-0000000119
logs-2021.08 4 p STARTED 25012332 11.8gb 0.0.0.0 instance-0000000129
logs-2021.08 4 r STARTED 25012332 11.9gb 0.0.0.0 instance-0000000119
logs-2021.08 1 p STARTED 25003649 11.8gb 0.0.0.0 instance-0000000121
logs-2021.08 1 r STARTED 25003649 11.8gb 0.0.0.0 instance-0000000115
logs-2021.08 3 p STARTED 25006085 11.8gb 0.0.0.0 instance-0000000121
logs-2021.08 3 r STARTED 25006085 11.8gb 0.0.0.0 instance-0000000135
logs-2021.08 0 p STARTED 25007160 11.9gb 0.0.0.0 instance-0000000128
logs-2021.08 0 r STARTED 25007160 11.9gb 0.0.0.0 instance-0000000118
The issue is that I'm also getting these in the results of the cat API:
partial-logs-2021.08 2 p UNASSIGNED
partial-logs-2021.08 4 p UNASSIGNED
partial-logs-2021.08 1 p UNASSIGNED
partial-logs-2021.08 3 p UNASSIGNED
partial-logs-2021.08 0 p UNASSIGNED
I could not figure out what the problem is or why these partial indices exist, but the cluster seems to be unhealthy because of the unassigned shards.
Is there any way to solve this at the root (and not the obvious fix of deleting them)?
CodePudding user response:
The easy fix: retry the allocation. Elasticsearch shard allocation was blocked because of too many consecutive allocation failures.
curl -X POST http://127.0.0.1:9200/_cluster/reroute?retry_failed=true
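To verify the retry helped, check cluster health afterwards (a quick check, assuming the same local endpoint as above):
# the cluster should move from red/yellow towards green once shards are assigned
curl -s "http://127.0.0.1:9200/_cluster/health?pretty"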
But you should understand the reason behind the failure and how the allocation API behaves.
The cluster will attempt to allocate a shard a maximum of index.allocation.max_retries times in a row (defaults to 5) before giving up and leaving the shard unallocated. You can raise this limit so the cluster tries the assignment again, but if the underlying cause is not fixed, the issue may repeat.
curl --silent --request PUT --header 'Content-Type: application/json' 127.0.0.1:9200/my_index_name/_settings?pretty=true --data-ascii '{
  "index": {
    "allocation": {
      "max_retries": 15
    }
  }
}'
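Once the shards are assigned, consider resetting the limit; setting an index setting to null restores its default (a sketch, reusing the same placeholder index name):
# reset max_retries back to the default of 5
curl --silent --request PUT --header 'Content-Type: application/json' 127.0.0.1:9200/my_index_name/_settings?pretty=true --data-ascii '{
  "index": {
    "allocation": {
      "max_retries": null
    }
  }
}'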
But this may fail again for a different reason, so identify the cause with the cluster allocation explain API. Possible issues include:
- Disk watermark breaches caused by low free disk space (see the disk usage check after this list)
- Indexing errors, e.g. when index data has been moved from one folder to another or from one server to another
- Structural problems such as an analyzer which refers to a stopwords file that doesn't exist on all nodes
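For the watermark case, the cat allocation API shows disk usage and shard counts per node (assuming the same local endpoint):
# high disk.percent values suggest a watermark problem (low/high watermarks default to 85%/90%)
curl -s "http://127.0.0.1:9200/_cat/allocation?v"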
Get Unassigned Shards
curl -s "http://127.0.0.1:9200/_cat/shards?v" | awk 'NR==1 {print}; $4 == "UNASSIGNED" {print}'
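Alternatively, the cat API can return selected columns, including the reason a shard is unassigned (this assumes a version that supports the unassigned.reason column):
# same filter, but with the unassigned reason as the last column
curl -s "http://127.0.0.1:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | awk 'NR==1 {print}; $4 == "UNASSIGNED" {print}'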
To understand the reason, run one of the following commands:
GET /_cluster/allocation/explain
# OR
curl -XGET "http://127.0.0.1:9200/_cluster/allocation/explain"
# OR
curl -s http://127.0.0.1:9200/_cluster/state | jq '.routing_table.indices | .[].shards[][] | select(.state=="UNASSIGNED") | {index: .index, shard: .shard, primary: .primary, unassigned_info: .unassigned_info}'
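The explain API can also be pointed at one specific shard; the index name and shard number below are taken from the question's output:
# explain why primary shard 0 of the partial index is unassigned
curl -XGET "http://127.0.0.1:9200/_cluster/allocation/explain" -H 'Content-Type: application/json' -d '{
  "index": "partial-logs-2021.08",
  "shard": 0,
  "primary": true
}'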
Once the problem has been corrected, allocation can be manually retried by calling the reroute API with the ?retry_failed URI query parameter, which attempts a single retry round for these shards:
curl -X POST http://127.0.0.1:9200/_cluster/reroute?retry_failed=true
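After the reroute, the previously unassigned shards should move through INITIALIZING to STARTED; re-run the _cat/shards filter above to confirm, or run the explain API again to see the next blocker if they stay unassigned.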