I'm having an issue with some of my Elasticsearch indices in the cluster:
I have 5 regular shards for an example index logs-2021.08, so when I run the _cat/shards Elasticsearch API I get good results (example):
logs-2021.08 2 r STARTED 25008173 11.9gb 0.0.0.0 instance-0000000128
logs-2021.08 2 p STARTED 25008173 11.8gb 0.0.0.0 instance-0000000119
logs-2021.08 4 p STARTED 25012332 11.8gb 0.0.0.0 instance-0000000129
logs-2021.08 4 r STARTED 25012332 11.9gb 0.0.0.0 instance-0000000119
logs-2021.08 1 p STARTED 25003649 11.8gb 0.0.0.0 instance-0000000121
logs-2021.08 1 r STARTED 25003649 11.8gb 0.0.0.0 instance-0000000115
logs-2021.08 3 p STARTED 25006085 11.8gb 0.0.0.0 instance-0000000121
logs-2021.08 3 r STARTED 25006085 11.8gb 0.0.0.0 instance-0000000135
logs-2021.08 0 p STARTED 25007160 11.9gb 0.0.0.0 instance-0000000128
logs-2021.08 0 r STARTED 25007160 11.9gb 0.0.0.0 instance-0000000118
The issue is that I'm also getting these in the results of the cat API:
partial-logs-2021.08 2 p UNASSIGNED
partial-logs-2021.08 4 p UNASSIGNED
partial-logs-2021.08 1 p UNASSIGNED
partial-logs-2021.08 3 p UNASSIGNED
partial-logs-2021.08 0 p UNASSIGNED
I could not figure out what the problem is or why these partial indices exist, but the cluster seems to be unhealthy because of the unassigned shards.
Is there any way to solve this at the root (and not the obvious fix of deleting them)?
CodePudding user response:
The easy fix: retry the allocation. Elasticsearch shard allocation was blocked because of too many consecutive allocation failures.
curl -X POST http://127.0.0.1:9200/_cluster/reroute?retry_failed=true
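To verify the retry helped, check cluster health afterwards (a quick check, assuming the same local endpoint as above):
# the cluster should move from red/yellow towards green once shards are assigned
curl -s "http://127.0.0.1:9200/_cluster/health?pretty"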
But you should understand the reason behind the failure and how the allocation API behaves.
The cluster will attempt to allocate a shard a maximum of index.allocation.max_retries times in a row (defaults to 5) before giving up and leaving the shard unallocated. You can raise this limit so the cluster tries the assignment again, but if the underlying cause is not fixed, the issue may repeat.
curl --silent --request PUT --header 'Content-Type: application/json' 127.0.0.1:9200/my_index_name/_settings?pretty=true --data-ascii '{
  "index": {
    "allocation": {
      "max_retries": 15
    }
  }
}'
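Once the shards are assigned, consider resetting the limit; setting an index setting to null restores its default (a sketch, reusing the same placeholder index name):
# reset max_retries back to the default of 5
curl --silent --request PUT --header 'Content-Type: application/json' 127.0.0.1:9200/my_index_name/_settings?pretty=true --data-ascii '{
  "index": {
    "allocation": {
      "max_retries": null
    }
  }
}'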
But this may fail again for a different reason, so identify the cause with the cluster allocation explain API. Possible issues include:
- Disk watermark breaches caused by low free disk space (see the disk usage check after this list)
- Indexing errors, e.g. when index data has been moved from one folder to another or from one server to another
- Structural problems such as an analyzer which refers to a stopwords file that doesn't exist on all nodes
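For the watermark case, the cat allocation API shows disk usage and shard counts per node (assuming the same local endpoint):
# high disk.percent values suggest a watermark problem (low/high watermarks default to 85%/90%)
curl -s "http://127.0.0.1:9200/_cat/allocation?v"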
Get Unassigned Shards
curl -s "http://127.0.0.1:9200/_cat/shards?v" | awk 'NR==1 {print}; $4 == "UNASSIGNED" {print}'
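Alternatively, the cat API can return selected columns, including the reason a shard is unassigned (this assumes a version that supports the unassigned.reason column):
# same filter, but with the unassigned reason as the last column
curl -s "http://127.0.0.1:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | awk 'NR==1 {print}; $4 == "UNASSIGNED" {print}'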
To understand the reason, run one of the following commands:
GET /_cluster/allocation/explain
# OR
curl -XGET "http://127.0.0.1:9200/_cluster/allocation/explain"
# OR
curl -s http://127.0.0.1:9200/_cluster/state | jq '.routing_table.indices | .[].shards[][] | select(.state=="UNASSIGNED") | {index: .index, shard: .shard, primary: .primary, unassigned_info: .unassigned_info}'
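The explain API can also be pointed at one specific shard; the index name and shard number below are taken from the question's output:
# explain why primary shard 0 of the partial index is unassigned
curl -XGET "http://127.0.0.1:9200/_cluster/allocation/explain" -H 'Content-Type: application/json' -d '{
  "index": "partial-logs-2021.08",
  "shard": 0,
  "primary": true
}'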
Once the problem has been corrected, allocation can be manually retried by calling the reroute API with the ?retry_failed URI query parameter, which attempts a single retry round for these shards:
curl -X POST http://127.0.0.1:9200/_cluster/reroute?retry_failed=true
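After the reroute, the previously unassigned shards should move through INITIALIZING to STARTED; re-run the _cat/shards filter above to confirm, or run the explain API again to see the next blocker if they stay unassigned.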