Currently, we set the value of _id ourselves when saving documents to the index. By doing so, we prevent Elasticsearch from generating the _id on its own, which we believe forces each document onto a particular shard. Since Elasticsearch places a document on a shard based on the document's _id, some shards could end up disproportionately larger than others.

Is there a way to balance the shards while still setting the _id of the document ourselves?

CodePudding user response：

TL;DR

Use custom routing based on an evenly distributed value.

For example, the ingestion time, if you are continuously indexing data.

CodePudding user response：

As already mentioned, you need custom routing for that. How to do this with Spring Data Elasticsearch is documented in the reference docs.

Keep in mind that when using a custom routing to store an entity, you must provide the same routing value when doing a get(id) or delete(id) that was used when storing the document.
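To illustrate why the routing value has to match, here is a minimal in-memory sketch. It is not the Elasticsearch or Spring Data Elasticsearch API: the class `RoutedStoreSketch`, its methods, and the toy hash (`String.hashCode`) are all hypothetical stand-ins that only mimic the idea that the routing value, not the id, selects the shard.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model of a sharded index with custom routing.
final class RoutedStoreSketch {
    // Each "shard" is just a map from document id to document source.
    private final List<Map<String, String>> shards = new ArrayList<>();

    RoutedStoreSketch(int numShards) {
        for (int i = 0; i < numShards; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Toy shard selection: only the routing value decides the shard.
    // (Assumption: String.hashCode stands in for the real hash function.)
    int shardFor(String routing) {
        return Math.floorMod(routing.hashCode(), shards.size());
    }

    void index(String id, String routing, String source) {
        shards.get(shardFor(routing)).put(id, source);
    }

    // Looks only in the shard selected by the given routing value.
    Optional<String> get(String id, String routing) {
        return Optional.ofNullable(shards.get(shardFor(routing)).get(id));
    }

    // Returns true only if the document was found in the routed shard.
    boolean delete(String id, String routing) {
        return shards.get(shardFor(routing)).remove(id) != null;
    }

    public static void main(String[] args) {
        RoutedStoreSketch store = new RoutedStoreSketch(4);
        store.index("doc-1", "tenant-a", "{\"name\":\"example\"}");
        System.out.println(store.get("doc-1", "tenant-a").isPresent());
        System.out.println(store.get("doc-1", "tenant-b").isPresent());
    }
}
```

In this sketch, a lookup with a different routing value searches a different shard and misses the document, even though the id is correct. That is the behavior the warning above is about.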

Read the Elasticsearch documentation on how routing is calculated by default. I probably would not try to implement a custom shard distribution method, but that's my personal opinion.
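For context, the Elasticsearch documentation describes the default as hashing the routing value (which defaults to the document's _id) and mapping the result onto the primary shards, roughly `shard_num = hash(_routing) % num_primary_shards` in its simplified form. Because the _id is hashed, application-assigned ids are spread over the shards as well. The sketch below simulates that simplified formula. It makes two assumptions: CRC32 stands in for the Murmur3 hash that Elasticsearch actually uses, and the ids of the form `order-N` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical simulation of default routing with application-assigned ids.
final class ShardDistributionSketch {

    // Stand-in hash (assumption): Elasticsearch uses Murmur3 internally;
    // CRC32 is used here only because it ships with the JDK.
    static long hash(String routing) {
        CRC32 crc = new CRC32();
        crc.update(routing.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // always non-negative
    }

    // Simplified form of the documented formula:
    // shard_num = hash(_routing) % num_primary_shards
    static int shardFor(String routing, int numPrimaryShards) {
        return (int) (hash(routing) % numPrimaryShards);
    }

    // Counts how many documents with made-up ids land on each shard.
    static int[] countPerShard(int numDocs, int numPrimaryShards) {
        int[] counts = new int[numPrimaryShards];
        for (int i = 0; i < numDocs; i++) {
            String id = "order-" + i; // hypothetical application-assigned _id
            counts[shardFor(id, numPrimaryShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerShard(100_000, 5);
        for (int shard = 0; shard < counts.length; shard++) {
            System.out.println("shard " + shard + ": " + counts[shard]);
        }
    }
}
```

Running `main` prints the number of simulated documents per shard, which gives a rough feel for how a hash spreads hand-picked ids. The real distribution depends on the actual hash and index settings, so treat this only as an illustration of the mechanism.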