Currently, we set the value of _id ourselves when saving documents to the index. By doing so, we prevent Elasticsearch from generating the _id on its own and therefore force each document onto a particular shard. As a result, some shards could become disproportionately larger than others, since Elasticsearch places each document on a shard based on the document's _id.
Is there a way to balance the shards while still setting the _id of each document ourselves?
CodePudding user response:
TL;DR
Use custom routing on an evenly distributed value.
For example, the ingestion time, if you are continuously indexing data.
CodePudding user response:
As already mentioned, you need custom routing for that. How to do this with Spring Data Elasticsearch is documented in the reference docs.
Keep in mind that when using custom routing to store an entity, you must provide the same routing value that was used when storing the document when doing a get(id) or delete(id).
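To illustrate why the routing value has to match, here is a minimal in-memory sketch in plain Java. It is not Spring Data Elasticsearch or Elasticsearch code: the class `RoutedStoreSketch`, its methods, and the hash it uses are all hypothetical stand-ins. It only shows that a lookup goes to the shard selected by the routing value, so a different routing value can look in the wrong shard and miss the document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a sharded index; not an Elasticsearch API.
final class RoutedStoreSketch {
    private final List<Map<String, String>> shards = new ArrayList<>();

    RoutedStoreSketch(int numShards) {
        for (int i = 0; i < numShards; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Stand-in routing function (assumption: Elasticsearch uses a different hash).
    int shardFor(String routing) {
        return Math.floorMod(routing.hashCode(), shards.size());
    }

    // Stores the document on the shard selected by the routing value.
    void index(String id, String routing, String source) {
        shards.get(shardFor(routing)).put(id, source);
    }

    // Looks only at the shard selected by the routing value; null if not found there.
    String get(String id, String routing) {
        return shards.get(shardFor(routing)).get(id);
    }

    // Deletes only from the shard selected by the routing value.
    boolean delete(String id, String routing) {
        return shards.get(shardFor(routing)).remove(id) != null;
    }

    // Helper for the demo: finds a routing value that maps to a different shard.
    String routingOnOtherShard(String routing) {
        for (int i = 0; i < 1000; i++) {
            String candidate = "other-" + i;
            if (shardFor(candidate) != shardFor(routing)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no differing routing value found");
    }

    public static void main(String[] args) {
        RoutedStoreSketch store = new RoutedStoreSketch(3);
        store.index("42", "tenant-a", "{\"name\":\"example\"}");

        String wrong = store.routingOnOtherShard("tenant-a");
        System.out.println("same routing:  " + store.get("42", "tenant-a"));
        System.out.println("other routing: " + store.get("42", wrong));
    }
}
```

In this toy model the second lookup returns nothing, even though the document exists, because it searches a different shard.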
Read the Elasticsearch documentation on how the routing is calculated by default. I probably would not try to implement a custom shard distribution method, but that's my personal opinion.
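As a rough illustration of that default: the Elasticsearch documentation describes the target shard as, roughly, a hash of the routing value (which defaults to the _id) taken modulo the number of primary shards. So an application-assigned _id is still hashed before a shard is chosen. The sketch below simulates this in plain Java. The mixing function is only a stand-in for Elasticsearch's real hash, and the `doc-N` ids, the shard count, and the class name `ShardRoutingSketch` are assumptions made for the example.

```java
// Hypothetical simulation of hash-based shard selection; not Elasticsearch code.
final class ShardRoutingSketch {

    // Stand-in bit mixer (assumption: Elasticsearch's actual routing hash differs).
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Simplified form of the documented idea: hash(routing) % number_of_primary_shards.
    static int shardFor(String routing, int numPrimaryShards) {
        return Math.floorMod(mix(routing.hashCode()), numPrimaryShards);
    }

    // Counts how many documents with sequential, application-assigned ids land on each shard.
    static int[] countPerShard(int numDocs, int numPrimaryShards) {
        int[] counts = new int[numPrimaryShards];
        for (int i = 0; i < numDocs; i++) {
            String id = "doc-" + i; // hypothetical custom _id
            counts[shardFor(id, numPrimaryShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerShard(100_000, 5);
        for (int shard = 0; shard < counts.length; shard++) {
            System.out.println("shard " + shard + ": " + counts[shard] + " docs");
        }
    }
}
```

Running it prints the per-shard document counts, which makes it easy to check how evenly a given id scheme spreads under a hash-based rule like this one.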