Elastic Search Query performance when fetch phase (_source) is disabled

Time:10-30

We have an Elasticsearch index with 100 million documents (about 400 million including replicas). The index also contains nested documents.

In our use case we need to boost document scores based on some fields present in each document. For this boosting we use a function_score query.
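The following is a rough sketch of the kind of function_score request described above. It wraps a nested query and multiplies the relevance score by a log-scaled numeric field. The field names (`reviews`, `reviews.text`, `popularity`) and the search term are hypothetical, because the question does not say which fields drive the boost.

```python
import json


def build_boosted_query(term: str) -> dict:
    """Build a function_score search body (all field names are hypothetical)."""
    return {
        "query": {
            "function_score": {
                # Base relevance query; "reviews" is assumed to be a nested field.
                "query": {
                    "nested": {
                        "path": "reviews",
                        "query": {"match": {"reviews.text": term}},
                        "score_mode": "avg",
                    }
                },
                # Boost by a numeric field on the top-level document.
                "functions": [
                    {
                        "field_value_factor": {
                            "field": "popularity",
                            "modifier": "log1p",
                            "missing": 1,  # value used when the field is absent
                        }
                    }
                ],
                # Multiply the query score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


print(json.dumps(build_boosted_query("battery"), indent=2))
```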

With _source fetching disabled, our response time is under 30 ms. We disable it with this request:

https://<elastic_endpoint>/elastic_index_name/_search?_source=false

However, when we enable _source fetching, the same query takes more than 2 seconds.
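Here is a minimal sketch of the two request URLs being compared. It assumes a placeholder endpoint (`https://localhost:9200`) and the index name from the question, and `search_url` is a hypothetical helper. Setting `"_source": false` in the request body should have the same effect as the query-string parameter. As far as we understand, the fetch phase still runs in that case to return each hit's `_id` and score, but it skips returning the stored `_source`.

```python
from urllib.parse import urlencode


def search_url(base: str, index: str, fetch_source: bool) -> str:
    """Return a _search URL; _source=false asks Elasticsearch not to return the source."""
    params = {} if fetch_source else {"_source": "false"}
    url = f"{base.rstrip('/')}/{index}/_search"
    return f"{url}?{urlencode(params)}" if params else url


# Placeholder endpoint; substitute your own cluster address.
print(search_url("https://localhost:9200", "elastic_index_name", fetch_source=False))
# prints https://localhost:9200/elastic_index_name/_search?_source=false
```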

We tried to debug this with the Profile API, but according to the docs it does not seem to report the time spent in the fetch phase. As a result, the profile output reports the same time (in milliseconds) as when we run the query with _source disabled.
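When the profile output does not show fetch time, one option is to compare the `took` value of the two responses. As we understand it, `took` covers the whole search request on the coordinating node, including the fetch phase. The sketch below is hedged: `send` is a hypothetical hook for whatever HTTP client you use. It takes query-string parameters and a request body and returns the parsed JSON response.

```python
from typing import Callable

# Hypothetical hook: (query-string params, request body) -> parsed JSON response.
Send = Callable[[dict, dict], dict]


def compare_took(send: Send, body: dict) -> dict:
    """Run the same search with and without _source and report server-side 'took'."""
    without_source = send({"_source": "false"}, body)
    with_source = send({}, body)
    return {
        "took_without_source_ms": without_source["took"],
        "took_with_source_ms": with_source["took"],
        # Most of this gap would be time spent loading and returning _source.
        "difference_ms": with_source["took"] - without_source["took"],
    }
```

If the difference accounts for most of the slowdown, the query and scoring logic is probably not the problem.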

We also tried other scoring approaches, such as rank_feature queries and the script_score query, but neither helped.
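For reference, here are sketches of the two alternative scoring approaches mentioned. The fields `title`, `popularity` (numeric) and `pagerank` (mapped as `rank_feature`) are hypothetical. The Painless script assumes every document has a `popularity` value; a missing value would make `doc['popularity'].value` fail.

```python
def script_score_query(term: str) -> dict:
    """script_score: rescore matches with a Painless expression (hypothetical fields)."""
    return {
        "query": {
            "script_score": {
                "query": {"match": {"title": term}},
                "script": {
                    # Scale the relevance score by log(2 + popularity).
                    "source": "_score * Math.log(2 + doc['popularity'].value)"
                },
            }
        }
    }


def rank_feature_query(term: str) -> dict:
    """rank_feature: add a score contribution from a rank_feature field."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": term}}],
                # Uses the default saturation function on the "pagerank" field.
                "should": [{"rank_feature": {"field": "pagerank"}}],
            }
        }
    }
```

Both approaches only change how scores are computed during the query phase. That would explain why swapping them had no visible effect on time spent loading _source.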

Can anyone share insights into this issue? Please let me know if any more details are needed from my end.

User response:

Fetching _source is always a costly operation in Elasticsearch, and it gets worse when the index contains large, nested documents.

However, by default Elasticsearch returns only 10 documents, so fetching should not cause major performance issues unless you are fetching a large number of documents.
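Here is a sketch of a search body that makes the hit count explicit and filters large fields out of the response. The default `size` is 10. The excluded field name `reviews` is hypothetical. As far as we know, query-time source filtering is applied after the stored `_source` has been read, so it shrinks the response but may not reduce disk reads.

```python
def limited_fetch_body(query: dict, size: int = 10) -> dict:
    """Search body with an explicit hit count and response-side source filtering."""
    return {
        "query": query,
        "size": size,  # number of hits whose _source is fetched (default is 10)
        # Drop a hypothetical large field from each returned _source.
        "_source": {"excludes": ["reviews"]},
    }
```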

Also, as @warkolm mentioned in the comments, it depends on the underlying storage (SSD or magnetic disk) used by the Elasticsearch cluster. Even so, I don't think storage alone should cause such a large difference (30 ms to 2 seconds) for 10 documents.

Can you share the average size of your documents and how many documents your search query fetches?
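One rough way to answer both questions is to inspect a parsed search response. The hit count is the number of documents fetched. Re-serializing each hit's `_source` as compact JSON gives an approximate per-document size. This is only an estimate, since Elasticsearch stores `_source` compressed on disk. `avg_source_bytes` is a hypothetical helper.

```python
import json


def avg_source_bytes(response: dict) -> float:
    """Average size of each hit's _source, re-serialized as compact UTF-8 JSON."""
    hits = response.get("hits", {}).get("hits", [])
    sizes = [
        len(json.dumps(h["_source"], separators=(",", ":")).encode("utf-8"))
        for h in hits
        if "_source" in h  # hits fetched with _source=false have no _source key
    ]
    return sum(sizes) / len(sizes) if sizes else 0.0
```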

User response:

We figured out the issue: disk IOPS was the bottleneck. Our nested documents were very large (some exceeded 35 MB), and retrieving them was very slow.

We ended up excluding the nested fields from _source storage. We didn't need them returned in the response, but we did need them in the index for scoring.
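Here is a sketch of what excluding a nested field from `_source` can look like in an index mapping. The field names are hypothetical. The field stays indexed, so it can still be queried and used for scoring, but it is no longer stored in `_source` and so is not read back when hits are fetched. The Elasticsearch docs warn that excluded fields cannot be returned, and are lost for operations that rebuild documents from `_source` (update, reindex). This setting would typically be applied when creating a new index.

```python
import json


def index_mapping() -> dict:
    """Create-index body that keeps 'reviews' searchable but out of _source."""
    return {
        "mappings": {
            # Not stored in _source, so never loaded during the fetch phase.
            "_source": {"excludes": ["reviews"]},
            "properties": {
                "title": {"type": "text"},
                "popularity": {"type": "float"},
                "reviews": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text"},
                        "rating": {"type": "integer"},
                    },
                },
            },
        }
    }


print(json.dumps(index_mapping(), indent=2))
```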

Thank you, everyone, for helping us with this issue. This thread on the Elastic Discussion Forum helped us resolve it: https://discuss.elastic.co/t/elastic-search-query-performance-when-source-is-disabled/317348/11
