I don't understand the explanation in elasticsearch scroll search

Time:03-25


"Keeping the initial search context alive has a high cost for actively updated indices."

Does the high cost in the sentence above refer to memory usage?

So, why is the memory usage so high?

Is it because update requests to the index have to be queued while the search context stays active?

Or because you're caching an active index snapshot in memory?

CodePudding user response:

From the official documentation:

A scroll returns all the documents which matched the search at the time of the initial search request. It ignores any subsequent changes to these documents. [...] The search context is created by the initial request and kept alive by subsequent requests. [...]

Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted since they are still in use.

Keeping older segments alive means that more disk space and file handles are needed. Ensure that you have configured your nodes to have ample free file handles. See File Descriptors.

Additionally, if a segment contains deleted or updated documents then the search context must keep track of whether each document in the segment was live at the time of the initial search request. Ensure that your nodes have sufficient heap space if you have many open scrolls on an index that is subject to ongoing deletes or updates.

Emphasis has been added to the documentation above to highlight why it is costly to keep one or many scroll contexts alive for a substantial period of time. Elasticsearch does its best to keep everything fresh and to discard old data, but a scroll context basically puts old data on life support and stashes it in a corner for a bit longer, before letting it die when the scroll context is no longer needed.
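To make the "life support" idea concrete, here is a toy model in Python. It is only a sketch of the principle described in the documentation, not Elasticsearch's or Lucene's real internals: the class `ToyIndex` and all of its methods are made up for illustration. Each open context pins the segments that existed when it was created, and a merged-away segment can only be removed once no context pins it.

```python
class ToyIndex:
    """Toy model of segment retention. Hypothetical, not real Elasticsearch code."""

    def __init__(self, segments):
        self.live = set(segments)      # segments that new searches would use
        self.on_disk = set(segments)   # segments still taking disk space and file handles
        self.contexts = {}             # context id -> segments pinned by that context
        self._next_id = 0

    def open_context(self):
        """Open a scroll-like context that pins the currently live segments."""
        cid = self._next_id
        self._next_id += 1
        self.contexts[cid] = frozenset(self.live)
        return cid

    def close_context(self, cid):
        """Release the pinned segments and delete any that are now unused."""
        del self.contexts[cid]
        self._collect()

    def merge(self, old, new):
        """Replace the `old` segments with one merged segment `new`."""
        self.live -= set(old)
        self.live.add(new)
        self.on_disk.add(new)
        self._collect()

    def _collect(self):
        # A segment stays on disk while it is live or pinned by any open context.
        pinned = set().union(*self.contexts.values())
        self.on_disk = {s for s in self.on_disk if s in self.live or s in pinned}


index = ToyIndex(["s1", "s2"])
ctx = index.open_context()            # the initial scroll request
index.merge(["s1", "s2"], "s3")       # background merge during scrolling
print(sorted(index.on_disk))          # ['s1', 's2', 's3']: old segments are kept
index.close_context(ctx)              # the scroll is cleared or expires
print(sorted(index.on_disk))          # ['s3']: old segments can finally go
```

While the context is open, the index holds both the old segments and the merged one, which is where the extra disk space and file handles come from.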

That is why more resources (mainly storage, file handles, and heap) are needed to keep scroll contexts alive, and that is what "high cost" refers to. So the cost is not only memory: disk space and file handles are affected as well.
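Since the cost grows with how long a context stays open, a common way to limit it is to use a short keep-alive and to clear the scroll as soon as you are done. The sketch below uses the scroll REST endpoints (`POST /<index>/_search?scroll=...`, `POST /_search/scroll`, `DELETE /_search/scroll`). The `send(method, path, body)` callable is a hypothetical transport that you would supply yourself and that is assumed to return the parsed JSON response; the function name `scroll_all` and the default values are also just illustrative.

```python
def scroll_all(send, index, query, keep_alive="1m", page_size=1000):
    """Yield every hit of a scroll search, then clear the scroll context.

    `send(method, path, body)` is a hypothetical transport (an assumption of
    this sketch) that performs the HTTP request and returns the parsed JSON.
    """
    # The initial request creates the search context and returns the first page.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Each follow-up request renews the keep-alive for one more page only.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the context explicitly instead of waiting for the timeout,
        # so the old segments can be deleted as early as possible.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

You would call it as `for hit in scroll_all(send, "my-index", {"match_all": {}}): ...`, where `my-index` is a placeholder. The `finally` block also runs if the caller stops iterating early and closes the generator, so the context does not linger until the keep-alive expires.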
