In the write tuning section, Elastic recommends increasing the refresh interval.
We ingest documents, and during ingestion we may also read them, essentially like this:
GET /my-index/_doc/mydocumentid
That is, a read of the document by its _id, as opposed to a search. Some descriptions suggest that the document id is just added to the Lucene index like any other field. Does this mean that a read by id would still reset the refresh_interval and force a refresh, instead of letting the index wait for the full refresh_interval?
CodePudding user response:
This is actually a tricky one:
You are correct that a GET on an _id works right away (unlike a multi-document operation like a search, which needs to wait for an explicit ?refresh from you or for the refresh_interval to elapse). But the underlying implementation changed twice:
- Initially, a GET on an _id read the data right from the translog, so it didn't need a refresh or the creation of a segment.
- The code was complex, so in 5.0 we changed it to read from a segment, with a GET on an _id automatically triggering a _refresh. So it looked the same from the outside and the code was simpler.
- But for use cases that did a lot of GETs on _id this was expensive, since it creates lots of tiny segments. So in 7.6 we changed it back to read from the translog again.
So if you are using a current version, a GET on an _id doesn't trigger a _refresh.
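To make the two requests in this thread concrete, here is a minimal sketch in Python that only builds them with the standard library and sends nothing. The base URL (an unsecured local node), the 30s interval, and the helper names refresh_interval_request and get_by_id_request are assumptions for illustration; the realtime query parameter is the documented switch on the Get API, and setting it to false makes the read see only what the last refresh exposed.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a local, unsecured node; adjust for your cluster.
BASE_URL = "http://localhost:9200"


def refresh_interval_request(index, interval="30s"):
    """Build PUT /<index>/_settings that sets index.refresh_interval.

    Hypothetical helper; the 30s default is only an example value.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return Request(
        f"{BASE_URL}/{quote(index, safe='')}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_by_id_request(index, doc_id, realtime=True):
    """Build GET /<index>/_doc/<id>.

    realtime=True is the server default, so the parameter is only added
    to the URL when the caller opts out of the real-time read.
    """
    url = f"{BASE_URL}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    if not realtime:
        url += "?" + urlencode({"realtime": "false"})
    return Request(url, method="GET")


if __name__ == "__main__":
    for req in (
        refresh_interval_request("my-index"),
        get_by_id_request("my-index", "mydocumentid"),
        get_by_id_request("my-index", "mydocumentid", realtime=False),
    ):
        # Print the method and URL instead of sending the request.
        print(req.get_method(), req.full_url)
```

Passing any of these Request objects to urllib.request.urlopen would send it to a running node. If stale reads during ingestion are acceptable, realtime=false is the explicit way to ask for them.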
CodePudding user response:
A GET on the _id is not a search, so no.