I was comparing Elasticsearch indexing with refresh_interval set to 1s and to 30s, with the refresh policy set to None, by indexing 2000 documents at the same rate in both cases. There was not much difference in indexing speed between the two.
using version:
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
    <version>4.2.0</version>
</dependency>
config:
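@Bean
fun elasticsearchTemplate(): ElasticsearchOperations {
    // client() is the RestHighLevelClient bean defined elsewhere in this config
    val restTemplate = ElasticsearchRestTemplate(client())
    // the enum constant is NONE: do not trigger a refresh after write requests
    restTemplate.refreshPolicy = RefreshPolicy.NONE
    return restTemplate
}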
and the document and settings:
@Document(indexName = "book")
@Setting(refreshInterval = "1s")
class Book(
    @Id
    var id: String? = null,
    @Field(type = FieldType.Keyword)
    var title: String,
    @Field(type = FieldType.Keyword)
    var author: String,
    @Field(type = FieldType.Date)
    var date: Date,
)
I looked at the Elasticsearch documentation for refresh and refresh_interval, but I wanted to make sure: if the refresh policy is set to None, does it really help to increase refresh_interval?
In heavy indexing scenarios with Elasticsearch, would increasing the refresh interval of an index improve indexing speed?
CodePudding user response:
These are different things. refresh (aka refreshPolicy) controls what happens right after an indexing request: force an immediate refresh (true), wait until the next refresh has made the change visible before responding (wait_for), or do nothing and just leave the cluster to do its job (false, None, the default).
refresh_interval makes most sense when refresh is not enabled, and it defines how exactly the cluster "does its job": how often it refreshes the index on its own. A refresh is quite a heavy operation, so it's recommended to increase the interval or even disable periodic refresh (set it to -1) for the duration of indexing.
If you haven't noticed a performance improvement when changing refresh_interval, then you probably haven't maxed out the indexing capacity on the ES side: bulk size tuning, multiple indexing threads/machines, etc. (see Tune for indexing speed).
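To make the "bulk size and multiple threads" point concrete, here is a minimal sketch of splitting documents into batches and sending them from a small thread pool. It is plain Kotlin with no Elasticsearch dependency: indexInParallel and its indexBatch callback are hypothetical names, and the callback stands in for whatever bulk call you use (with Spring Data Elasticsearch that might be a save of the whole batch through ElasticsearchOperations). Batch size and thread count are values to tune by measurement, not recommendations.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Splits docs into batches of batchSize and indexes them on `threads` workers.
// Returns the number of batches submitted; rethrows if any batch failed.
fun <T> indexInParallel(
    docs: List<T>,
    batchSize: Int,
    threads: Int,
    indexBatch: (List<T>) -> Unit   // hypothetical: one bulk request per batch
): Int {
    val pool = Executors.newFixedThreadPool(threads)
    try {
        val futures = docs.chunked(batchSize).map { batch ->
            pool.submit(Callable { indexBatch(batch) })
        }
        // get() blocks until each batch is done and surfaces failures
        futures.forEach { it.get() }
        return futures.size
    } finally {
        pool.shutdown()
    }
}
```

With only 2000 documents sent at a fixed rate, the cluster is unlikely to be the bottleneck, which would explain why the two refresh_interval values looked the same. Raising the load with something like the above is one way to reach the point where refresh cost becomes visible.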