I need an index that continuously receives data from Logstash in Elasticsearch (7.15). Over time the index will grow very large, so for performance reasons and sheer size it would be preferable to split it into smaller indices.
As far as I understand, rollover and index lifecycle management (ILM) are the concepts I need to understand to meet these requirements, and I have some questions about them:
The documentation talks about both index aliases and data streams, but I haven't been able to find out exactly what the difference is. Both seem to cover the case of spanning multiple smaller indices. Could anyone elaborate on the difference?
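To make the comparison concrete, here is a minimal Python sketch that only builds and prints the JSON request bodies for the two approaches; it does not talk to a cluster. The names my-index, my-alias and the template names are hypothetical. With a plain alias you bootstrap the first index yourself and tell ILM which alias to roll over via index.lifecycle.rollover_alias; with a data stream, the data_stream object in the index template is enough and Elasticsearch creates and names the backing indices itself.

```python
import json

POLICY = "my-lifecycle-policy"  # ILM policy name used in the question


def alias_based_requests(alias="my-alias", prefix="my-index"):
    """Requests for classic alias-based rollover (hypothetical names)."""
    template = {
        "index_patterns": [f"{prefix}-*"],
        "template": {
            "settings": {
                "index.lifecycle.name": POLICY,
                # ILM must be told which alias it should roll over
                "index.lifecycle.rollover_alias": alias,
            }
        },
    }
    # The first index is created by hand, with a name ending in a number
    # and the alias marked as the write index.
    bootstrap = {"aliases": {alias: {"is_write_index": True}}}
    return [
        ("PUT", f"_index_template/{prefix}-template", template),
        ("PUT", f"{prefix}-000001", bootstrap),
    ]


def data_stream_requests(name="my-data-stream"):
    """Requests for a data stream: no bootstrap index, no rollover_alias."""
    template = {
        "index_patterns": [f"{name}*"],
        "data_stream": {},  # makes matching targets data streams
        "template": {"settings": {"index.lifecycle.name": POLICY}},
    }
    # The data stream itself is created by the first indexing request
    # whose target name matches this template.
    return [("PUT", f"_index_template/{name}-template", template)]


if __name__ == "__main__":
    for method, path, body in alias_based_requests() + data_stream_requests():
        print(method, path)
        print(json.dumps(body, indent=2))
```

The alias approach needs two requests plus the extra rollover_alias setting; the data stream approach needs only the template.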
As far as I understand, I need to create a policy and an index template, then create a data stream and upload data. I tried a simple policy that should roll over after only a few documents ("max_docs": 2 in the policy below). An index is created, but it never rolls over, even after the number of documents has been exceeded. If I use max_age instead, it seems to work.
These are the steps I followed:
# Creates the ILM policy referenced by the my-settings component template
PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_docs": 2
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "cold": {
        "min_age": "30s",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      }
    }
  }
}
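The policy above asks ILM to roll over once the write index holds 2 documents, but ILM evaluates that condition on a timer rather than on every write. The following Python sketch is a simplified model, not Elasticsearch code: conditions are checked only at poll ticks (the indices.lifecycle.poll_interval cluster setting, 10 minutes by default), ticks are counted from index creation at t = 0, and rollover fires at the first tick where any configured condition is met. It ignores refresh delays and the extra polls ILM may need to move a new index into the rollover step. It also builds the body of a PUT _cluster/settings request that shortens the poll interval, something usually done only on test clusters.

```python
import json


def first_rollover_tick(doc_times_s, max_docs=None, max_age_s=None,
                        poll_interval_s=600, horizon_s=86400):
    """Return the first poll time (seconds) at which rollover would fire.

    Simplified model: conditions are ORed and only evaluated at ticks
    poll_interval_s, 2 * poll_interval_s, ... after index creation.
    Returns None if nothing fires before horizon_s.
    """
    tick = poll_interval_s
    while tick <= horizon_s:
        docs = sum(1 for t in doc_times_s if t <= tick)
        if max_docs is not None and docs >= max_docs:
            return tick
        if max_age_s is not None and tick >= max_age_s:
            return tick
        tick += poll_interval_s
    return None


def poll_interval_settings(interval="1m"):
    """Body for PUT _cluster/settings (intended for test clusters)."""
    return {"persistent": {"indices.lifecycle.poll_interval": interval}}


if __name__ == "__main__":
    docs = [0, 1, 2, 3, 4]  # five documents indexed in the first seconds
    print(first_rollover_tick(docs, max_docs=2))  # default 10-minute poll
    print(first_rollover_tick(docs, max_docs=2, poll_interval_s=10))
    print(json.dumps(poll_interval_settings("10s")))
```

With the default interval, nothing can happen before the 600-second tick, which is why a rollover that is checked within seconds looks as if it never fires.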
# Creates a component template for mappings
PUT _component_template/my-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "date_optional_time||epoch_millis"
        },
        "message": {
          "type": "wildcard"
        }
      }
    }
  },
  "_meta": {
    "description": "Mappings for @timestamp and message fields",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Creates a component template for index settings
PUT _component_template/my-settings
{
  "template": {
    "settings": {
      "index.lifecycle.name": "my-lifecycle-policy"
    }
  },
  "_meta": {
    "description": "Settings for ILM",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
# Creates an index template that enables a data stream and uses both component templates
PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {
    "hidden": false
  },
  "composed_of": [ "my-mappings", "my-settings" ],
  "priority": 500,
  "_meta": {
    "description": "Template for my time series data",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
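The index template above contains no settings or mappings itself; it pulls them in from the two component templates listed in composed_of. Elasticsearch applies component templates in the listed order, later ones overriding earlier ones, and the index template's own template block takes precedence over all of them. The Python sketch below is a simplified model of that precedence using a recursive dictionary merge; Elasticsearch's real mapping merge has additional rules, and on a live cluster the simulate index API (POST _index_template/_simulate_index/<name>) shows the actual result.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` merged on top of `base`, recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_template(component_templates, index_template):
    """Simplified model of composable index template precedence."""
    resolved = {}
    for name in index_template.get("composed_of", []):
        resolved = deep_merge(resolved, component_templates[name]["template"])
    # the index template's own "template" block is applied last
    return deep_merge(resolved, index_template.get("template", {}))


components = {
    "my-mappings": {"template": {"mappings": {"properties": {
        "@timestamp": {"type": "date",
                       "format": "date_optional_time||epoch_millis"},
        "message": {"type": "wildcard"},
    }}}},
    "my-settings": {"template": {"settings": {
        "index.lifecycle.name": "my-lifecycle-policy"}}},
}
index_template = {
    "index_patterns": ["my-data-stream*"],
    "data_stream": {"hidden": False},
    "composed_of": ["my-mappings", "my-settings"],
    "priority": 500,
}

if __name__ == "__main__":
    print(resolve_template(components, index_template))
```

Because my-settings only contributes settings and my-mappings only mappings, their order does not matter here; it would if both defined the same key.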
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-07T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-05-08T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-05-09T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-05-10T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-06-11T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-06-12T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-06-13T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-06-14T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-06-15T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{ } }
{ "@timestamp": "2099-06-16T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
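Data streams are append-only, so the bulk API only supports create actions against them, and every document needs an @timestamp field. On the wire, a _bulk body is newline-delimited JSON: an action line, a source line, and a mandatory trailing newline. A minimal Python sketch that builds such a body from a list of documents; it only produces the string, and sending it (for example with Content-Type: application/x-ndjson) is left out.

```python
import json


def bulk_create_body(docs):
    """Build an NDJSON _bulk body with one `create` action per document."""
    lines = []
    for doc in docs:
        if "@timestamp" not in doc:
            # documents sent to a data stream must carry an @timestamp field
            raise ValueError("document is missing @timestamp")
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


docs = [
    {"@timestamp": "2099-06-11T16:25:42.000Z",
     "message": '192.0.2.255 - - [06/May/2099:16:25:42 +0000] '
                '"GET /favicon.ico HTTP/1.0" 200 3638'},
    {"@timestamp": "2099-06-12T16:25:42.000Z",
     "message": '192.0.2.255 - - [06/May/2099:16:25:42 +0000] '
                '"GET /favicon.ico HTTP/1.0" 200 3638'},
]

if __name__ == "__main__":
    print(bulk_create_body(docs), end="")
```

json.dumps takes care of escaping the embedded double quotes, which the hand-written console requests do with backslashes.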
- I would like a naming scheme where the indices roll over each quarter. The backing indices seem to be named with a sequence number; is it possible to name them by quarter instead?
Thanks in advance
Answer:
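- An alias is a reference to one or more indices and is a basic building block in Elasticsearch. A data stream builds on aliases and can be seen as a bundle of concepts (aliases, data tiering, etc.) packaged together to make time-series data easier to manage through automation. One practical difference: a data stream is append-only and requires an @timestamp field, which is why the bulk requests above use "create" actions.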
- ILM isn't designed for such small thresholds, so it's not surprising that this doesn't appear to work. By default, ILM only checks whether an action should run every 10 minutes (the indices.lifecycle.poll_interval cluster setting).
- Time-based rollover (max_age) is measured from the time the current backing index was created, not from calendar boundaries, so a "quarterly" rollover aligned to the calendar isn't possible with ILM.
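To illustrate the last point, here is a Python sketch under simplifying assumptions: rollover happens exactly when max_age is reached (in practice it happens at the next ILM poll), and backing indices follow the naming convention used since 7.11, .ds-<data-stream>-<yyyy.MM.dd>-<generation>, where the date is the backing index's creation date and the generation is a six-digit, zero-padded counter. The start date and the 90-day max_age are hypothetical.

```python
from datetime import date, timedelta


def backing_index_name(data_stream, created, generation):
    """Name of a data stream backing index (7.11+ convention)."""
    return f".ds-{data_stream}-{created:%Y.%m.%d}-{generation:06d}"


def quarter_start(d):
    """First day of the calendar quarter containing `d`."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def simulate_max_age(data_stream, first_created, max_age_days, count):
    """Backing index names produced by repeated age-based rollovers.

    Simplified: assumes rollover fires exactly at max_age, ignoring
    the ILM poll delay.
    """
    names = []
    created = first_created
    for generation in range(1, count + 1):
        names.append(backing_index_name(data_stream, created, generation))
        created += timedelta(days=max_age_days)
    return names


if __name__ == "__main__":
    for name in simulate_max_age("my-data-stream", date(2099, 2, 10), 90, 4):
        print(name)
```

The rollovers land on 11 May, 9 August and 7 November 2099, while the quarters start on 1 April, 1 July and 1 October: each boundary is anchored to the previous rollover, not to the calendar.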