I have a large amount of source code which changes frequently on disk. The source code is organized (and probably best managed) in chunks of "projects". I would like to maintain a current index of the source code so that it can be searched. Historical versions of the documents are not required.
To avoid infinitely growing indexes from the delete/add process, I would like to manage the index in chunks (partitions?). The ingestion process would drop the chunk corresponding to a project before re-indexing the project. A brief absence of the data during re-indexing is tolerable.
When I execute a query, I need to hit all of the chunks. Management of the indexes is my primary concern -- performance less so.
I can imagine that there could be two ways this might work:
- Partition an index. Drop a partition, then rebuild it.
- A meta-index. Each project would be created as an individual index, but some sort of "meta" construct would allow all of them to be queried in a single operation.
From what I have read, this does not seem to be a candidate for rollover indexes.
There are more than 1000 projects. Specifying a list of projects when the query is executed is not practical.
Is it possible to partition an index so that I can manage (drop and reindex) it in named chunks, while maintaining the ability to query it as a single unified index?
CodePudding user response:
Yes, you can achieve this using aliases.
Let's say you have the "old" version of the project data in index "project-1", and that index also has an alias "project". You then index the "new" version of the project data into a fresh index, "project-2". All queries go through the alias "project" instead of querying the index directly.

When you're done reindexing the new version of the data, you simply switch the alias from "project-1" to "project-2" in a single atomic operation, then delete "project-1". There is no interruption of service for your queries.
That's it!
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "project-2",
        "alias": "project"
      }
    },
    {
      "remove": {
        "index": "project-1",
        "alias": "project"
      }
    }
  ]
}
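Since you have more than 1000 projects and don't want to enumerate them at query time, the same mechanism scales up: attach a second, shared alias to every per-project index, and query that alias to hit all chunks in one operation. A sketch, assuming the per-project indexes share a naming prefix like "project-*" and the shared alias is called "all-projects" (both names are illustrative, as is the "content" field in the query):

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "project-*",
        "alias": "all-projects"
      }
    }
  ]
}

GET all-projects/_search
{
  "query": {
    "match": {
      "content": "TODO"
    }
  }
}

Note that the wildcard is resolved when the action runs, not continuously, so each per-project swap should also add the newly created index to "all-projects" (an extra "add" action in the same atomic _aliases call). Deleting the old index removes it from the shared alias automatically.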