Moving older versions of documents to a history index when saving new ones (Elasticsearch)

Time:07-26

I want to know whether Elasticsearch has a built-in solution for something I need. Every time a document is replaced by a newer version of itself (with the same ID), I want the older version to be moved to a history index rather than deleted. In that history index, older versions should accumulate rather than replace one another. Is there a built-in solution for this, or will I need to implement it myself in my API? Thank you.

CodePudding user response:

There is no built-in method for your use case, so you need to implement it in your application. I don't think Elasticsearch is well suited to storing the history of a document: as soon as you update the document in the history_index, you lose its previous version, and if I understand correctly you want to keep the complete history of a document.

I think the best option is to use an RDBMS or a NoSQL store in which you create a new history entry for each version of a document (the Elasticsearch document_id and its version number will let you reconstruct the complete history of the document).

You can update that database as soon as the Elasticsearch document is updated.
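As a rough illustration of the append-only history entry described above, here is a minimal sketch using SQLite from the Python standard library. The table name document_history, its columns, and the helper functions are assumptions made for this example; any RDBMS with a composite primary key on (document_id, version) would work in the same way.

```python
import json
import sqlite3


def open_history_store(path=":memory:"):
    """Open (or create) the hypothetical history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS document_history (
               document_id TEXT NOT NULL,
               version     INTEGER NOT NULL,
               body        TEXT NOT NULL,
               PRIMARY KEY (document_id, version)
           )"""
    )
    return conn


def record_version(conn, document_id, version, body):
    """Append one version; an existing (document_id, version) is never overwritten."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO document_history (document_id, version, body) "
            "VALUES (?, ?, ?)",
            (document_id, version, json.dumps(body)),
        )


def load_history(conn, document_id):
    """Return every stored version of a document, oldest first."""
    rows = conn.execute(
        "SELECT version, body FROM document_history "
        "WHERE document_id = ? ORDER BY version",
        (document_id,),
    ).fetchall()
    return [(version, json.loads(body)) for version, body in rows]
```

Because the insert is a plain INSERT rather than an upsert, trying to store the same version twice raises sqlite3.IntegrityError instead of silently replacing the earlier entry.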

CodePudding user response:

There does not appear to be any built-in functionality for this. The easiest approach might be to copy the old version to the history index with the _reindex API, then write the new version:

POST /_reindex
{
  "source": {
    "index": "your_index",
    "query": {
      "ids": {
        "values": ["<id>"]
      }
    }
  },
  "dest": {
    "index": "your_history_index"
  },
  "script": {
    "source": "ctx.remove(\"_id\")"
  }
}

PUT /your_index/_doc/<id>
{
   ...
}

Note the script ctx.remove("_id") in the _reindex request, which ensures that Elasticsearch generates a new ID for the copied document instead of reusing the existing one. This way, your_history_index will hold one copy for each version of the document. Without this script, _reindex would preserve the ID and overwrite the older copy.
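If the two requests are issued from application code, they could be wrapped roughly as follows. This is only a sketch: it uses the Python standard library rather than an Elasticsearch client, and the function names, the base_url parameter, and the injectable send callable are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_reindex_body(source_index, history_index, doc_id):
    """Build the _reindex body shown above for a single document ID."""
    return {
        "source": {
            "index": source_index,
            "query": {"ids": {"values": [doc_id]}},
        },
        "dest": {"index": history_index},
        # Drop the ID so the history index generates a fresh one per copy.
        "script": {"source": 'ctx.remove("_id")'},
    }


def send_json(method, url, body):
    """Send a JSON request and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def archive_then_replace(base_url, index, history_index, doc_id, new_doc,
                         send=send_json):
    """Copy the current version to the history index, then write the new one."""
    archived = send(
        "POST",
        f"{base_url}/_reindex",
        build_reindex_body(index, history_index, doc_id),
    )
    saved = send(
        "PUT",
        f"{base_url}/{index}/_doc/{urllib.parse.quote(doc_id, safe='')}",
        new_doc,
    )
    return archived, saved
```

The two requests are not atomic, so a failure between them leaves a history copy without a matching update. Also, _reindex reads the source through a search, so a version written very recently may not be visible to it until the source index has been refreshed.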

I assume that the documents contain an identifier that can be used to search your_history_index for all versions of a document, even though _id is reset.
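To make that assumption concrete, the search body for retrieving all stored versions could be built along these lines. The field name document_key is hypothetical, and the sketch assumes it is mapped as a keyword so that a term query matches it exactly.

```python
def build_history_query(field, value, size=100):
    """Build a search body matching every history copy that shares an identifier."""
    return {
        "size": size,  # maximum number of versions to return
        "query": {"term": {field: value}},
    }


# Hypothetical usage: send this body to POST /your_history_index/_search
body = build_history_query("document_key", "abc-123")
```

The results come back in no particular version order unless the documents also carry a field to sort on, such as a timestamp or a version number.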
