What is the best way to update cache in elasticsearch

Time:12-02

I'm using an Elasticsearch index as a cache table. My index mapping is the following:

{
    "mappings": {
        "dynamic": false,
        "properties": {
            "query_str": {"type": "text"},
            "search_results": {
                "type": "object",
                "enabled": false
            },
            "query_embedding": {
                "type": "dense_vector",
                "dims": 768
            }
        }
    }
}

The cache lookup uses embedding vector similarity: if the embedding of a new query is close enough to a cached one, it counts as a cache hit and the search_results field is returned to the user.
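
To make the hit/miss rule concrete, here is a minimal client-side sketch of it in plain Python. The function names, the 0.95 threshold, and the choice of cosine similarity as the "close enough" measure are assumptions for illustration; the question does not say which metric or cutoff is actually used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(query_embedding, cached_docs, threshold=0.95):
    """Return search_results of the closest cached document, or None on a miss.

    `threshold` is a hypothetical cutoff; tune it for your embedding model.
    """
    best_doc, best_score = None, -1.0
    for doc in cached_docs:
        score = cosine_similarity(query_embedding, doc["query_embedding"])
        if score > best_score:
            best_doc, best_score = doc, score
    if best_doc is not None and best_score >= threshold:
        return best_doc["search_results"]  # cache hit
    return None  # cache miss: run the real search instead
```

In the real setup the similarity is computed inside Elasticsearch over 768-dimensional vectors; the sketch only shows the decision rule.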

The problem is that I need to refresh the cached results about once an hour. I don't want my service to lose the ability to use the cache efficiently during the update, and I'm not sure which of these solutions is best:

  1. Sequentially update documents one by one, so the index is never destroyed. The drawback I'm afraid of is that every update triggers index rebuilding, so cache requests will become slow.
  2. Create an entirely new index with the new results and then somehow swap the current cache index with the new one. The drawbacks I see are: a) I've found no elegant way to swap indexes; b) users will get their updated cached results later than in solution (1).

CodePudding user response:

I would go with #2, because every time you update a document, the cache is flushed.

There is an elegant way to swap indices:

Keep an alias pointing at your current index, fill a new index with the fresh records, and then repoint the alias to the new index.

Something like this:

  1. Current index name is items-2022-11-26-001
  2. Create alias items pointing to items-2022-11-26-001
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    }
  ]
}
  3. Create a new index with fresh data, items-2022-11-26-002
  4. When indexing finishes, point the items alias to items-2022-11-26-002. Both actions in the request below are applied atomically, so the alias always resolves to an index.
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    },
    {
      "add": {
        "index": "items-2022-11-26-002",
        "alias": "items"
      }
    }
  ]
}
  5. Delete items-2022-11-26-001
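
If the hourly refresh is scripted, the swap request can be generated instead of written by hand. The sketch below only builds the JSON body for POST _aliases; the helper names and the "bump the trailing counter" naming scheme are assumptions for illustration, and sending the request (with an HTTP client or the official Elasticsearch client) is left out on purpose.

```python
import json


def next_index_name(current):
    """Hypothetical naming helper: bump the trailing zero-padded counter."""
    prefix, _, counter = current.rpartition("-")
    return f"{prefix}-{int(counter) + 1:0{len(counter)}d}"


def build_alias_swap(alias, old_index, new_index):
    """Build the POST _aliases body that moves `alias` from old_index to new_index."""
    if old_index == new_index:
        raise ValueError("old and new index must differ")
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    old = "items-2022-11-26-001"
    new = next_index_name(old)
    # Print the body to send once the new index is fully populated.
    print(json.dumps(build_alias_swap("items", old, new), indent=2))
```

Only send the swap after the new index has been filled, and delete the old index after the swap succeeds, so queries against the alias always find data.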

Run all your queries against the items alias, which behaves like an index.
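
As an illustration of querying through the alias, the following sketch builds a cache-lookup request for the mapping in the question. It assumes cosine similarity via a script_score query and a cutoff of 0.95; the function name, the cutoff, and the choice of metric are not from the question, and the script syntax shown is the form documented for recent Elasticsearch versions, so check it against the version you run.

```python
# The alias name from the answer; the application never references a dated index.
SEARCH_PATH = "/items/_search"


def build_cache_lookup(query_vector, min_similarity=0.95):
    """Build a search body that returns the closest cached entry, if close enough.

    The script adds 1.0 to the cosine similarity because script scores must
    not be negative, so the min_score cutoff is shifted by 1.0 as well.
    """
    return {
        "size": 1,  # only the best match matters for a cache hit
        "min_score": min_similarity + 1.0,
        "_source": ["search_results"],
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'query_embedding') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }
```

An empty hits list in the response would then mean a cache miss. Because the request targets the alias, nothing here changes when the alias is repointed to a newer index.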

References:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
