I'm using elasticsearch index as a cache table. My document structure is the following:
{
"mappings": {
"dynamic": False,
"properties": {
"query_str": {"type": "text"},
"search_results": {
"type": "object",
"enabled": false
},
"query_embedding": {
"type": "dense_vector",
"dims": 768,
},
}
}
The cache search is performed via embedding vector similarity. So if the embedding of the new query is close enough to a cached one, it is considered as a cache hit, and search_results
field is returned to the user.
The problem is that I need to update cached results about once an hour. I wish my service won't lose the ability to use cache efficiently while updating procedure, so I'm not sure which one of solutions is the best:
- Sequentially update documents one-by-one, so the index won't be destroyed. The drawback of this solution I afraid is the fact, that every update causes index rebuilding, so the cache requests will become slow
- Create entirely new index with new results and then somehow swap current cache index with the new one. The drawbacks I see are a) I've found no elegant way to swap indexes b) Users will get their cached resuts lately than in solution (1)
CodePudding user response:
I would go with #2 as everytime you update a document the cache is flushed.
There is an elegant way to swap indices:
You have an alias that points to your current index, you fill a new index with the fresh records, and then you point this alias to the new index.
Something like this:
- Current index name is items-2022-11-26-001
- Create alias items pointing to items-2022-11-26-001
POST _aliases
{
"actions": [
{
"add": {
"index": "items-2022-11-26-001",
"alias": "items"
}
}
]
}
- Create new index with fresh data items-2022-11-26-002
- When it finishes, now point the items alias to items-2022-11-26-002
POST _aliases
{
"actions": [
{
"remove": {
"index": "items-2022-11-26-001",
"alias": "items"
}
},
{
"add": {
"index": "items-2022-11-26-002",
"alias": "items"
}
}
]
}
- Delete items-2022-11-26-001
You run all your queries against "items" alias that will act as an index.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html