How does the num_candidates parameter in ElasticSearch's K-nearest neighbor Search API work?

Time:09-24

I'm looking for some insight into ElasticSearch's k-nearest neighbor (kNN) search API, specifically the num_candidates parameter.

The API accepts a request like this (https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html):

{
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 10000
  }
}

Does a num_candidates value of 10,000 mean that the search only looks through 10,000 records? If my index has 500k records, would it not consider all of them?

Answer:

The kNN search documentation describes how it works:

To gather results, the kNN search API finds a num_candidates number of approximate nearest neighbor candidates on each shard. The search computes the similarity of these candidate vectors to the query vector, selecting the k most similar results from each shard. The search then merges the results from each shard to return the global top k nearest neighbors.

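In other words, each shard collects num_candidates approximate nearest neighbors, scores them against the query vector, and keeps its k most similar; the per-shard results are then merged and the global top k are returned. So num_candidates does not restrict the search to 10,000 of your 500k records. Every vector on a shard is eligible: the search walks the shard's vector index (an HNSW graph), and num_candidates sets how many of the closest vectors it keeps track of along the way before choosing its top k. Raising it tends to improve the accuracy of the final k results, at the cost of a slower search.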
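To make the per-shard-then-merge flow concrete, here is a small Python sketch. It is a simplification under stated assumptions: each shard is a plain list of `(doc_id, vector)` pairs, similarity is a plain dot product, and the HNSW graph walk that Elasticsearch actually performs is replaced by an exact brute-force scan, so the candidate step here is idealised rather than approximate. The functions `dot`, `shard_search` and `knn_search` are hypothetical and are not part of Elasticsearch or any of its clients.

```python
import heapq
import random


def dot(a, b):
    # Assumed similarity function: plain dot product of two vectors.
    return sum(x * y for x, y in zip(a, b))


def shard_search(shard, query, k, num_candidates):
    """Hypothetical stand-in for one shard's kNN step.

    Elasticsearch walks an HNSW graph to find num_candidates *approximate*
    neighbours; this sketch takes the exact nearest ones by brute force.
    """
    if num_candidates < k:
        raise ValueError("this sketch requires num_candidates >= k")
    # Step 1: every vector on the shard is eligible; keep the num_candidates
    # most similar ones as candidates.
    candidates = heapq.nlargest(
        num_candidates, shard, key=lambda doc: dot(doc[1], query)
    )
    # Step 2: score the candidates and keep this shard's k most similar.
    scored = [(dot(vec, query), doc_id) for doc_id, vec in candidates]
    return heapq.nlargest(k, scored)


def knn_search(shards, query, k, num_candidates):
    # Step 3: merge the per-shard top-k lists and keep the global top k.
    merged = []
    for shard in shards:
        merged.extend(shard_search(shard, query, k, num_candidates))
    return heapq.nlargest(k, merged)


# Usage: three hypothetical shards of 500 random 3-dimensional vectors each.
random.seed(0)
shards = [
    [(shard_no * 1000 + i, [random.uniform(-10, 10) for _ in range(3)])
     for i in range(500)]
    for shard_no in range(3)
]
query = [-5, 9, -12]
hits = knn_search(shards, query, k=10, num_candidates=50)

# Compare with an exhaustive search over all 1,500 documents.
exact = heapq.nlargest(
    10, [(dot(vec, query), doc_id) for shard in shards for doc_id, vec in shard]
)
print(hits == exact)  # True
```

Because each shard draws its candidates from all of its vectors, the merged result matches an exhaustive search over all 1,500 documents even though num_candidates is only 50. In the real engine the candidate step is approximate, so a larger num_candidates makes this agreement more likely rather than guaranteed.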
