I'm looking for some insight into Elasticsearch's k-nearest neighbor (kNN) search API, specifically the num_candidates parameter.
The API accepts a request like this (https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html):
    "knn": {
        "field": "image-vector",
        "query_vector": [-5, 9, -12],
        "k": 10,
        "num_candidates": 10000
    },
Does a num_candidates value of 10,000 mean that it will only search through 10,000 records? So if I have an index of 500k records, would it not consider them all?
CodePudding user response:
The way it works is described in the kNN search API documentation:
To gather results, the kNN search API finds a num_candidates number of approximate nearest neighbor candidates on each shard. The search computes the similarity of these candidate vectors to the query vector, selecting the k most similar results from each shard. The search then merges the results from each shard to return the global top k nearest neighbors.
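Basically, the top k candidates are selected on each shard, the per-shard results are merged, and the top k are picked again from the merged set. So num_candidates is the number of approximate nearest neighbor candidates gathered on each shard, not a limit on which records in the index can be returned.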
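The gather-and-merge flow described above can be sketched in plain Python. This is only an illustration, not how Elasticsearch is implemented: the function names (l2_distance, shard_top_k, knn_search) are made up for this example, each shard is assumed to be a list of (doc_id, vector) pairs, Euclidean distance is assumed as the similarity, and an exact brute-force scan stands in for the approximate candidate search.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_top_k(shard, query, k, num_candidates):
    """Return the k closest docs from one shard.

    `shard` is a list of (doc_id, vector) pairs. The real search gathers
    num_candidates *approximate* neighbors; this sketch substitutes an
    exact scan, so it only mirrors the shape of the procedure.
    """
    # Step 1: gather num_candidates candidates on this shard.
    candidates = heapq.nsmallest(
        num_candidates, shard, key=lambda doc: l2_distance(doc[1], query)
    )
    # Step 2: keep the k most similar of those candidates.
    return candidates[:k]


def knn_search(shards, query, k, num_candidates):
    """Merge the per-shard top k lists into a global top k."""
    merged = []
    for shard in shards:
        merged.extend(shard_top_k(shard, query, k, num_candidates))
    # Step 3: pick the global top k from the merged per-shard results.
    return heapq.nsmallest(k, merged, key=lambda doc: l2_distance(doc[1], query))
```

With three shards, k = 10 and num_candidates = 10000, each shard would gather up to 10,000 candidates and contribute at most 10 hits, and the final response would contain the best 10 of those 30.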