Fuzzy Matching in Elasticsearch gives different results in two different versions


I have a mapping in Elasticsearch with a field analyzer that uses the following tokenizer:

"tokenizer": {
    "3gram_tokenizer": {
      "type": "nGram",
      "min_gram": "3",
      "max_gram": "3",
      "token_chars": [
        "letter",
        "digit"
      ]
    }
  }
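
For context, a minimal sketch of how such a tokenizer is typically wired into an index (the full mapping is not shown in the question, so the index name test_index, the analyzer name 3gram_analyzer, and the lowercase filter are assumptions; on 6.8 the mapping would additionally need a document type):

# Hypothetical index definition; index/analyzer names are assumed
PUT /test_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "3gram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3",
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "3gram_analyzer": {
          "type": "custom",
          "tokenizer": "3gram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "3gram_analyzer"
      }
    }
  }
}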

Now I am trying to search for the name "avinash" in Elasticsearch with the query "acinash".

The ES query formed is:

{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "acinash",
            "fields": [
              "name"
            ],
            "type": "best_fields",
            "operator": "AND",
            "slop": 0,
            "fuzziness": "1",
            "prefix_length": 0,
            "max_expansions": 50,
            "zero_terms_query": "NONE",
            "auto_generate_synonyms_phrase_query": false,
            "fuzzy_transpositions": false,
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}

But in ES version 6.8 I get the desired result, "avinash", when querying "acinash" (because of fuzziness), whereas in ES version 7.1 I get no result.

The same happens when searching for "avinash" with the query "avinaah": in 6.8 I get results, but in 7.1 I do not.

ES converts the query into the tokens [aci, cin, ina, nas, ash], which should ideally match against the tokenized inverted index entries [avi, vin, ina, nas, ash].
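
Both versions expose the _analyze API, which is an easy way to confirm which tokens each side actually produces; a sketch, assuming the index and analyzer names from above:

# Inspect the tokens produced for the query string (names assumed)
POST /test_index/_analyze
{
  "analyzer": "3gram_analyzer",
  "text": "acinash"
}

Running the same call with "avinash" shows the indexed tokens, so the overlap (ina, nas, ash) can be checked directly.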

But why is it not matching in 7.1?

CodePudding user response:

It's not related to the ES version.

Update max_expansions to more than 50.

max_expansions: the maximum number of term variations created.

With 3-grams and letter & digit token_chars, the ideal max_expansions would be (26 letters + 10 digits) * 3 = 108.
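
For example, keeping the original query and only raising the cap (108 is just the (26 + 10) * 3 figure above; the index name test_index is assumed, and the other parameters from the original query can stay as they were):

# Same multi_match, with only max_expansions raised (index name assumed)
GET /test_index/_search
{
  "size": 5,
  "query": {
    "multi_match": {
      "query": "acinash",
      "fields": ["name"],
      "operator": "AND",
      "fuzziness": "1",
      "max_expansions": 108
    }
  }
}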
