elasticsearch edge_ngram token filter does not match partial query

Time:06-24

I have defined the following token filter in my Elasticsearch config in Java:

  "tokenFilterNgram": {
    "type": "edge_ngram",
    "min_gram": 1,
    "max_gram": 20
  }

The analyzer is defined as follows:

"analyzer": {
  "productSearchAnalyzer": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": ["lowercase", "tokenFilterNgram"]
  }
}

I have a search string "serv" and it should return "Print Servers". However, it does not return any hits.

This is my analyze request:

GET productsearch_new/_analyze
{
  "analyzer": "productSearchAnalyzer", 
  "text": [
            "Print Servers"
          ]
}

Results:

{
  "tokens" : [
    {
      "token" : "p",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "pr",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "pri",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "prin",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "print",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "s",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "se",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "ser",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "serv",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "serve",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "server",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "servers",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

So the analyzer does create a "serv" token. Finally, this is my query:

GET productsearch_new/_search
{
  "query": {
    "match": {
      "categoryName": {
        "query": "Serv",
        "operator": "OR",
        "analyzer": "productSearchAnalyzer",
        "prefix_length": 0,
        "max_expansions": 50,
        "fuzzy_transpositions": false,
        "lenient": false,
        "zero_terms_query": "NONE",
        "auto_generate_synonyms_phrase_query": false,
        "boost": 1.0
      }
    }
  },
  "_source": {
    "includes": ["categoryName"],
    "excludes": []
  }
}

Please let me know why "Print Servers" is not returned by the above query for the search keyword "serv".

CodePudding user response:

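You are applying the edge n-gram analyzer only at query time, while the field itself is indexed with the default standard analyzer (assuming there is no explicit mapping, since you have not shown one). The index therefore holds only the terms "print" and "servers", and the query terms "s", "se", "ser" and "serv" match neither of them. That is why you get no hits.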
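
To make the mismatch concrete, here is a rough plain-Java simulation of the two analysis chains. It does not call Elasticsearch at all. The class and method names are made up for illustration, and the regex split is only an approximation of the standard tokenizer, so treat it as a sketch of the idea rather than what Lucene really does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerMismatchDemo {

    // Rough stand-in for "standard" tokenizer + "lowercase" filter (an approximation):
    // split on non-alphanumeric characters and lowercase each piece.
    public static List<String> standardTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Same chain followed by an edge n-gram filter: prefixes of each token.
    public static Set<String> edgeNgramTokens(String text, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (String token : standardTokens(text)) {
            int limit = Math.min(maxGram, token.length());
            for (int len = minGram; len <= limit; len++) {
                grams.add(token.substring(0, len));
            }
        }
        return grams;
    }

    // A match query with operator OR hits when at least one query term is an indexed term.
    public static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> queryTerms = edgeNgramTokens("Serv", 1, 20);
        Set<String> indexedStandard = new LinkedHashSet<>(standardTokens("Print Servers"));
        Set<String> indexedNgram = edgeNgramTokens("Print Servers", 1, 20);

        System.out.println(queryTerms);                           // [s, se, ser, serv]
        System.out.println(matches(indexedStandard, queryTerms)); // false
        System.out.println(matches(indexedNgram, queryTerms));    // true
    }
}
```

With the field indexed by the standard analyzer no query term lines up with an indexed term. Once the same text is indexed through the edge n-gram chain, "serv" is among the indexed terms and the query hits.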

If you only need a prefix match, you can use a prefix query, which is not analyzed, as shown below:

{
  "query": {
    "prefix": {
      "categoryName": {
        "value": "serv"
      }
    }
  }
}
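
One thing to watch with this approach: as far as I know, prefix is a term-level query, so the value is compared to the indexed terms as-is. The small Java sketch below only simulates that comparison (the class name and the term list are assumptions for illustration, not Elasticsearch code), and shows why the value should be lowercase when the indexed terms were lowercased.

```java
import java.util.Arrays;
import java.util.List;

class PrefixQueryDemo {

    // Simulates a prefix query: the raw value is compared to each indexed term,
    // with no lowercasing or tokenizing of the value.
    public static boolean prefixMatches(List<String> indexedTerms, String value) {
        for (String term : indexedTerms) {
            if (term.startsWith(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms assumed to be indexed for "Print Servers" by the standard analyzer.
        List<String> indexedTerms = Arrays.asList("print", "servers");

        System.out.println(prefixMatches(indexedTerms, "serv")); // true
        System.out.println(prefixMatches(indexedTerms, "Serv")); // false
    }
}
```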

Otherwise, you need to define the categoryName field with an index-time edge n-gram analyzer, as shown below:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "productSearchAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "tokenFilterNgram"
          ]
        }
      },
      "filter": {
        "tokenFilterNgram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "categoryName":{
        "type": "text",
        "analyzer": "productSearchAnalyzer"
      }
    }
  }
}

Once the index has been created with the mapping above, you can use the following query:

{
  "query": {
    "match": {
      "categoryName": "serv"
    }
  }
}
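
A caveat with this mapping: the same analyzer also runs on the query text, so "serv" is itself expanded to "s", "se", "ser" and "serv", and with an OR match any category that has a word starting with "s" will hit. A common way around that is to set a separate search_analyzer (for example "standard") on the field so the query text is not n-grammed. The Java sketch below only simulates the term overlap. The "Scanners" category and all names in it are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class SearchAnalyzerDemo {

    // Edge n-grams of one lowercased token, anchored at its first character.
    public static Set<String> edgeNgrams(String token, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        String lower = token.toLowerCase(Locale.ROOT);
        int limit = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    // OR semantics: a hit needs at least one shared term.
    public static boolean anyTermMatches(Set<String> indexedTerms, Set<String> queryTerms) {
        return !Collections.disjoint(indexedTerms, queryTerms);
    }

    public static void main(String[] args) {
        Set<String> scanners = edgeNgrams("Scanners", 1, 20); // hypothetical other category
        Set<String> servers = edgeNgrams("Servers", 1, 20);

        Set<String> ngramQuery = edgeNgrams("serv", 1, 20);       // query analyzed with edge n-grams
        Set<String> plainQuery = Collections.singleton("serv");   // query analyzed without n-grams

        System.out.println(anyTermMatches(scanners, ngramQuery)); // true, both contain "s"
        System.out.println(anyTermMatches(scanners, plainQuery)); // false
        System.out.println(anyTermMatches(servers, plainQuery));  // true
    }
}
```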

CodePudding user response:

In my Elasticsearch configuration, I had the following bean:

@Bean
public ElasticsearchOperations elasticsearchTemplate() {
    return new ElasticsearchRestTemplate(client());
}

This was causing the issue. Once it was removed, the index was created with the correct custom analyzer.
