Elastic matchQuery for name

Time:05-26

I have an Elasticsearch field that contains a user name; for example, for me it would contain vojtech knyttl.

I am trying to create a match query to be able to find my name by phrases like:

  • vojtech k
  • vojtech kny
  • knyttl

My query:

      {
        "bool" : {
          "should" : [
            {
              "match" : {
                "keywords" : {
                  "query" : "vojtech kn",
                  "operator" : "AND",
                  "prefix_length" : 0,
                  "max_expansions" : 50,
                  "minimum_should_match" : "50%"
                }
              }
            }
          ]
        }
      }

The problem is that vojtech and vojtech kn do not find anything because of the AND operator. If I switch to OR, searching for vojtech knyttl matches every vojtech in the database, and my own name is not even the top result.

How should the query be formed for such search?

CodePudding user response:

I think an edge_ngram token filter should work in this case. Please try the following:

Create the index with an analyzer that applies the edge_ngram filter at index time, and keep the standard analyzer at search time.

PUT test 
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "custom_analyzer",
        "search_analyzer": "standard"
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "custom_edge_ngram":
        {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "custom_edge_ngram"
          ]
        }
      }
    }
  }
}

Then query the index using the following query:

GET test/_search
{
  "query": {
    "match": {
      "name": 
      {
        "query":"vojtech k",
        "operator": "and"
      }
    }
  }
}

EXPLANATION:

At index time, the edge_ngram filter generates n-grams of length 1 to 10 for each word in your name field.

You can inspect the generated tokens with:

GET test/_analyze
{
  "analyzer": "custom_analyzer",
  "text": ["vojtech knyttl"]
}
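To make the token list concrete, here is a rough Python sketch of what the analyzer above should produce. It is only an approximation: it splits on whitespace instead of using the real standard tokenizer, and the helper names edge_ngrams and analyze are made up for this illustration, not part of any Elasticsearch client.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=1, max_gram=10):
    """Approximate custom_analyzer: whitespace split, lowercase, edge n-grams.

    Assumption: whitespace splitting stands in for the standard tokenizer,
    which is good enough for simple names like the ones in this thread.
    """
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word, min_gram, max_gram))
    return tokens


print(analyze("vojtech knyttl"))
# ['v', 'vo', 'voj', 'vojt', 'vojte', 'vojtec', 'vojtech',
#  'k', 'kn', 'kny', 'knyt', 'knytt', 'knyttl']
```

Every prefix of each word becomes its own indexed term, which is why a partial word such as k or kny can later be matched as a plain term.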

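When you search with the match query, the search text is split by the standard search analyzer into vojtech and k, and with operator and both terms must be found among the n-grams indexed for your doc.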

My test documents: vojtech adams, vojtech knyttl, vojtech, joe knyttl.

If I search vojtech knyttl, it returns 1 result.

If I search vojtech, I get vojtech adams, vojtech knyttl, vojtech.

If I search vojtech k, I get vojtech knyttl.

If I search knyttl, I get vojtech knyttl, joe knyttl.
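The results above can be reproduced with a small Python simulation of the matching logic. This is a sketch under simplifying assumptions, not how Elasticsearch is implemented: it ignores scoring, splits on whitespace instead of using the standard tokenizer, and the names index_terms and search are hypothetical helpers.

```python
def index_terms(text, min_gram=1, max_gram=10):
    """Index-time terms: every edge n-gram (prefix) of every lowercased word."""
    terms = set()
    for word in text.lower().split():
        for n in range(min_gram, min(len(word), max_gram) + 1):
            terms.add(word[:n])
    return terms


def search(query, docs):
    """Match with operator "and": every whole query word must be an indexed term.

    The query is NOT n-grammed, mirroring "search_analyzer": "standard".
    """
    query_terms = query.lower().split()
    return [d for d in docs if all(t in index_terms(d) for t in query_terms)]


DOCS = ["vojtech adams", "vojtech knyttl", "vojtech", "joe knyttl"]

for q in ["vojtech knyttl", "vojtech", "vojtech k", "knyttl"]:
    print(q, "->", search(q, DOCS))
# vojtech knyttl -> ['vojtech knyttl']
# vojtech -> ['vojtech adams', 'vojtech knyttl', 'vojtech']
# vojtech k -> ['vojtech knyttl']
# knyttl -> ['vojtech knyttl', 'joe knyttl']
```

Keeping the search analyzer as standard is the important design choice here: if the query were n-grammed too, a search for knyttl would also contain the term k and the AND condition would behave very differently.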

CodePudding user response:

You should use the match_phrase_prefix query. I tested it on the samples below, and it seems to work fine for your use case.

Sample documents

{
    "name" :  "vojtech knyttl"
}

{
    "name" :  "vojtech"
}

{
    "name" :  "vojtech kn"
}

Search query using match phrase prefix

{
  "query": {
    "match_phrase_prefix": {
      "name": {
        "query": "vojtech k"
      }
    }
  }
}
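Roughly, match_phrase_prefix requires all words except the last to appear as a phrase in order, and treats the last word as a prefix of the next word. The Python sketch below illustrates that idea on the sample documents. It is a simplification that ignores analysis details, slop, and max_expansions, and the function name is only borrowed for the illustration.

```python
def match_phrase_prefix(query, doc):
    """Simplified model: leading words match exactly and in order,
    the final query word only needs to be a prefix of the following word."""
    q = query.lower().split()
    d = doc.lower().split()
    if not q:
        return False
    *phrase, last = q
    # Slide a window of the query's length over the document's words.
    for start in range(len(d) - len(q) + 1):
        window = d[start:start + len(q)]
        if window[:-1] == phrase and window[-1].startswith(last):
            return True
    return False


DOCS = ["vojtech knyttl", "vojtech", "vojtech kn"]

print([d for d in DOCS if match_phrase_prefix("vojtech k", d)])
# ['vojtech knyttl', 'vojtech kn']
print([d for d in DOCS if match_phrase_prefix("knyttl", d)])
# ['vojtech knyttl']
```

Note that, unlike the edge n-gram approach, word order matters in this model: a query such as knyttl vojtech would not match vojtech knyttl.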