Elasticsearch - Apply an appropriate analyser to get accurate results

Time:09-17

I am new to Elasticsearch. I would like to apply an analyser that satisfies the search behaviour below. Let's take an example. Suppose I have indexed documents with the following text:

  1. I am walking now
  2. I walked to Ahmedabad
  3. Everyday I walk in the morning
  4. Anil walks in the evening.
  5. I am hiring candidates
  6. I hired candidates
  7. Everyday I hire candidates
  8. He hires candidates

Now when I search with

  1. text "walking" result should be [walking, walked, walk, walks]
  2. text "walked" result should be [walking, walked, walk, walks]
  3. text "walk" result should be [walking, walked, walk, walks]
  4. text "walks" result should be [walking, walked, walk, walks]

The same should also hold for hire.

  1. text "hiring" result should be [hiring, hired, hire, hires]
  2. text "hired" result should be [hiring, hired, hire, hires]
  3. text "hire" result should be [hiring, hired, hire, hires]
  4. text "hires" result should be [hiring, hired, hire, hires]

Thank You,

CodePudding user response:

What you are looking for is a language analyzer; see the Elasticsearch documentation on language analyzers.

An analyzer consists of a tokenizer and a chain of token filters, as the example below shows.

PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_" 
        },
        "english_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"] 
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer":  "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
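
To check what this analyzer produces, the _analyze API can be run against the index. This is a minimal sketch, assuming the english_example index above was created successfully; the sample text uses the words from the question.

GET /english_example/_analyze
{
  "analyzer": "rebuilt_english",
  "text": "hiring hired hire hires"
}

If the stemmer behaves as expected, all four words should come back as the same token, which is what lets them match each other at search time.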

You can now use the analyzer in your index mapping. The field must be of type text, because keyword fields are not analyzed:

{
  "mappings": {
    "properties": {
      "myField": {
        "type": "text",
        "analyzer": "rebuilt_english"
      }
    }
  }
}

Remember to query the field with a match query, which runs the search text through the field's analyzer before matching.
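
For illustration, a match query against that field might look like the sketch below. It assumes the english_example index has a myField field mapped with the rebuilt_english analyzer; the search text is one of the words from the question.

GET /english_example/_search
{
  "query": {
    "match": {
      "myField": "walking"
    }
  }
}

A term query, by contrast, skips analysis and looks up the literal token "walking", so it would likely miss documents whose text was stemmed at index time.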

CodePudding user response:

You need to use the stemmer token filter.

Stemming is the process of reducing a word to its root form. This ensures variants of a word match during a search.

For example, walking and walked can be stemmed to the same root word: walk. Once stemmed, an occurrence of either word would match the other in a search.

Mapping

PUT index36
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }, 
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "stemmer" ]
        }
      }
    }
  }
}

Analyze

GET index36/_analyze
{
  "text": ["walking", "walked", "walk", "walks"],
  "analyzer": "my_analyzer"
}

Result

{
  "tokens" : [
    {
      "token" : "walk",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "walk",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 101
    },
    {
      "token" : "walk",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 202
    },
    {
      "token" : "walk",
      "start_offset" : 20,
      "end_offset" : 25,
      "type" : "word",
      "position" : 303
    }
  ]
}

All four words produce the same token, "walk", so any one of them will match the others in a search.
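
To see this end to end, the sentences from the question can be indexed and then searched. This is only a sketch, assuming the index36 index above exists; the documents are the question's examples, and ?refresh is used just so they are searchable immediately.

POST index36/_bulk?refresh
{ "index": {} }
{ "title": "I am walking now" }
{ "index": {} }
{ "title": "I walked to Ahmedabad" }
{ "index": {} }
{ "title": "Everyday I walk in the morning" }
{ "index": {} }
{ "title": "Anil walks in the evening." }

GET index36/_search
{
  "query": {
    "match": {
      "title": "walked"
    }
  }
}

The search should return all four documents, and the same approach should work for hiring, hired, hire and hires.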
