I am new to Elasticsearch. I would like to apply an analyzer that satisfies the searches below. Let's take an example. Suppose I have entered the text below in a document:
- I am walking now
- I walked to Ahmedabad
- Everyday I walk in the morning
- Anil walks in the evening.
- I am hiring candidates
- I hired candidates
- Everyday I hire candidates
- He hires candidates
Now, when I search with:
- text "walking" result should be [walking, walked, walk, walks]
- text "walked" result should be [walking, walked, walk, walks]
- text "walk" result should be [walking, walked, walk, walks]
- text "walks" result should be [walking, walked, walk, walks]
The same should also hold for hire:
- text "hiring" result should be [hiring, hired, hire, hires]
- text "hired" result should be [hiring, hired, hire, hires]
- text "hire" result should be [hiring, hired, hire, hires]
- text "hires" result should be [hiring, hired, hire, hires]
Thank You,
CodePudding user response:
What you are looking for is a language analyzer; see the Elasticsearch documentation on language analyzers.
An analyzer always consists of exactly one tokenizer plus zero or more character filters and token filters, as the example below shows.
PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": ["example"]
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
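One way to check the analyzer before wiring it into a mapping is the _analyze API. The request below is only a sketch: it assumes the english_example index above was created successfully, and the sample text is taken from the question.

```
GET /english_example/_analyze
{
  "analyzer": "rebuilt_english",
  "text": ["I am walking now", "I walked to Ahmedabad"]
}
```

If stemming works as intended, the tokens produced for walking and walked should be identical.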
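You can now use the analyzer in your index mapping like this (typeless mapping, as in Elasticsearch 7 and later; the field must be of type text, because keyword fields are not analyzed):
{
  "mappings": {
    "properties": {
      "myField": {
        "type": "text",
        "analyzer": "rebuilt_english"
      }
    }
  }
}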
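Remember to use a match query for full-text search, so that the search text is run through the same analyzer as the indexed text. A term query does not analyze its input.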
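As a sketch, assuming the english_example index exists and myField is mapped as an analyzed text field using rebuilt_english, a match query could look like this:

```
GET /english_example/_search
{
  "query": {
    "match": {
      "myField": "walking"
    }
  }
}
```

Because the match query analyzes the search text with the field's analyzer, walking is reduced to its stem before it is compared with the stored tokens.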
CodePudding user response:
You need to use the stemmer token filter.
Stemming is the process of reducing a word to its root form. This ensures variants of a word match during a search.
For example, walking and walked can be stemmed to the same root word: walk. Once stemmed, an occurrence of either word would match the other in a search.
Mapping
PUT index36
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "stemmer"]
        }
      }
    }
  }
}
Analyze
GET index36/_analyze
{
  "text": ["walking", "walked", "walk", "walks"],
  "analyzer": "my_analyzer"
}
Result
{
  "tokens" : [
    {
      "token" : "walk",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "walk",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 101
    },
    {
      "token" : "walk",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 202
    },
    {
      "token" : "walk",
      "start_offset" : 20,
      "end_offset" : 25,
      "type" : "word",
      "position" : 303
    }
  ]
}
All four words produce the same token, "walk", so a search for any one of them matches documents containing any of the others.
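To try this end to end, the sentences from the question can be indexed and then searched. This is a minimal sketch that assumes the index36 index above exists and that auto-generated document IDs are acceptable; the refresh parameter is only there so the documents are searchable immediately.

```
POST index36/_bulk?refresh
{ "index": {} }
{ "title": "I am walking now" }
{ "index": {} }
{ "title": "I walked to Ahmedabad" }
{ "index": {} }
{ "title": "Everyday I walk in the morning" }
{ "index": {} }
{ "title": "Anil walks in the evening." }

GET index36/_search
{
  "query": {
    "match": {
      "title": "walked"
    }
  }
}
```

The search should return all four documents, since each title contains a word that stems to walk. The hire sentences can be indexed and checked the same way.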