I'm trying to set an analyzer for an index through its settings.
// PUT /customers
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"title": {
"type": "text"
}
},
"dynamic": false
},
"settings": {
"analysis": {
"analyzer": "ik_smart"
}
}
}
Then I indexed some data.
// POST /customers/_doc
{
"name": "张三",
"title": "工程师"
}
// POST /customers/_doc
{
"name": "李四",
"title": "测试员"
}
Analyze with the ik_smart analyzer
// GET /customers/_analyze
{
"text": "李四工程师",
"analyzer": "ik_smart"
}
// gets tokens ['李四', '工程师']
And with the default analyzer
// GET /customers/_analyze
{
"text": "李四工程师"
}
// gets tokens ['李', '四', '工', '程', '师']
Finally search for '李四工程师'
// GET /customers/_search
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "李四工程师",
"operator": "or",
"type": "cross_fields",
"fields": [
"name^10",
"title^7"
],
"analyzer": "ik_smart"
}
}
}
}
}
// gets empty hits
If I put the ik_smart analyzer into each field
// PUT /customers
{
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "ik_smart"
},
"title": {
"type": "text",
"analyzer": "ik_smart"
}
},
"dynamic": false
}
}
Then the request works fine.
I suspect that the first settings block does not apply the analyzer to any field in the index.
I use the IK analysis plugin here, which provides the ik_smart analyzer. Chinese does not use spaces to separate words, so a dictionary-based analyzer such as ik_smart is needed. Without it, any data containing Chinese words or sentences is indexed as single characters. That is why the search works when I also use the default analyzer at search time: the phrase 李四工程师 breaks into ['李', '四', '工', '程', '师'], which matches the indexed data. But that does not give very accurate relevance.
If I use the ik_smart analyzer when searching, I get the tokens ['李四', '工程师'], which do not match the indexed data.
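One way to check which analyzer a field actually uses is to pass field instead of analyzer to the _analyze API, so that the analyzer is derived from the field's mapping. A minimal sketch, assuming the customers index created by the first request above:

```
// GET /customers/_analyze
{
  "field": "name",
  "text": "李四工程师"
}
```

If this returns single-character tokens, the name field is not being analyzed with ik_smart.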
So why doesn't settings.analysis.analyzer work as expected?
What is the use of this setting anyway if it has no effect?
Answer:
If you want ik_smart to be the default analyzer of your index, you can set it using the default key in the analyzer definition, as explained in the official doc.
PUT my-index-000001
{
"settings": {
"analysis": {
"analyzer": {
"default": {
"type": "simple"
}
}
}
}
}
The part to note is the "default" key under "analyzer".
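Applied to the index from the question, this might look like the sketch below. It assumes the IK plugin is installed and that its ik_smart analyzer can be referenced as the type of the default analyzer; that is an assumption about the plugin, not something confirmed above. The index would have to be recreated (or the data reindexed), because documents that are already indexed keep the tokens they were indexed with.

```
// PUT /customers
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "title": {
        "type": "text"
      }
    },
    "dynamic": false
  }
}
```

With default set this way, text fields that do not declare their own analyzer should use it at both index and search time, so the analyzer parameter in the multi_match query becomes optional.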