Elasticsearch analyzer in index settings has no effect

I'm trying to set an analyzer for an index in its settings.

// PUT /customers 
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "title": {
        "type": "text"
      }
    },
    "dynamic": false
  },
  "settings": {
    "analysis": {
      "analyzer": "ik_smart"
    }
  }
}

Then I indexed some data.

// POST /customers/_doc
{
  "name": "张三",
  "title": "工程师"
}

// POST /customers/_doc
{
  "name": "李四",
  "title": "测试员"
}

Analyze with the ik_smart analyzer

//GET /customers/_analyze
{
  "text": "李四工程师",
  "analyzer": "ik_smart"
}

// gets tokens ['李四', '工程师']

And with the default analyzer

// GET /customers/_analyze
{
  "text": "李四工程师"
}

// gets tokens ['李', '四', '工', '程', '师']

Finally, I search for '李四工程师'

// GET /customers/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "李四工程师",
          "operator": "or",
          "type": "cross_fields",
          "fields": [
            "name^10",
            "title^7"
          ],
          "analyzer": "ik_smart"
        }
      }
    }
  }
}

// gets empty hits

If I put the ik_smart analyzer on each field instead

// PUT /customers 
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "title": {
        "type": "text",
         "analyzer": "ik_smart"
      }
    },
    "dynamic": false
  }
}

Then the search works fine.

I suspect that the first settings block does not apply the analyzer to any field in the index.
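
As a sanity check (a hypothetical request against the first mapping above), _analyze can also be pointed at a field instead of a named analyzer, which shows the analyzer that field actually resolves to:

// GET /customers/_analyze
{
  "field": "name",
  "text": "李四工程师"
}

// expected to return ['李', '四', '工', '程', '师'], i.e. the standard analyzer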

I use the ik_smart plugin here, which provides the ik_smart analyzer, because Chinese text has no spaces between words; ik_smart is a dictionary-based analyzer. Without it, any data containing Chinese words or sentences is indexed as single characters. That is also why the search works when I use the default analyzer at query time: the phrase 李四工程师 breaks into ['李', '四', '工', '程', '师'] and matches the indexed data, but the relevance scoring is not very accurate.

If I use the ik_smart analyzer in searching, I get tokens ['李四', '工程师'], which do not match the indexed data.
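
Another way to confirm which tokens actually ended up in the index (a hypothetical request; <doc-id> stands in for one of the auto-generated ids) is the term vectors API:

// GET /customers/_termvectors/<doc-id>?fields=name,title

// with the first mapping, the terms are single characters such as '李' and '四'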

So why doesn't settings.analysis.analyzer work as expected?

What is the use of this setting anyway if it does not have any effect?

CodePudding user response:

If you want ik_smart to be the default analyzer of your index, you can set it using the default key in the analyzer definition, as explained in the official documentation.

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { --> Note this
          "type": "simple"
        }
      }
    }
  }
}
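
Applied to this question (a sketch, assuming the ik plugin is installed), making ik_smart the index-wide default would look like:

PUT /customers
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "title": { "type": "text" }
    },
    "dynamic": false
  }
}

More generally, settings.analysis.analyzer is where named analyzers are defined; a bare string such as "analyzer": "ik_smart" does not attach an analyzer to any field. A field only gets a non-standard analyzer if its mapping names one explicitly or if an analyzer called default is defined as above.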