Keyword normalizer not applied on document


I'm using Elasticsearch 6.8. Here is my mapping (part of my index template):

{
  "index_patterns": [
    "my_index_*"
  ],
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "lower_ascii_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "audit_conformity": {
      "dynamic": "false",
      "properties": {
        "country": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        },
[…]

Then I post a document with this body:

{
  "_source": {
    "company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
    "country": "MX",
    "user_entity_id": "1"
  }
}

When I search for the document, the country is still uppercase:

GET /my_index_country/_search

I get:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index_country",
        "_type": "my_index",
        "_id": "LOT0fYIBCNP9gFG_7cet",
        "_score": 1,
        "_source": {
          "_source": {
            "company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
            "country": "MX",
            "user_entity_id": "1",
          }
        }
      }
    ]
  }
}

What am I doing wrong?

CodePudding user response:

You're not doing anything wrong, but normalizers (and analyzers alike) never modify your source document, only what gets indexed from it.

This means the source document keeps holding MX, but underneath, mx is what gets indexed for the country field.
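
You can check this without reindexing anything: the normalizer is applied both at index time and to term-level queries on that keyword field, so a lowercase term query should already match the document even though _source still shows MX. A quick check against the index from the question (same index and field names):

GET /my_index_country/_search
{
  "query": {
    "term": {
      "country": "mx"
    }
  }
}

If that returns your document, the normalizer is working as designed; it just never rewrites _source.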

If you want the country field lowercased in the source document itself, you should instead use an ingest pipeline with a lowercase processor, which modifies the document before it is indexed:

PUT _ingest/pipeline/lowercase-pipeline
{
  "processors": [
    {
      "lowercase": {
        "field": "country"
      }
    }
  ]
}
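
If you want to verify the processor before using it, the _simulate endpoint runs the pipeline against sample documents without indexing anything (the sample values below are just the ones from your question):

POST _ingest/pipeline/lowercase-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
        "country": "MX",
        "user_entity_id": "1"
      }
    }
  ]
}

The response shows each document as it would look after the pipeline runs, so you should see "country": "mx" there.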

Then use it when indexing your documents:

PUT my_index_country/my_index/LOT0fYIBCNP9gFG_7cet?pipeline=lowercase-pipeline
{
    "company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
    "country": "MX",
    "user_entity_id": "1",
}

GET my_index_country/my_index/LOT0fYIBCNP9gFG_7cet

Result =>
{
    "company_id": "a813bec1-f9f3-44c7-96ac-11157f64b79b",
    "country": "mx",
    "user_entity_id": "1",
}
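
As a side note, if you don't want to pass ?pipeline= on every indexing request, Elasticsearch 6.5+ (so your 6.8 as well) lets you set a default pipeline on the index so it is applied automatically; adjust the index name to your own:

PUT my_index_country/_settings
{
  "index.default_pipeline": "lowercase-pipeline"
}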