Liferay Portal 7.3.7: case-insensitive, diacritics-free search with Elasticsearch

Time:05-05

I am having a dilemma on Liferay Portal 7.3.7 with case-insensitive and diacritics-free search through Elasticsearch in JournalArticles with custom DDM fields. Liferay generated field mappings in Configuration->Search like this:

    ...
    },
    "localized_name_sk_SK_sortable" : {
      "store" : true,
      "type" : "keyword"
    },
    ...

I would like these *_sortable fields to be usable for case-insensitive and diacritics-free searching, so I tried to add an analyzer and a normalizer to the Liferay search advanced configuration in System Settings->Search->Elasticsearch 7 like this:

    {
      "analysis": {
        "analyzer": {
          "ascii_analyzer": {
            "tokenizer": "standard",
            "filter": ["asciifolding", "lowercase"]
          }
        },
        "normalizer": {
          "ascii_normalizer": {
            "type": "custom",
            "char_filter": [],
            "filter": ["lowercase", "asciifolding"]
          }
        }
      }
    }

After that, I overrode the mapping for template_string_sortable:

     {
      "template_string_sortable" : {
        "mapping" : {
          "analyzer": "ascii_analyzer",
          "normalizer": "ascii_normalizer",
          "store" : true,
          "type" : "keyword"
        },
        "match_mapping_type" : "string",
        "match" : "*_sortable"
      }
    }

After reindexing, my sortable fields look like this:

    ...
    },
    "localized_name_sk_SK_sortable" : {
      "normalizer" : "ascii_normalizer",
      "store" : true,
      "type" : "keyword"
    },
    ...

Next, I tried to create new content for my DDM structure, but every sortable field still looks the same as its source field, like this:

    "localized_title_sk_SK": "test diakrity časť 1 ľščťžýáíéôň title",
    "localized_title_sk_SK_sortable": "test diakrity časť 1 ľščťžýáíéôň title",

But I need the sortable field without national characters, so that, for example, I can find "cast 1" with a wildcardQuery on localized_title_sk_SK_sortable, and so on. Thanks for any advice (maybe my whole approach to the problem is wrong? I am really new to ES).

CodePudding user response:

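First of all, it would be better to apply the ASCII folding filter (asciifolding) first and the lowercase filter second. Keep in mind, though, that these filters only affect the terms that are indexed and searched. Your _source data will not be changed, because you applied an analyzer (or normalizer) to the field, and that is why the document you get back still shows the original text.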
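
To see the difference between the indexed term and _source, you could run the text through the normalizer with the _analyze API. This is only a sketch of a request body for GET <your-index>/_analyze: the index name is a placeholder I am assuming, and ascii_normalizer is the normalizer from your settings.

    {
      "normalizer": "ascii_normalizer",
      "text": "test diakrity časť 1 ľščťžýáíéôň title"
    }

If the normalizer is in effect, the response should contain a single lowercase token with the diacritics folded away, even though the stored document still shows the original text. A wildcard query on the sortable field is matched against that indexed term, so a lowercase ASCII pattern should be enough. For example, as the body of a _search request (again assuming your own index name):

    {
      "query": {
        "wildcard": {
          "localized_title_sk_SK_sortable": {
            "value": "*cast 1*"
          }
        }
      }
    }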

If you need to manipulate the data before it is indexed, so that the stored value itself changes, you can use the ingest pipeline feature in Elasticsearch. See the Elasticsearch documentation on ingest pipelines for more information.
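
As a rough illustration, a pipeline body for PUT _ingest/pipeline/<pipeline-name> could look like the sketch below. The pipeline name is a placeholder, the field name is taken from your example, and I am assuming Liferay lets you attach the pipeline to its index (for example through the index.default_pipeline setting), which I have not verified. It only lowercases the value. As far as I know there is no built-in ASCII folding processor, so stripping diacritics would need an extra step such as a script processor, which is not shown here.

    {
      "description": "Lowercase the sortable title before indexing (illustrative only)",
      "processors": [
        {
          "lowercase": {
            "field": "localized_title_sk_SK_sortable",
            "ignore_missing": true
          }
        }
      ]
    }

Unlike the normalizer, a pipeline rewrites the value before it is stored, so the change would also be visible in _source.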
