ElasticSearch sort without whitespace and case sensitivity

Time:01-26

I have user information in Elasticsearch. I want to query people and sort the results by last name. When I run the following query:

{
  "size": 10,
  "query": {
    "bool": {
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "sort": [{ "last_name": { "order": "desc" } }]
}

I get results that start with whitespace or with lowercase letters first. I want a case-insensitive sort that also ignores whitespace.

For example, part of the output is

        {
            "first_name": "test",
            "last_name": "test"
        },
        {
            "first_name": "name",
            "last_name": "mangina"
        },
        {
            "first_name": "Nona",
            "last_name": "Zucker"
        }

I expected Z to be first for descending order.

CodePudding user response:

A keyword field is not analyzed, so Elasticsearch compares the raw values character by character. The first character of "test" is "t" (ASCII value 116) and the first character of "Zucker" is "Z" (ASCII value 90). Because 116 is greater than 90, "t" ranks higher, and "test" is the first record in a descending sort.
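
The comparison described above can be reproduced outside Elasticsearch. This is only a rough Python sketch: it assumes that plain string comparison in Python (by code point) is a fair stand-in for how an un-normalized keyword field is ordered, which holds for the ASCII names used in this thread.

```python
# Sample last names from the question.
names = ["test", "mangina", "Zucker"]

# Uppercase letters have lower character codes than lowercase ones,
# and a space is lower than any letter.
codes = {ch: ord(ch) for ch in [" ", "Z", "m", "t"]}
print(codes)

# Plain descending sort on the raw strings, with no normalization.
raw_desc = sorted(names, reverse=True)
print(raw_desc)  # ['test', 'mangina', 'Zucker']
```

The raw descending order puts "Zucker" last, which matches the output shown in the question.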

You need to define a custom normalizer for the field. You can do it the following way.

Mappings:

PUT username
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase" // to treat uppercase and lowercase as being the same
          ],
          "char_filter": [
            "alphabets_char_filter" // to ignore everything except letters while sorting
          ]
        }
      },
      "char_filter": {
        "alphabets_char_filter": {
          "type": "pattern_replace",
          "pattern": "[^a-zA-Z]",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "first_name": {
        "type": "keyword"
      },
      "last_name": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
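
To get a feel for what the normalizer in this mapping does to each value, here is a small Python sketch. The `normalize` function is a hypothetical stand-in written for illustration, not Elasticsearch code: it assumes the character filter runs first (dropping everything outside `a-zA-Z`) and the `lowercase` filter runs afterwards.

```python
import re


def normalize(value: str) -> str:
    """Hypothetical emulation of my_normalizer from the mapping."""
    # alphabets_char_filter: remove every character that is not an ASCII letter.
    letters_only = re.sub(r"[^a-zA-Z]", "", value)
    # lowercase filter: make the comparison case insensitive.
    return letters_only.lower()


# Last names used in this thread, including one with a leading space.
last_names = ["test", "mangina", "Zucker", " Xucker"]

# The key each name would be sorted by.
sort_keys = [normalize(name) for name in last_names]
print(sort_keys)  # ['test', 'mangina', 'zucker', 'xucker']

# Descending sort on the normalized keys.
ordered = sorted(last_names, key=normalize, reverse=True)
print(ordered)  # ['Zucker', ' Xucker', 'test', 'mangina']
```

Note that the pattern `[^a-zA-Z]` also removes digits, hyphens, apostrophes and non-ASCII letters, so in this sketch a name such as "Müller" would be sorted as "mller". Whether that is acceptable depends on the data.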

Insert Documents:

PUT username/_doc/1
{
  "first_name": "test",
  "last_name": "test"
}

PUT username/_doc/2
{
  "first_name": "name",
  "last_name": "mangina"
}

PUT username/_doc/3
{
  "first_name": "Nona",
  "last_name": "Zucker"
}

PUT username/_doc/4
{
  "first_name": "Nona",
  "last_name": " Xucker"
}

Query:

GET username/_search
{
  "size": 10,
  "query": {
    "bool": {
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "sort": [{ "last_name": { "order": "desc" } }]
}

Output:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "username",
        "_id": "3",
        "_score": null,
        "_source": {
          "first_name": "Nona",
          "last_name": "Zucker"
        },
        "sort": [
          "zucker"
        ]
      },
      {
        "_index": "username",
        "_id": "4",
        "_score": null,
        "_source": {
          "first_name": "Nona",
          "last_name": " Xucker"
        },
        "sort": [
          "xucker"
        ]
      },
      {
        "_index": "username",
        "_id": "1",
        "_score": null,
        "_source": {
          "first_name": "test",
          "last_name": "test"
        },
        "sort": [
          "test"
        ]
      },
      {
        "_index": "username",
        "_id": "2",
        "_score": null,
        "_source": {
          "first_name": "name",
          "last_name": "mangina"
        },
        "sort": [
          "mangina"
        ]
      }
    ]
  }
}