ElasticSearch lowercase token filter not working

Time:09-15

I'm still pretty new to Elasticsearch, but I thought the example below should work. Maybe you can point out what I'm doing wrong.

I'm trying to split a string with a regex-based tokenizer and then lowercase the resulting tokens. Since I can't run two tokenizers on the same field without using a multi-field, I thought of using a token filter.

Example below:

PUT test-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "(?<!^)(?=[A-Z])"
        }
      },
      "filter" : ["lowercase"]
    }
  }
}

And to test it:

POST test-index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "SomeSideCar.jpg"
}

I would expect to get [some, side, car.jpg], since the regex splits on uppercase letters and the token filter should then lowercase the tokens.

However, this is what I get after running the above:

{
  "tokens": [
    {
      "token": "Some",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "Side",
      "start_offset": 4,
      "end_offset": 8,
      "type": "word",
      "position": 1
    },
    {
      "token": "Car.jpg",
      "start_offset": 8,
      "end_offset": 15,
      "type": "word",
      "position": 2
    }
  ]
}

CodePudding user response:

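Great start, you're almost there!

The lowercase filter has to be listed inside the analyzer definition itself, otherwise your custom analyzer won't use it. The "filter" section directly under "analysis" is only for defining custom token filters, not for applying them. Do it this way instead: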

PUT test-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "(?<!^)(?=[A-Z])"
        }
      }
    }
  }
}
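
To check the fix, delete the old index first (DELETE test-index), since running PUT test-index again on an existing index fails instead of updating its analysis settings. Then recreate it with the settings above and repeat the POST test-index/_analyze request from the question. Assuming nothing else in the setup differs, the response should look like this, with the same offsets and positions as before but lowercased tokens:

```json
{
  "tokens": [
    {
      "token": "some",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "side",
      "start_offset": 4,
      "end_offset": 8,
      "type": "word",
      "position": 1
    },
    {
      "token": "car.jpg",
      "start_offset": 8,
      "end_offset": 15,
      "type": "word",
      "position": 2
    }
  ]
}
```

The tokenizer still does the splitting; the lowercase filter only rewrites the text of each token afterwards, which is why the offsets do not change.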