Search mix of characters and numbers with edge n-gram on ElasticSearch

Time:08-26

I'm trying to create an autocomplete with edge n-gram but when I search for a string with numbers the search doesn't return anything.

For example, I want to search for 10k and I have several items with 10kab. If I search for "10kab" or "kab" it works fine, but if I only search for "10k" it doesn't return anything. If it doesn't have numbers it works fine. For example, searching for "som" I get results for "something".

These are the index settings I have:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": [
                        "lowercase"
                    ]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase"
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": [
                        "letter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "search": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                },
                "analyzer": "autocomplete",
                "search_analyzer": "autocomplete_search"
            }
        }
    }
}

I even tried adding digit to the token_chars, but that seems to work worse, as not even "10kab" matches.

CodePudding user response:

Your autocomplete tokenizer is configured to include only letter characters, so digits are treated as separators at index time. You should add digit as well:

            "autocomplete": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 10,
                "token_chars": [
                    "letter",
                    "digit"              <--- add this
                ]
            }

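Also note that your search_analyzer is based on the lowercase tokenizer, which splits on every non-letter character and therefore discards digits. Searching for 10k actually searches only for k, which is shorter than min_gram (2) and matches nothing. This is also why adding digit on its own made things worse: the index then holds 10, 10k, 10ka and 10kab, while a search for 10kab is still reduced to kab.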
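The mismatch between index-time and search-time tokens can be reproduced outside Elasticsearch. The following is a rough Python sketch, not Elasticsearch code: `edge_ngrams`, `lowercase_tokenizer` and `matches` are hypothetical helpers that only approximate the `edge_ngram` tokenizer (plus the lowercase filter), the `lowercase` tokenizer, and a default match query.

```python
def edge_ngrams(text, min_gram, max_gram, keep):
    """Approximate an edge_ngram tokenizer followed by a lowercase filter."""
    tokens, word = [], ""
    for ch in text.lower() + " ":  # trailing space flushes the last word
        if keep(ch):
            word += ch
            continue
        # Emit every prefix of the finished word between min_gram and max_gram.
        tokens += [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]
        word = ""
    return tokens


def lowercase_tokenizer(text):
    """Approximate the lowercase tokenizer: split on non-letters, lowercase."""
    tokens, word = [], ""
    for ch in text.lower() + " ":
        if ch.isalpha():
            word += ch
        elif word:
            tokens.append(word)
            word = ""
    return tokens


def is_letter(ch):
    return ch.isalpha()


def is_letter_or_digit(ch):
    return ch.isalpha() or ch.isdigit()


def matches(query_tokens, index_tokens):
    """Assumes the default OR behaviour: one shared term is enough."""
    return any(token in index_tokens for token in query_tokens)


letters_only = edge_ngrams("10kab", 2, 10, is_letter)
with_digits = edge_ngrams("10kab", 2, 10, is_letter_or_digit)

print(letters_only)                  # ['ka', 'kab']
print(with_digits)                   # ['10', '10k', '10ka', '10kab']
print(lowercase_tokenizer("10k"))    # ['k']
print(lowercase_tokenizer("10kab"))  # ['kab']
```

With the original settings, `matches(lowercase_tokenizer("10k"), letters_only)` is false because `k` was never indexed. With digit added but the search analyzer unchanged, `matches(lowercase_tokenizer("10kab"), with_digits)` is also false, which is consistent with the behaviour reported in the question.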

You should change your autocomplete_search analyzer to the following and then your searches will start working:

    "autocomplete_search": {
      "tokenizer": "keyword",
      "filter": [
        "lowercase"
      ]
    }
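Putting both changes together, the full index body might look like the sketch below. It is written as a Python dictionary only so that it can be printed as JSON or handed to whatever client you use; the variable name `index_body` is an assumption, and everything else is taken from the settings in the question plus the two changes above.

```python
import json

# Index body from the question with both fixes applied.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": ["lowercase"],
                },
                # Fix 2: keep the whole query as one token, then lowercase it.
                "autocomplete_search": {
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                },
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    # Fix 1: digits are now part of a token, not separators.
                    "token_chars": ["letter", "digit"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "search": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
                "analyzer": "autocomplete",
                "search_analyzer": "autocomplete_search",
            }
        }
    },
}

# Print the body as JSON, ready to paste into an index-creation request.
print(json.dumps(index_body, indent=4))
```

Documents that were indexed with the old tokenizer keep their old tokens, so the index would need to be recreated and the data reindexed before searches for "10k" start matching.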