Why do OpenSearch queries not return a relevant result?

Time:11-19

I have an index containing city names. I am trying to score my entries correctly, but I do not get the desired results. I have tried creating the index without any settings specified, and with an edge n-gram as well as an n-gram analyzer. The city names are German, and I read here that this should be a suitable analyzer. Here are the settings that I tried for the analyzers:

{
    "settings": {
        "index": {
            "number_of_shards": "1",
            "number_of_replicas": "1"
        },
        "analysis": {
            "analyzer": {
                "e_ngram_token": {
                    "tokenizer": "edge_ngram_tokenizer"
                }
            },
            "tokenizer": {
                "edge_ngram_tokenizer": {
                    "type": "edge_ngram", // exchanged to ngram the other time
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }
        }
    }
}
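To make the tokenizer settings above concrete, here is a minimal Python sketch that approximates what an edge n-gram tokenizer with min_gram 2, max_gram 10 and token_chars [letter, digit] produces. It is only an illustration of the idea, not OpenSearch's actual implementation; the function name edge_ngrams is made up for this example.

```python
def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate an edge_ngram tokenizer with token_chars = [letter, digit]."""
    tokens = []
    word = ""
    # Append a sentinel space so the last word is flushed as well.
    for ch in text + " ":
        if ch.isalpha() or ch.isdigit():
            word += ch
            continue
        # Any other character ends the current word: emit its prefixes.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
        word = ""
    return tokens


print(edge_ngrams("München"))
print(edge_ngrams("Bad-Münster Fake 2"))
```

Note that the tokens keep their original case, because the analyzer above defines only a tokenizer and no lowercase filter. Words shorter than min_gram, such as the "2" in the second name, yield no token in this sketch.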

Here is some example data for a bulk creation (/cities/_bulk):

{ "create": {  } }
{"name": "Münster"}
{ "create": {  } }
{"name": "München"}
{ "create": {  } }
{"name": "Bad-Münster Fake 2"}
{ "create": {  } }
{"name": "Bad Münster Fake"}
{ "create": {  } }
{"name": "Munddort fake"}
{ "create": {  } }
{"name": "Stolpmünde"}
{ "create": {  } }
{"name": "Swinemünde"}
{ "create": {  } }
{"name": "Dortmund"}
{ "create": {  } }
{"name": "Müden (Mosel)"}
{ "create": {  } }
{"name": "Mannheim"}
{ "create": {  } }
{"name": "Marburg"}
{ "create": {  } }
{"name": "Magdeburg"}
{ "create": {  } }
{"name": "Montreux"}
{ "create": {  } }
{"name": "Sankt Moritz"}

So when I run a query like this:

{
    "from": 0,
    "size": 100,
    "query": {
        "match": {
            "name": {
                "query": "mun",
                "analyzer": "e_ngram_token",
                "fuzziness": "2",
                "fuzzy_transpositions": true,
                "operator":  "or",
                "max_expansions": 50,
                "boost": 5
            }
        }
    }
}

I would expect to get cities like "München", "Münster" and so on: basically every city with "mun" or, because of the fuzziness, cities with "mün", "man", "tan" and so on. What I get is this:

{
    "took": 10,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.0,
        "hits": [
            {
                "_index": "cities",
                "_type": "_doc",
                "_id": "7jX2ioQBc3BSm-EXMB2V",
                "_score": 0.0,
                "_source": {
                    "name": "Bad-Münster Fake 2"
                }
            }
        ]
    }
}

Can somebody explain to me what I am missing? In my understanding, the tokens are created at index time and will be something like `["Mü", "ün", "nc"..."Mün"]` for "München". Because I request a fuzziness of 2, the term "mun" should match the token "mün" and thus return the result.
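The fuzziness reasoning can be checked by counting edits by hand. The sketch below uses a plain Levenshtein distance (insert, delete, substitute); the query also sets fuzzy_transpositions, which additionally counts a swap of two adjacent characters as one edit, but that makes no difference for these examples. The helper name levenshtein is just for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or keep)
            ))
        previous = current
    return previous[-1]


# Distance from the query term to a few candidate tokens.
for token in ["mün", "Mün", "man", "tan"]:
    print(token, levenshtein("mun", token))
```

The comparison is case-sensitive here, so "Mün" is already two edits away from "mun" while "mün" is only one. If the indexed tokens are not lowercased, the capital letter alone consumes part of the fuzziness budget; adding a lowercase token filter to the analyzer is one common way to avoid that.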

Thanks a lot!

CodePudding user response:

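You must set the analyzer on the field in the mapping. The analysis settings only define "e_ngram_token"; a text field without an explicit analyzer is indexed with the default standard analyzer, so the n-gram tokens are never written to the index. Add the "analyzer" line to the "name" field:

            "name": {
                "type": "text",
                "analyzer": "e_ngram_token",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }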
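For reference, here is a sketch of the complete index body with that one change applied, built as a Python dict and printed as JSON. The analyzer of an existing text field cannot be changed in place, so the index would have to be recreated with this body and the documents indexed again. Everything except the added "analyzer" line is copied from the question; the variable names are only for this example.

```python
import json

index_body = {
    "settings": {
        "index": {"number_of_shards": "1", "number_of_replicas": "1"},
        "analysis": {
            "analyzer": {
                "e_ngram_token": {"tokenizer": "edge_ngram_tokenizer"}
            },
            "tokenizer": {
                "edge_ngram_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
        },
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                # The fix: bind the custom analyzer to the field.
                "analyzer": "e_ngram_token",
                "fields": {
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
            }
        }
    },
}

# Sanity check: the field must reference an analyzer that the settings define.
field_analyzer = index_body["mappings"]["properties"]["name"]["analyzer"]
defined_analyzers = index_body["settings"]["analysis"]["analyzer"]
if field_analyzer not in defined_analyzers:
    raise ValueError(f"analyzer {field_analyzer!r} is not defined in settings")

print(json.dumps(index_body, indent=4))
```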