Why does search ignore synonyms?

Time:04-06

I want to search for a phrase and get all results (including synonym results).

I configured my index as follows:

PUT demo_idx
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_graph_synonyms": {
            "type": "synonym_graph",
            "synonyms": [
              "Cosmos, Universe"
            ]
          }
        },
        "analyzer": {
          "my_search_time_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "stemmer",
              "my_graph_synonyms"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_search_time_analyzer"
      }
    }
  }
}

I added 2 documents to the index:

PUT demo_idx/_doc/1
{
  "content": "Cosmos A Spacetime Odyssey is a 2014 American science documentary television series."
}

PUT demo_idx/_doc/2
{
  "content": "Universe A Spacetime Odyssey is a 2014 American science documentary television series."
}

I ran the following search:

GET demo_idx/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": {
              "query": "Cosmos"
            }
          }
        }
      ]
    }
  }
}

I expected to get 2 results (following the synonyms) but I got only one.

How can I run the search query (while using the synonyms) and get 2 results?

CodePudding user response:

This happens because of the stemmer filter. If you remove it from the analyzer and index your data again, the search returns both documents.

You can use the analyze API to check the tokens your analyzer generates. For Cosmos, it produces the following tokens:

{
    "tokens": [
        {
            "token": "univers", // Note this
            "start_offset": 0,
            "end_offset": 6,
            "type": "SYNONYM",
            "position": 0
        },
        {
            "token": "cosmo",
            "start_offset": 0,
            "end_offset": 6,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}
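
Output like the above can be obtained with analyze requests along these lines. This is a sketch that assumes the index is named demo_idx, as in the question. The first request runs the search-time analyzer on the query text. The second passes the field name instead, so the text is analyzed with the index-time analyzer of content, which lets you compare both sides.

```console
POST demo_idx/_analyze
{
  "analyzer": "my_search_time_analyzer",
  "text": "Cosmos"
}

POST demo_idx/_analyze
{
  "field": "content",
  "text": "Universe"
}
```

If the tokens from the two requests differ for a term you expect to match, the query cannot find that term in the index.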

The standard analyzer used at index time, on the other hand, produces universe for Universe without stemming it, so the indexed term does not match the stemmed synonym univers generated by the search_analyzer.
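
As a minimal sketch of the change suggested in this answer, assuming the rest of the index definition stays exactly as in the question, the analyzer section of the settings would look like this with the stemmer filter dropped:

```console
"analyzer": {
  "my_search_time_analyzer": {
    "tokenizer": "standard",
    "filter": [
      "lowercase",
      "my_graph_synonyms"
    ]
  }
}
```

Analysis settings cannot be changed on an open index, so the simplest route is probably to delete demo_idx, create it again with this analyzer, and index the two documents again before rerunning the search.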
