Elasticsearch DSL search query not applied on text files while searching

Time:07-12

I have two files, st.txt and sy.txt:

st.txt

was
an

sy.txt

football,soccer

My index settings are below:

new_player_settings = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "synonym_en": {
                        "type": "synonym",
                        "synonyms_path": "sy.txt"
                    },
                    "english_stop": {
                        "type": "stop",
                        "stopwords_path": "st.txt"
                    }
                },
                "analyzer": {
                    "english_analyzer": {
                        "tokenizer": "standard",
                        "filter": [
                            "english_stop",
                            "synonym_en"
                        ]
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "english_analyzer"
            },
            "description": {
                "type": "text",
                "analyzer": "english_analyzer"
            }
        }
    }
}
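For reference, this is roughly what `english_analyzer` should do to text once both files are actually loaded. It is a plain-Python approximation written for illustration, not Elasticsearch code: the standard tokenizer is approximated with a `\w+` regex, only comma-separated synonym lines like the one in sy.txt are handled, the synonym filter's `expand` option is assumed to be at its default (`true`), and the function names are made up. The chain in the settings has no `lowercase` filter, so the sketch does no lowercasing either.

```python
import re


def parse_stopwords(text):
    """st.txt format: one stopword per line (blank lines skipped)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def parse_synonyms(text):
    """sy.txt format: comma-separated equivalent terms, e.g. 'football,soccer'.
    Explicit 'a => b' rules are ignored in this sketch."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=>" in line:
            continue
        terms = [t.strip() for t in line.split(",") if t.strip()]
        for term in terms:
            groups[term] = terms  # with expand=true every term maps to the whole group
    return groups


def simulate_english_analyzer(text, stopwords, synonyms):
    """Approximate standard tokenizer -> english_stop -> synonym_en.
    Returns one list of tokens per surviving position."""
    tokens = re.findall(r"\w+", text)  # rough stand-in for the standard tokenizer
    kept = [t for t in tokens if t not in stopwords]  # no lowercasing, as in the real chain
    return [synonyms.get(t, [t]) for t in kept]


stopwords = parse_stopwords("was\nan\n")
synonyms = parse_synonyms("football,soccer\n")
print(simulate_english_analyzer("was football", stopwords, synonyms))  # [['football', 'soccer']]
print(simulate_english_analyzer("was", stopwords, synonyms))  # []
```

So with working files, the query text should reach `name` and `description` as just `football`/`soccer`, and id 3's description `was` should not be indexed at all.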

My data is below:

abc = [
{'id':1, 'name': 'christiano ronaldo', 'description': '[email protected]', 'type': 'football'},
{'id':2, 'name': 'lionel messi', 'description': '[email protected]','type': 'soccer'},
{'id':3, 'name': 'sachin', 'description': 'was', 'type': 'cricket'}
]

DSL query is below

{
    "query": {
        "query_string": {
            "fields": ["name^2", "description^2", "type^4"],
            "query": "was football"
        }
    }
}
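One thing worth noting about this query: `query_string` analyzes the query text separately for each listed field, using that field's own analyzer. `type` is not in the mappings above, so (assuming dynamic mapping is on and no index-wide default analyzer is configured) it gets mapped as `text` with the `standard` analyzer, and neither file ever applies to it. The small helper below (a hypothetical name, working on a trimmed copy of the mappings) just makes that lookup explicit. It also means the id 3 hit can only come from `description:was`, since `sachin` and `cricket` match neither query term.

```python
# Trimmed copy of the "mappings" part of new_player_settings.
mappings = {
    "properties": {
        "name": {"type": "text", "analyzer": "english_analyzer"},
        "description": {"type": "text", "analyzer": "english_analyzer"},
    }
}


def query_analyzer_for(mappings, field):
    """Analyzer applied to the query text for one entry of `fields` (boost stripped).
    Assumes unmapped fields were dynamically mapped as text with the standard analyzer."""
    name = field.split("^", 1)[0]
    props = mappings["properties"].get(name)
    if props is None:
        return "standard"
    # a text field uses search_analyzer at query time if set, otherwise analyzer
    return props.get("search_analyzer", props.get("analyzer", "standard"))


for f in ["name^2", "description^2", "type^4"]:
    print(f, "->", query_analyzer_for(mappings, f))
# name^2 -> english_analyzer
# description^2 -> english_analyzer
# type^4 -> standard
```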

My Output

{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 3.9233165,
  'hits': [{'_index': 'newplayers',
    '_type': '_doc',
    '_id': '1',
    '_score': 3.9233165,
    '_source': {'id': 1,
     'name': 'christiano ronaldo',
     'description': '[email protected]',
     'type': 'football'}},
   {'_index': 'newplayers',
    '_type': '_doc',
    '_id': '3',
    '_score': 2.345461,
    '_source': {'id': 3,
     'name': 'sachin',
     'description': 'was',
     'type': 'cricket'}}]}}

Expected outcome

id 3 should not be present because `was` is a stopword, and id 2 should be present because football and soccer are synonyms.

Expected output

{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 2.0,
  'hits': [{'_index': 'players',
    '_type': '_doc',
    '_id': '1',
    '_score': 2.0,
    '_source': {'id': 1,
     'name': 'christiano ronaldo',
     'description': '[email protected]',
     'type': 'football'}},
   {'_index': 'players',
    '_type': '_doc',
    '_id': '2',
    '_score': 2.0,
    '_source': {'id': 2,
     'name': 'lionel messi',
     'description': '[email protected]',
     'type': 'soccer'}}]}}

Answer:

Maybe the issue is that the sy.txt and st.txt files that define your index's synonyms and stopwords are not present on the Elasticsearch nodes (`synonyms_path` and `stopwords_path` are resolved relative to each node's config directory unless they are absolute). That said, I tried the same settings, mappings and sample data you provided and got your expected output, as shown below.

Search query

{
    "query": {
        "query_string": {
            "fields": [
                "name^2",
                "description^2",
                "type^4"
            ],
            "query": "was football"
        }
    }
}

And the search result, with the source JSON:

"hits": [
            {
                "_index": "72796944",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.4051987,
                "_source": {
                    "name": "christiano ronaldo",
                    "description": "[email protected]"
                }
            },
            {
                "_index": "72796944",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.4051987,
                "_source": {
                    "name": "lionel messi",
                    "description": "[email protected]"
                }
            }
        ]
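A quick way to check whether the files are actually being used is the `_analyze` API, run against the index so the custom analyzer is available. The sketch below uses only the Python standard library and assumes an unsecured cluster at `http://localhost:9200` and the index name `newplayers` from your output; the helper names are hypothetical, and a secured cluster would need auth headers added.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, text, analyzer="english_analyzer"):
    """Build (url, body) for POST /<index>/_analyze with a custom analyzer from the index settings."""
    url = f"{base_url.rstrip('/')}/{index}/_analyze"
    body = {"analyzer": analyzer, "text": text}
    return url, body


def run_analyze(base_url, index, text):
    """Send the request and return the token strings (needs a reachable cluster)."""
    url, body = build_analyze_request(base_url, index, text)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]
```

If both files were loaded, `run_analyze("http://localhost:9200", "newplayers", "was football")` should return `football` and `soccer` but no `was`; seeing `was` in the tokens would point at the stopword file not being applied.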

It would be great if you could share the output of the explain API, which you get by appending `?explain=true` to your search endpoint, so we can debug further.
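With explain enabled, each hit carries an `_explanation` tree, and the scoring nodes for individual terms have descriptions starting with `weight(field:term in <doc>)`. The hypothetical helper below walks that tree and lists the field:term clauses that contributed, which should make it obvious if `description:was` is what pulls in id 3. The description format is assumed from typical Lucene explain output, and the sample in the code is hand-written for illustration, not real output.

```python
import re

# Matches e.g. "weight(description:was in 2) [PerFieldSimilarity], result of:"
WEIGHT_RE = re.compile(r"weight\((.+?) in \d+\)")


def matched_clauses(explanation):
    """Depth-first walk of an `_explanation` tree; returns the scored clauses in order."""
    found = []
    stack = [explanation]
    while stack:
        node = stack.pop()
        m = WEIGHT_RE.match(node.get("description", ""))
        if m:
            found.append(m.group(1))
        # push children in reverse so they are visited left to right
        stack.extend(reversed(node.get("details", [])))
    return found


# Hand-written sample shaped like an explain response, NOT real output.
sample = {
    "value": 2.3,
    "description": "max of:",
    "details": [
        {
            "value": 2.3,
            "description": "weight(description:was in 2) [PerFieldSimilarity], result of:",
            "details": [{"value": 2.3, "description": "score(freq=1.0), computed from:", "details": []}],
        }
    ],
}
print(matched_clauses(sample))  # ['description:was']
```

On a real response you would loop over the hits, e.g. `for hit in response["hits"]["hits"]: print(hit["_id"], matched_clauses(hit["_explanation"]))`.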

Update: As discussed in the comments, the issue does not occur when these words are defined inline in the settings, so the problem is that the file contents are not being loaded or updated properly by Elasticsearch.
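Since the inline definition works, one option is to keep the word lists in the index settings via the `stopwords` and `synonyms` parameters instead of the `*_path` ones. The sketch below (hypothetical helper name, trimmed copy of the filter settings; in practice you would pass `new_player_settings`) turns the file contents into that inline form. Either way, keep in mind that the analyzer also runs at index time: documents indexed while the files were not being read keep the tokens they were indexed with, so the index has to be recreated or the data reindexed after the fix, and an edited file is generally only re-read when the index is closed and reopened.

```python
import copy

# Trimmed copy of the filter part of new_player_settings.
settings = {
    "settings": {"index": {"analysis": {"filter": {
        "synonym_en": {"type": "synonym", "synonyms_path": "sy.txt"},
        "english_stop": {"type": "stop", "stopwords_path": "st.txt"},
    }}}}
}


def inline_word_lists(settings, stop_text, synonym_text):
    """Return a copy of `settings` with the *_path entries replaced by inline lists."""
    inlined = copy.deepcopy(settings)  # leave the original dict untouched
    filters = inlined["settings"]["index"]["analysis"]["filter"]

    stop = filters["english_stop"]
    stop.pop("stopwords_path", None)
    stop["stopwords"] = [w.strip() for w in stop_text.splitlines() if w.strip()]

    syn = filters["synonym_en"]
    syn.pop("synonyms_path", None)
    syn["synonyms"] = [r.strip() for r in synonym_text.splitlines() if r.strip()]
    return inlined


new_settings = inline_word_lists(settings, "was\nan\n", "football,soccer\n")
print(new_settings["settings"]["index"]["analysis"]["filter"])
```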
