I have two files, st.txt and sy.txt.
st.txt
was
an
sy.txt
football,soccer
The index settings are below
new_player_settings = {
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_en": {
"type": "synonym",
"synonyms_path": "sy.txt"
},
"english_stop": {
"type": "stop",
"stopwords_path": "st.txt"
}
},
"analyzer": {
"english_analyzer": {
"tokenizer": "standard",
"filter": [
"english_stop",
"synonym_en"
]
}
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "english_analyzer"
},
"description": {
"type": "text",
"analyzer": "english_analyzer"
}
}
}
}
My data is below
abc = [
{'id':1, 'name': 'christiano ronaldo', 'description': '[email protected]', 'type': 'football'},
{'id':2, 'name': 'lionel messi', 'description': '[email protected]','type': 'soccer'},
{'id':3, 'name': 'sachin', 'description': 'was', 'type': 'cricket'}
]
DSL query is below
{
"query": {
"query_string": {
"fields": ["name^2","description^2","type^4"],
"query": "was football"
}
}}
My Output
{'took': 2,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 2, 'relation': 'eq'},
'max_score': 3.9233165,
'hits': [{'_index': 'newplayers',
'_type': '_doc',
'_id': '1',
'_score': 3.9233165,
'_source': {'id': 1,
'name': 'christiano ronaldo',
'description': '[email protected]',
'type': 'football'}},
{'_index': 'newplayers',
'_type': '_doc',
'_id': '3',
'_score': 2.345461,
'_source': {'id': 3,
'name': 'sachin',
'description': 'was',
'type': 'cricket'}}]}}
Expected output
id 3 should not be present, since `was` is a stopword; id 2 should be present, because `football` and `soccer` are synonyms.
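To make the expectation concrete, here is a toy sketch in plain Python. It is not Elasticsearch and only imitates the stop and synonym filters with a whitespace split, so the names `analyze` and `search` are hypothetical, and it assumes every field goes through the same filters. The `description` values of ids 1 and 2 are left out because they are obscured in the post.

```python
# Toy model of the analyzer chain: stop filter, then synonym expansion.
# This only imitates the behaviour I expect; it is not how Elasticsearch scores.
STOPWORDS = {"was", "an"}            # contents of st.txt
SYNONYMS = [{"football", "soccer"}]  # contents of sy.txt

def analyze(text):
    """Split on whitespace, drop stopwords, expand synonym groups."""
    tokens = set()
    for word in text.split():
        if word in STOPWORDS:
            continue
        tokens.add(word)
        for group in SYNONYMS:
            if word in group:
                tokens |= group
    return tokens

docs = [
    {"id": 1, "name": "christiano ronaldo", "type": "football"},
    {"id": 2, "name": "lionel messi", "type": "soccer"},
    {"id": 3, "name": "sachin", "description": "was", "type": "cricket"},
]

def search(query):
    """Return ids of docs sharing at least one analyzed token with the query."""
    query_tokens = analyze(query)
    return [
        doc["id"]
        for doc in docs
        if any(query_tokens & analyze(str(value))
               for key, value in doc.items() if key != "id")
    ]

print(search("was football"))  # [1, 2]
```

In this model `was` disappears from both the query and the document with id 3, so only the `football`/`soccer` documents are left.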
Expected
{'took': 2,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 2, 'relation': 'eq'},
'max_score': 2.0,
'hits': [{'_index': 'players',
'_type': '_doc',
'_id': '1',
'_score': 2.0,
'_source': {'id': 1,
'name': 'christiano ronaldo',
'description': '[email protected]',
'type': 'football'}},
{'_index': 'players',
'_type': '_doc',
'_id': '2',
'_score': 2.0,
'_source': {'id': 2,
'name': 'lionel messi',
'description': '[email protected]',
'type': 'soccer'}}]}}
CodePudding user response:
Maybe the issue is that the sy and st text files, which define your index's synonyms and stopwords, are not present in the Elasticsearch cluster. I tried with the same settings and mappings and the sample data you provided, and I was able to get your expected output, as shown below.
Search query
{
"query": {
"query_string": {
"fields": [
"name^2",
"description^2",
"type^4"
],
"query": "was football"
}
}
}
And search result with source JSON
"hits": [
{
"_index": "72796944",
"_type": "_doc",
"_id": "1",
"_score": 0.4051987,
"_source": {
"name": "christiano ronaldo",
"description": "[email protected]"
}
},
{
"_index": "72796944",
"_type": "_doc",
"_id": "2",
"_score": 0.4051987,
"_source": {
"name": "lionel messi",
"description": "[email protected]"
}
}
]
It would be great if you could share the output of the explain API, which you can get by appending ?explain=true to your search endpoint, so we can debug further.
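As a rough sketch of what that request looks like, the snippet below only builds the URL and body and does not send anything. The helper name `build_explain_search` and the host `http://localhost:9200` are assumptions for illustration; the index name `newplayers` is taken from the output in the question.

```python
import json
from urllib.parse import urlencode

def build_explain_search(base_url, index, query_text):
    """Return (url, body) for a _search request with explain enabled."""
    url = f"{base_url.rstrip('/')}/{index}/_search?" + urlencode({"explain": "true"})
    body = {
        "query": {
            "query_string": {
                "fields": ["name^2", "description^2", "type^4"],
                "query": query_text,
            }
        }
    }
    return url, body

# Assumed host; adjust to wherever your cluster is reachable.
url, body = build_explain_search("http://localhost:9200", "newplayers", "was football")
print(url)
print(json.dumps(body, indent=2))
```

Each hit in the response should then carry an `_explanation` section showing which field and which term contributed to the score, which would tell us why id 3 matched.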
Update: As discussed in the comments, the issue does not occur when these words are defined in the settings themselves, so the problem is that the file contents are not being picked up properly by Elasticsearch.
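For reference, a minimal sketch of the inline variant, assuming the same filter and analyzer names as in the question. It uses the `stopwords` and `synonyms` parameters in place of `stopwords_path` and `synonyms_path`; the variable name `inline_player_settings` is made up for this example.

```python
# Same analysis chain as in the question, but with the word lists inline
# instead of being read from st.txt and sy.txt.
inline_player_settings = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "synonym_en": {
                        "type": "synonym",
                        "synonyms": ["football,soccer"],  # was sy.txt
                    },
                    "english_stop": {
                        "type": "stop",
                        "stopwords": ["was", "an"],  # was st.txt
                    },
                },
                "analyzer": {
                    "english_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["english_stop", "synonym_en"],
                    }
                },
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "english_analyzer"},
            "description": {"type": "text", "analyzer": "english_analyzer"},
        }
    },
}
```

If you prefer to keep the files, note that the paths are resolved relative to the Elasticsearch config directory and the files need to exist on every node. As far as I know, edits to them are not picked up by an already open index, so recreating or closing and reopening the index after changing them is probably needed.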