stopwords.txt contains all the stopwords, and synonym.txt contains football,soccer.
My data is below:
abc = [
    {'id': 1, 'name': 'christiano ronaldo', 'description': '[email protected]', 'type': 'football'},
    {'id': 2, 'name': 'lionel messi', 'description': '[email protected]', 'type': 'soccer'},
    {'id': 3, 'name': 'sachin', 'description': 'was', 'type': 'cricket'}
]
- I have two txt files: stopwords.txt and synonym.txt
- If I search for a stopword, documents that match only on that stopword should not be returned
- I need to apply the settings to the name and description fields
resp = es.search(index="players", body={
    "query": {
        "query_string": {
            "fields": ["name^2", "description^2"],
            "query": "was football*"
        }
    }
})
My output:
{'took': 17,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
          'max_score': 2.345461,
          'hits': [{'_index': 'players',
                    '_type': '_doc',
                    '_id': '3',
                    '_score': 2.345461,
                    '_source': {'id': 3,
                                'name': 'sachin',
                                'description': 'was',
                                'type': 'cricket'}},
                   {'_index': 'players',
                    '_type': '_doc',
                    '_id': '1',
                    '_score': 2.0,
                    '_source': {'id': 1,
                                'name': 'christiano ronaldo',
                                'description': '[email protected]',
                                'type': 'football'}}]}}
Expected output:
{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
          'max_score': 2.0,
          'hits': [{'_index': 'players',
                    '_type': '_doc',
                    '_id': '1',
                    '_score': 2.0,
                    '_source': {'id': 1,
                                'name': 'christiano ronaldo',
                                'description': '[email protected]',
                                'type': 'football'}},
                   {'_index': 'players',
                    '_type': '_doc',
                    '_id': '2',
                    '_score': 2.0,
                    '_source': {'id': 2,
                                'name': 'lionel messi',
                                'description': '[email protected]',
                                'type': 'soccer'}}]}}
CodePudding user response:
Basically, you need to define both synonyms and stop words. Instead of defining them in a file, you can also pass the synonyms in the index settings. That makes changing them much easier: you just close the index, update the synonym list, and open the index again.
Also, if your content is in English, Elasticsearch ships a default English stop word list (_english_), which already contains was. The full default list is given in the Elasticsearch documentation for the stop token filter.
Now use the settings and mapping below to create your index:
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym_en": {
            "type": "synonym_graph",
            "synonyms": [
              "football, soccer"
            ]
          },
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          }
        },
        "analyzer": {
          "english_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "english_stop",
              "synonym_en"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "english_analyzer"
      },
      "description": {
        "type": "text",
        "analyzer": "english_analyzer"
      }
    }
  }
}
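For reference, the settings above could be applied from Python along these lines. This is only a sketch: it assumes es is an elasticsearch.Elasticsearch client used in the same 7.x style as the question (body= keyword), and the helper name create_players_index is made up for illustration.

```python
# Settings and mappings from the answer above, written as a Python dict.
INDEX_BODY = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "synonym_en": {
                        "type": "synonym_graph",
                        "synonyms": ["football, soccer"],
                    },
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                },
                "analyzer": {
                    "english_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_stop", "synonym_en"],
                    }
                },
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "english_analyzer"},
            "description": {"type": "text", "analyzer": "english_analyzer"},
        }
    },
}


def create_players_index(es, docs, index="players"):
    """Create the index with the custom analyzer, then index the documents.

    `es` is assumed to be an Elasticsearch client (7.x-style API);
    this helper is hypothetical and not part of the library.
    """
    es.indices.create(index=index, body=INDEX_BODY)
    for doc in docs:
        # Reuse each document's own 'id' field as the Elasticsearch _id.
        es.index(index=index, id=doc["id"], body=doc)
    # Make the new documents visible to search immediately.
    es.indices.refresh(index=index)
```

With the list from the question this would be called as create_players_index(es, abc). The index must not already exist, so an old players index has to be deleted or reindexed first.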
After that, run the same query; you should now get your expected results.
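The close, update, open cycle for changing the synonym list later might look like the following sketch. It again assumes a 7.x-style Elasticsearch client passed in as es; update_synonyms is a hypothetical helper name, and the filter name synonym_en is taken from the settings in this answer.

```python
def update_synonyms(es, synonyms, index="players", filter_name="synonym_en"):
    """Replace the synonym list of an existing filter on a closed index.

    `es` is assumed to be an Elasticsearch client; this helper is
    illustrative and not part of the library.
    """
    # Analysis settings can only be changed while the index is closed.
    es.indices.close(index=index)
    try:
        es.indices.put_settings(
            index=index,
            body={
                "analysis": {
                    "filter": {
                        filter_name: {
                            "type": "synonym_graph",
                            "synonyms": list(synonyms),
                        }
                    }
                }
            },
        )
    finally:
        # Reopen the index even if the settings update failed.
        es.indices.open(index=index)
```

For example, update_synonyms(es, ["football, soccer", "cricket, test cricket"]). Documents that are already indexed are not re-analyzed by this, so the new list only affects query-time analysis until the data is reindexed.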
CodePudding user response:
You can use the index mapping below to load more than one file through stopwords_path (one stop filter per file), and then use the custom analyzer on both the name and description fields.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "stop-analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "stop_words_1",
            "stop_words_2"
          ]
        }
      },
      "filter": {
        "stop_words_1": {
          "type": "stop",
          "stopwords_path": "stopwords.txt"
        },
        "stop_words_2": {
          "type": "stop",
          "stopwords_path": "synonym.txt"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "stop-analyzer"
      },
      "description": {
        "type": "text",
        "analyzer": "stop-analyzer"
      }
    }
  }
}
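Because both files are loaded as stop word lists here, it is worth checking which tokens actually survive the analyzer before relying on it. A small sketch using the analyze API, assuming a 7.x-style Elasticsearch client passed in as es and an index created with the mapping above (surviving_tokens is a hypothetical helper name):

```python
def surviving_tokens(es, text, index="players", analyzer="stop-analyzer"):
    """Return the tokens that remain after running `text` through `analyzer`.

    `es` is assumed to be an Elasticsearch client; the analyzer must be
    defined in the settings of `index`.
    """
    resp = es.indices.analyze(
        index=index,
        body={"analyzer": analyzer, "text": text},
    )
    # The analyze API returns one entry per emitted token.
    return [tok["token"] for tok in resp["tokens"]]
```

Calling surviving_tokens(es, "was football") shows whether was and football are removed. Note that stopwords_path is resolved relative to the Elasticsearch config directory (or must be absolute), and the file has to be present on every node.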