I'm migrating from Elasticsearch 1.5 to 7.10. There are multiple required changes; the most relevant one is the removal of the document type concept in version 6. To deal with it, I introduced a new field, doc_type, and I match on it when I search.
My question is: when I run the same search query (or an equivalent one, since some changes are required), should I expect exactly the same result set? I'm seeing some differences, so I'd like to figure out whether I broke something in the new mappings or in the search query.
Thank you in advance.
Edit after first question:
In general: I have a service that communicates with ES 1.5, and I have to migrate it to ES 7.10 while keeping the external API as stable as possible.
- I'm not using scoring.
- Previously I had document types A and B, and I made queries like host/indexname/A,B/_search. After the migration I keep A or B in doc_type, and the query becomes host/indexname/_search with
"bool":{"should":[{"terms":{"doc_type":["A"],"boost":1.0}},{"terms":{"doc_type":["B"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}
in the body. If I put A and B in different indexes and the user wants to match both of them, I would have to "merge" the search responses of the two queries, and I don't know which strategy to follow for that. By keeping everything in one index, I get a response with mixed (doc_type) results from ES. I followed this specific approach: https://www.elastic.co/blog/removal-of-mapping-types-elasticsearch#custom-type-field
- The differences are not big. It's difficult to show a concrete example because the data/doc structure is complex, but the idea is this: for a given query, 1.5 returns, for example, [a, b, c, d, e, f, g, h, i, j] (where each item may be of type A or B). With 7.10 I get responses like [a, b, e, c, d, f, g, h, i, j] or [a, b, c, d, e, g, i, j, k].
Second edit: This query was generated by the Java client.
{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mark_deleted:false",
            "fields": [],
            "type": "best_fields",
            "default_operator": "or",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1.0
          }
        },
        {
          "bool": {
            "should": [
              {
                "terms": {
                  "type": ["A"],
                  "boost": 1.0
                }
              },
              {
                "terms": {
                  "type": ["B"],
                  "boost": 1.0
                }
              },
              {
                "terms": {
                  "type": ["D"],
                  "boost": 1.0
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "post_filter": {
    "term": {
      "mark_deleted": {
        "value": false,
        "boost": 1.0
      }
    }
  },
  "sort": [
    {
      "a_specific_date": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "no_match_size": 120,
    "fields": {
      "body": {
        "fragment_size": 120,
        "number_of_fragments": 1
      }
    }
  }
}
Answer:
First, since you don't care about scoring, you should use bool/filter instead of bool/must at the top level. Otherwise your results are sorted by _score by default, and between 1.5 and 7.10 there have been so many scoring changes that this would explain the differences you get. So you're better off simply sorting the results on a field other than _score.
Second, instead of the bool/should on type, you can use a single terms query, which does exactly the same job in a simpler way:
{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        {
          "query_string": {
            "query": "mark_deleted:false",
            "fields": [],
            "type": "best_fields",
            "default_operator": "or",
            "max_determinized_states": 10000,
            "enable_position_increments": true,
            "fuzziness": "AUTO",
            "fuzzy_prefix_length": 0,
            "fuzzy_max_expansions": 50,
            "phrase_slop": 0,
            "escape": false,
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1
          }
        },
        {
          "terms": {
            "type": ["A", "B", "D"]
          }
        }
      ]
    }
  },
  "post_filter": {
    "term": {
      "mark_deleted": {
        "value": false,
        "boost": 1
      }
    }
  },
  "sort": [
    {
      "a_specific_date": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "no_match_size": 120,
    "fields": {
      "body": {
        "fragment_size": 120,
        "number_of_fragments": 1
      }
    }
  }
}
Finally, I'm not sure why you're using a query_string query to do an exact match on mark_deleted:false; it doesn't make sense to me. A simple term query would be better and more appropriate here.
It's also not clear why you filter on mark_deleted:false again in your post_filter, since it's the same condition as in your query_string constraint.
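As a rough sketch of those last two points, and assuming mark_deleted is mapped as a boolean and type as a keyword field (neither mapping is shown in the question, so treat both as assumptions), the query could be reduced to something like this. The highlight section is left out only for brevity and would stay as it is.

```json
{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        { "term": { "mark_deleted": false } },
        { "terms": { "type": ["A", "B", "D"] } }
      ]
    }
  },
  "sort": [
    { "a_specific_date": { "order": "desc" } }
  ]
}
```

Both clauses run in filter context, so no score is computed, and the redundant post_filter is gone because the term filter already restricts the hits to mark_deleted:false.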