Order ElasticSearch results by percentage of nested field matches

Time:01-04

I would like to order ElasticSearch query results based on the percentage of matches for a nested field.

For example, suppose I have an ElasticSearch index structured as follows:

{
    "properties": {
        "name": {
            "type": "text"
        },
        "jobs": {
            "type": "nested",
            "properties": {
                "id": {
                    "type": "long"
                }
            }
        }
    }
}

With the following documents:

{
    "name": "Alice",
    "jobs": [
        { "id": 1 },
        { "id": 2 },
        { "id": 3 },
        { "id": 4 }
    ]
}
{
    "name": "Bob",
    "jobs": [
        { "id": 1 },
        { "id": 2 },
        { "id": 3 }
    ]
}
{
    "name": "Charles",
    "jobs": [
        { "id": 2 },
        { "id": 3 }
    ]
}

Now, I would like to perform a query to find which documents have specific jobs, ordered by the percentage of matched jobs. For example:

  • Searching for jobs 1 and 2, I would expect the order to be:
    1. Bob (66% jobs matched)
    2. Alice (50% jobs matched)
    3. Charles (50% jobs matched)
  • Searching for job 2, I would expect the order to be:
    1. Charles (50% jobs matched)
    2. Bob (33% jobs matched)
    3. Alice (25% jobs matched)
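
The expected ordering above can be checked with a short calculation. The sketch below is plain Python and does not talk to ElasticSearch; the names docs, match_ratio and rank are illustrative only and not part of the index.

```python
# Sample documents from the question: name -> list of job ids.
docs = {
    "Alice": [1, 2, 3, 4],
    "Bob": [1, 2, 3],
    "Charles": [2, 3],
}


def match_ratio(job_ids, wanted):
    """Fraction of a document's jobs that appear in the searched ids."""
    wanted = set(wanted)
    matched = sum(1 for job_id in job_ids if job_id in wanted)
    return matched / len(job_ids)


def rank(wanted):
    """Names with at least one match, sorted by descending match ratio."""
    scored = [(name, match_ratio(ids, wanted)) for name, ids in docs.items()]
    scored = [(name, ratio) for name, ratio in scored if ratio > 0]
    # sorted() is stable, so tied documents keep their insertion order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, ratio in rank([1, 2]):
    print(f"{name}: {ratio:.0%}")
```

Alice and Charles tie at 50% for jobs 1 and 2, so their relative order depends on a tie-breaker; here it is simply insertion order.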

So far, I'm using the following query, but it sorts by number of matches, not the percentage:

{
    "query": {
        "nested": {
            "path": "jobs",
            "query": {
                "bool": {
                    "should": [
                        {
                            "match": {
                                "jobs.id": "1"
                            }
                        },
                        {
                            "match": {
                                "jobs.id": "2"
                            }
                        }
                    ]
                }
            },
            "score_mode":"sum"
        }
    }
}

Answer:

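script_score inside a function_score query seems to do the job. Since jobs.id is a numeric field, every matching nested job scores 1, so with score_mode sum the nested query's score is the number of matched jobs, and the script divides that by the document's total number of jobs. Keep in mind that function_score multiplies the script result by the original query score unless boost_mode is set to replace.

The query: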

{
  "query": {
    "function_score": {
      "query": {
        "nested": {
          "path": "jobs",
          "query": {
            "bool": {
              "should": [
                {
                  "match": {
                    "jobs.id": "1"
                  }
                },
                {
                  "match": {
                    "jobs.id": "2"
                  }
                }
              ]
            }
          },
          "score_mode": "sum"
        }
      },
      "script_score": {
        "script": {
          "source": "_score / params['_source']['jobs'].length"
        }
      },
      "boost_mode": "replace"
    }
  }
}
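
To search for an arbitrary set of job ids, the request body can be generated rather than written by hand. This is a minimal sketch using only the Python standard library; build_ratio_query and its boost_mode parameter are hypothetical names introduced here, and the resulting dictionary would still have to be sent to ElasticSearch with whatever client is in use.

```python
import json


def build_ratio_query(job_ids, boost_mode="replace"):
    """Build a function_score request body for the given job ids.

    boost_mode is an assumption of this sketch: "replace" makes the final
    score equal to the script result instead of multiplying it by the
    original query score.
    """
    # One "should" clause per searched job id, as in the query above.
    should = [{"match": {"jobs.id": str(job_id)}} for job_id in job_ids]
    return {
        "query": {
            "function_score": {
                "query": {
                    "nested": {
                        "path": "jobs",
                        "query": {"bool": {"should": should}},
                        "score_mode": "sum",
                    }
                },
                "script_score": {
                    "script": {
                        "source": "_score / params['_source']['jobs'].length"
                    }
                },
                "boost_mode": boost_mode,
            }
        }
    }


# Print the body for jobs 1 and 2 as JSON.
print(json.dumps(build_ratio_query([1, 2]), indent=2))
```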