ElasticSearch - Combining Queries for 4 separate randomly sorted groups?

Time:11-25

I'm fairly new to Elasticsearch (though with a fair bit of SQL experience) and am currently struggling to put a proper query together. I have 2 boolean fields, isPlayer and isEvil, each of which is either true or false for every entry. Based on that, I want to split my dataset into 4 groups:

  1. isPlayer: true, isEvil: true
  2. isPlayer: true, isEvil: false
  3. isPlayer: false, isEvil: true
  4. isPlayer: false, isEvil: false

I want to randomly sort these groups within themselves, then concatenate them into one long list that I can paginate. I'd like to do that inside the query, as that seems like the "correct" way to do this, since I'd do it similarly in SQL. In that list, the groups are to appear in order: first all entries of Group 1 in a random order, then all entries of Group 2 in a random order, then all entries of Group 3, and so on. The random order must be reproducible given the same inputs, so if the sorting is based on random_score, ideally I'd be using a seed for the randomness.

I can build a single query, but how do I combine 4?

The approaches I've found so far are MultiSearch and the Disjunction Max Query. MultiSearch doesn't seem to support pagination. With the Disjunction Max Query, it might be that I'm missing the forest for the trees, but I'm struggling to have the subqueries randomly sorted only within themselves before appending them to one another.

Here is how I write a single query for now, without Disjunction Max Query, in case it helps:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "isPlayer": true
          }
        },
        {
          "term": {
            "isEvil": true
          }
        }
      ]
    }
  }
}

Answer:

The solution to this problem is not to build 4 separate groups, but to ensure that each group falls into a different range of scores and then sort by score. This can be achieved by scoring the hits not by matching criteria, but through a script_score query. It lets you write a script yourself that computes the score (the default scripting language is called "painless", but I've seen examples in groovy as well).

The logic is fairly simple:

  1. If isPlayer = true, add 2 points to the score
  2. If isEvil = true, add 4 points to the score
  3. Either way, add a random number between 0 and 1 to the score at the end

This creates the 4 groups I wanted with distinct score-ranges:

  1. isPlayer = true, isEvil = true --> Score-range: 6-7
  2. isPlayer = false, isEvil = true --> Score-range: 4-5
  3. isPlayer = true, isEvil = false --> Score-range: 2-3
  4. isPlayer = false, isEvil = false --> Score-range: 0-1
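
To check the banding idea outside of Elasticsearch, the scoring rule above can be simulated in plain Python. This is only a sketch: `band_score` and `ranked` are hypothetical helper names, and Python's seeded `random.Random` stands in for Elasticsearch's `randomScore`, so the concrete random values will differ from what a real cluster produces.

```python
import random


def band_score(is_player: bool, is_evil: bool, doc_id: str, seed: int) -> float:
    """Mimic the rule: +2 for isPlayer, +4 for isEvil, plus a value in [0, 1)."""
    score = 0.0
    if is_player:
        score += 2
    if is_evil:
        score += 4
    # Stand-in for randomScore(seed, field): the same (seed, doc_id) pair
    # always yields the same offset, which makes the order reproducible.
    rng = random.Random(f"{seed}:{doc_id}")
    return score + rng.random()


def ranked(docs: list, seed: int) -> list:
    """Sort documents by descending score, as a search would by default."""
    return sorted(
        docs,
        key=lambda d: band_score(d["isPlayer"], d["isEvil"], d["id"], seed),
        reverse=True,
    )


# Made-up sample data: three documents per flag combination.
combos = [(True, True), (False, False), (True, False), (False, True)] * 3
docs = [
    {"id": f"doc{i}", "isPlayer": p, "isEvil": e}
    for i, (p, e) in enumerate(combos)
]

for d in ranked(docs, seed=1):
    print(d["id"], d["isPlayer"], d["isEvil"])
```

Because the random part is always below 1 and the bands are 2 apart, documents from different groups can never interleave. With these weights the isEvil groups come first; swapping the two weights (4 for isPlayer, 2 for isEvil) would instead put the isPlayer groups first, matching the group order listed in the question.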

The query would look like this:

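{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": """
            double score = 0;
            // Boolean doc values are read through .value
            if (doc['isPlayer'].value) {
              score += 2;
            }

            if (doc['isEvil'].value) {
              score += 4;
            }

            // Seeded random offset in [0, 1) keeps the order reproducible
            int partialSeed = 1;
            score += randomScore(partialSeed, 'id');
            return score;
        """
      }
    }
  }
}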
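
Since the original goal was one long list that can be paginated, here is a minimal Python sketch of how the request body for a given page might be built with the standard `from` and `size` parameters. The helper name `build_search_body` and the page size are assumptions for illustration, and the script text assumes the additive form of the scoring logic (`+=`, with booleans read through `.value`). The seed must stay the same across pages, otherwise the random order changes between requests.

```python
import json

# Painless source with a placeholder for the seed (assumed additive scoring).
SCRIPT_TEMPLATE = """
double score = 0;
if (doc['isPlayer'].value) { score += 2; }
if (doc['isEvil'].value) { score += 4; }
score += randomScore(%d, 'id');
return score;
"""


def build_search_body(page: int, page_size: int, seed: int) -> dict:
    """Return a search body for the zero-based page of the banded list."""
    return {
        "from": page * page_size,  # number of hits to skip
        "size": page_size,         # number of hits to return
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {"source": SCRIPT_TEMPLATE % seed},
            }
        },
    }


# Print the body for the first page; it can be sent to the _search endpoint.
print(json.dumps(build_search_body(page=0, page_size=20, seed=1), indent=2))
```

Note that `from` plus `size` is limited by the index setting `index.max_result_window` (10,000 by default), so very deep pages would need a different approach.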