How to get unique/distinct count from $reduce within $expr of a find query?-CodePudding

In my collection, I have documents which contains an array (called values) of objects, which has id and val fields. The data looks like this

[
  {
    values: [
      {
        "id": "123",
        "val": true
      },
      {
        "id": "456",
        "val": true
      },
      {
        "id": "789",
        "val": false
      },
      
    ]
  },
  {
    values: [
      {
        "id": "123",
        "val": true
      },
      {
        "id": "123",
        "val": true
      },
      {
        "id": "123",
        "val": false
      },
      
    ]
  },
  {
    values: [
      {
        "id": "234",
        "val": false
      },
      {
        "id": "567",
        "val": false
      }
    ]
  }
]

I want to query this data by val to ensure it is true, and there may be cases where I want to ensure that an array has 2 instances of val's that is true. I am able to achieve this with the following query:

db.collection.find({
  values: {
    "$elemMatch": {
      val: true
    }
  },
  $expr: {
    $gte: [
      {
        $reduce: {
          input: "$values",
          initialValue: 0,
          in: {
            $sum: [
              "$$value",
              {
                $cond: [
                  {
                    $eq: [
                      "$$this.val",
                      true
                    ]
                  },
                  1,
                  0
                ]
              }
            ]
          }
        }
      },
      2
    ]
  }
})

The results of the query above give me the following result:

[
  {
    "values": [
      {
        "id": "123",
        "val": true
      },
      {
        "id": "456",
        "val": true
      },
      {
        "id": "789",
        "val": false
      }
    ]
  },
  {
    "values": [
      {
        "id": "123",
        "val": true
      },
      {
        "id": "123",
        "val": true
      },
      {
        "id": "123",
        "val": false
      }
    ]
  }
]

The issue with this now, is that I want to ensure that there are no duplicate id's in the results of the $reduce'd list. I have researched the docs and $group looks promising, however I am unsure on how to integrate this into the query. I want the result to look like the following:

[
  {
    "values": [
      {
        "id": "123",
        "val": true
      },
      {
        "id": "456",
        "val": true
      },
      {
        "id": "789",
        "val": false
      }
    ]
  }
]

Here is a MongoPlayground link with all of the above in it.

Please Note

I have simplified the code examples here to get to the core of the problem. In my actual use case, there are many values of different data types, which means that using an $elemMatch is the best way to go for me.

CodePudding user response：

Query

reduce is fine, but maybe 2 filters are simpler here
first filter to have true count >=2
second filter to not have duplicate id, by comparing the values.id length, with the length of the set values.id (keep it only if same size else => duplicate), it works using paths on arrays = arrays like values.id is array of ids

*if this is slow for your use case maybe it can be faster

Playmongo

aggregate(
[{"$match": 
   {"$expr": 
     {"$and": 
       [{"$gte": 
           [{"$size": {"$filter": {"input": "$values.val", "cond": "$$this"}}},
            2]},
         {"$eq": 
           [{"$size": "$values.id"},
             {"$size": {"$setUnion": ["$values.id", []]}}]}]}}}])

CodePudding user response：

After looking at Takis's answer, it made me realise that using $setUnion is an alternative way to find the distinct values of a field. Using this, I was able to rework my query to achieve what I want.

What I have done to achieve this is to have a $cond operator within the $expr operator. I pass in the original $reduce as the condition to see if x amount of val's appear within the document. If it succeeds, then I am ensuring that the size of the union of values.id's (i.e. all the unique id's within the current document values array) is at least x amount. If this condition is satisfied, the document will be returned. If not, then it falls back to the value false, i.e. it will not return the current document.

The query is as follows:

db.collection.find({
  values: {
    "$elemMatch": {
      val: true
    }
  },
  $expr: {
    $cond: [
      {
        $gte: [
          {
            $reduce: {
              input: "$values",
              initialValue: 0,
              in: {
                $sum: [
                  "$$value",
                  {
                    $cond: [
                      {
                        $eq: [
                          "$$this.val",
                          true
                        ]
                      },
                      1,
                      0
                    ]
                  }
                ]
              }
            }
          },
          2 // x amount of instances
        ]
      },
      {
        $gte: [
          {
            "$size": {
              $setUnion: [
                "$values.id"
              ]
            }
          },
          2 // x amount of instances
        ]
      },
      false
    ]
  }
})

Here is a MongoPlayground link showing it in action.