Home > database >  MongoDB select best matched document
MongoDB select best matched document

Time:10-18

I have a collection of documents like this:

[{
    "_id" : ObjectId("6347e5aa0c009a37b81da700"),
    "testField1" : "1000",
    "testField2" : "2000",
    "testField3" : NumberInt(1)
},
{
    "_id" : ObjectId("6347e5890c009a37b81da701"),
    "testField2" : 2000,
    "testField3" : NumberInt(2)
},
{
    "_id" : ObjectId("6347e5960c009a37b81da702"),
    "testField3" : NumberInt(3)
}]

I need to retrieve documents in the below precedence.

  1. if testField1 and testField2 exist and match their values, the query should return that document.
  2. Otherwise, if testField2 exists and matches its value, the query should return that document,
  3. Otherwise it should return the last document, where testField1 & testField2 do not exist.

I tried the below query, but it returns all the documents.

db.getCollection("TEST_COLLECTION").aggregate([
    {
        $match: {
            $expr: {
                $cond: {
                    if: {
                        $and: {"testField1": "1000", "testField2": "2000"}
                    },
                    then: {
                        $and: {"testField1": "1000", "testField2": "2000"}
                    },
                    else : {
                        $cond: {
                            if: {
                                $and: {"testField1": null, "testField2": "2000"}
                            },
                            then: {
                                $and: {"testField1": null, "testField2": "2000"}
                            },
                            else : {
                                $and: {"testField1": null, "testField2": null}
                            }
                        }
                    }
                }
            }
        }
    }
])

CodePudding user response:

There are definitely still some open questions from the comments. @ray has an interesting approach linked in there that uses $setWindowFields which may be appropriate depending on exactly what you're looking for.

I took a different approach (and perhaps interpretation) and built out the following aggregation that uses $unionWith:

db.collection.aggregate([
  {
    $match: {
      testField1: "1000",
      testField2: "2000"
    }
  },
  {
    "$addFields": {
      sortOrder: 1
    }
  },
  {
    "$unionWith": {
      "coll": "collection",
      "pipeline": [
        {
          $match: {
            testField2: "2000"
          }
        },
        {
          "$addFields": {
            sortOrder: 2
          }
        }
      ]
    }
  },
  {
    "$unionWith": {
      "coll": "collection",
      "pipeline": [
        {
          $match: {
            testField1: {
              $exists: false
            },
            testField2: {
              $exists: false
            }
          }
        },
        {
          "$addFields": {
            sortOrder: 3
          }
        },
        
      ]
    }
  },
  {
    $sort: {
      sortOrder: 1
    }
  },
  {
    $limit: 1
  },
  {
    "$unset": "sortOrder"
  }
])

Basically the aggregation will internally issue three queries, one corresponding with each of three precedence conditions. Similar to @ray's solution, it creates a field to sort on (sortOrder in mine) since the ordering of $unionWith is unspecified otherwise per the documentation. After the $sort we can $limit to a single result and $unset the temporary sorting field prior to returning the result to the client. Depending on the version you are running, you could consider adding a couple of inline $limits for each of the subpipelines to reduce the amount of work being done. Along with appropriate indexes (perhaps just { testField2: 1, testField: 1 }), this operation should be reasonably efficient.

Here is the playground link.

CodePudding user response:

If there are several groups and you need to return the wanted document per group, I would go with @ray's answer. If there is only one group (as implies on your comment, and on @user20042973's nice answer), I would like to point another obvious option:

db.collection.aggregate([
  {$facet: {
      op1: [{$match: {testField1: "1000", testField2: "2000"}}],
      op2: [{$match: {testField1: null, testField2: "2000"}}],
      op3: [{$match: {testField1: null, testField2: null}},
            {$sort: {timestamp: -1}}, {$limit: 1}]
  }},
  {$project: {res: {$ifNull: [{$first: "$op1"}, {$first: "$op2"}, {$first: "$op3"}]}}},
  {$replaceRoot: {newRoot: "$res"}}
])

See how it works on the playground example

  • Related