Home > OS >  MongoDB Query doesn't return with a sort
MongoDB Query doesn't return with a sort

Time:09-25

I have the query:

db.changes.find(
  {
    $or: [
      { _id: ObjectId("60b1e8dc9d0359001bb80441") },
      { _oid: ObjectId("60b1e8dc9d0359001bb80441") },
    ],
  },
  {
    _id: 1,
  }
);

which returns almost instantly.

But the moment I add a sort, the query doesn't return. The query just runs. The longest I could tolerate the query running was over 30 Min, so I'm not entirely sure if it does eventually return.

db.changes
  .find(
    {
      $or: [
        { _id: ObjectId("60b1e8dc9d0359001bb80441") },
        { _oid: ObjectId("60b1e8dc9d0359001bb80441") },
      ],
    },
    {
      _id: 1,
    }
  )
  .sort({ _id: -1 });

I have the following indexes:

[
    {
        "_oid" : 1
    },
    {
        "_id" : 1
    }
]

and this is what db.currentOp() returns:

{
    "host": "xxxx:27017",
    "desc": "conn387",
    "connectionId": 387,
    "client": "xxxx:55802",
    "appName": "MongoDB Shell",
    "clientMetadata": {
        "application": {
            "name": "MongoDB Shell"
        },
        "driver": {
            "name": "MongoDB Internal Client",
            "version": "4.0.5-18-g7e327a9017"
        },
        "os": {
            "type": "Linux",
            "name": "Ubuntu",
            "architecture": "x86_64",
            "version": "20.04"
        }
    },
    "active": true,
    "currentOpTime": "2021-09-24T15:26:54.286 0200",
    "opid": 71111,
    "secs_running": NumberLong(23),
    "microsecs_running": NumberLong(23860504),
    "op": "query",
    "ns": "myDB.changes",
    "command": {
        "find": "changes",
        "filter": {
            "$or": [
                {
                    "_id": ObjectId("60b1e8dc9d0359001bb80441")
                },
                {
                    "_oid": ObjectId("60b1e8dc9d0359001bb80441")
                }
            ]
        },
        "sort": {
            "_id": -1.0
        },
        "projection": {
            "_id": 1.0
        },
        "lsid": {
            "id": UUID("38c4c09b-d740-4e44-a5a5-b17e0e04f776")
        },
        "$readPreference": {
            "mode": "secondaryPreferred"
        },
        "$db": "myDB"
    },
    "numYields": 1346,
    "locks": {
        "Global": "r",
        "Database": "r",
        "Collection": "r"
    },
    "waitingForLock": false,
    "lockStats": {
        "Global": {
            "acquireCount": {
                "r": NumberLong(2694)
            }
        },
        "Database": {
            "acquireCount": {
                "r": NumberLong(1347)
            }
        },
        "Collection": {
            "acquireCount": {
                "r": NumberLong(1347)
            }
        }
    }
}

This wasn't always a problem, it's only recently started. I've also rebuilt the indexes, and nothing seems to work. I've tried using .explain(), and that also doesn't return.

Any suggestions would be welcome. For my situation, it's going to be much easier to make changes to the DB than it is to change the query.

CodePudding user response:

This is happening due to the way Mongo chooses what's called a "winning plan", I recommend you read more on this in my other answer which explains this behavior. However it is interesting to see if the Mongo team will consider this specific behavior a feature or a bug.

Basically the $or operator has some special qualities, as specified:

When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.

It seems that the addition of the sort is disrupting the usage this quality, meaning you're running a collection scan all of a sudden.

What I recommend you do is use the aggregation pipeline instead of the query language, I personally find it has more stable behavior and it might work there. If not maybe just do the sorting in code ..

CodePudding user response:

The server can use a separate index for each branch of the $or, but in order to avoid doing an in-memory sort the indexes used would have to find the documents in the sort order so a merge-sort can be used instead.

For this query, an index on {_id:1} would find documents matching the first branch, and return them in the proper order. For the second branch, and index on {oid:1, _id:1} would do the same.

If you have both of those indexes, the server should be able to find the matching documents quickly, and return them without needing to perform an explicit sort.

  • Related