Why are "hello" (isMaster) commands with topologyVersion timing out?-CodePudding

To replicate what I’m seeing on Production, I have a standalone MongoDB instance running locally, where the hello() admin command (previously isMaster) runs till the maxAwaitTimeMS and times out, every time:

db.runCommand({
    "hello": 1,
    "topologyVersion": {
      "processId": ObjectId("628c762af984c948c15f8811"),
      "counter": NumberLong(0)
    },
    "maxAwaitTimeMS": 5000,
    "$db": "admin"
});;

Note that when 'timing out', the response is still what the client would have expected - what's not clear is why the Mongo server does not return with this immediately, rather than having the query have to run till its maxAwaitTimeMS before responding:

{
    "isWritablePrimary" : true,
    "topologyVersion" : {
        "processId" : ObjectId("628c762af984c948c15f8811"),
        "counter" : 0
    },
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 100000,
    "localTime" : ISODate("2022-05-29T11:35:36.011 01:00"),
    "logicalSessionTimeoutMinutes" : 30,
    "connectionId" : 86043,
    "minWireVersion" : 0,
    "maxWireVersion" : 9,
    "readOnly" : false,
    "ok" : 1
}

There is no other activity on the server - so it's not an issue of performance or connections otherwise being held up. The query will run as long as the ‘awaitMaxTimeMS’, even when increased to e.g 5 minutes. The issue only occurs where the topologyVersion is included in the call, specifically when using the correct processID that is returned when first running the query as follows:

// query completes immediately
db.runCommand({
  "hello": 1,
  "$db": "admin"
});

// and the response includes the following:
// { "processId" : ObjectId("628c762af984c948c15f8811") }

These calls are being made from Mongo drivers (PyMongo and Ruby Mongo), and we are seeing the same issue on a Production Mongo instance - where currentOp is showing many of these hello() queries, all of which are timing out.

My question is, why are these requests timing out and not completing immediately as expected?

Aside from the noise it creates in the currentOp list (with each query hanging around for the 10 seconds) and the curiosity in understanding why, I'd also like this resolved as to help ensure connections aren't otherwise being held up due to this issue. Reducing the maxAwaitTimeMS (assuming the Mongo drivers give such control) would feel like hiding the issue. The issue does not seem to be happening while performing a few tests on a replicated MongoDB server.

CodePudding user response：

what's not clear is why the Mongo server does not return with this immediately, rather than having the query have to run till its maxAwaitTimeMS before responding:

this is expected behavior for a "special" mode for a hello thread called streaming. In 2 words, a client just initiates streaming server's states by sending a hello command and then waits responses from server (without sending requests). So maxAwaitTimeMS is a max time which server can wait before sending response to the client.

So yes, it's expected. Waiting can be less than maxAwaitTimeMS, for example if server topology has been changed during waiting, but this value sets the top waiting bound and if the server is healthy, you will wait the whole specified period

I see, so you're saying it's normal behaivour to have many isMaster/hello queries all running up to the maxAwaitTimeMS

if it happens at the same time, then I would vote that they contain different endpoints in me field in the response since driver creates a separate hello thread for each endpoint