mongodb executionStats time is different than real execution-CodePudding

I am using M10 mongodb atlas (2GB RAM, 2CPU) for a test. I have 101330 documents in the collection. Each document looks like

{
  _id: ObjectId("618d3bf93de582a797f28df3"),
  applicantName: 'a57cgMwpX6EsMca',
  amount: 6391,
  status: 'Pending',
  date: ISODate("2021-11-11T15:51:21.115Z")
}

Using mongo shell, I run query db.test.find({}).toArray() to return all the records. It doesn't complete in 1 minute. I was surprised it is taking so long. Is it because I am using low resources or it is normal. Running explain command to see details with db.test.find({}).explain("executionStats") gave the result.

{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'policymatrix_dev.indextest',
    indexFilterSet: false,
    parsedQuery: {},
    winningPlan: { stage: 'COLLSCAN', direction: 'forward' },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 101330,
    executionTimeMillis: 26,
    totalKeysExamined: 0,
    totalDocsExamined: 101330,
    executionStages: {
      stage: 'COLLSCAN',
      nReturned: 101330,
      executionTimeMillisEstimate: 2,
      works: 101332,
      advanced: 101330,
      needTime: 1,
      needYield: 0,
      saveState: 101,
      restoreState: 101,
      isEOF: 1,
      direction: 'forward',
      docsExamined: 101330
    }
  },
  serverInfo: {
    host: '',
    port: 27017,
    version: '4.4.10',
    gitVersion: ''
  },
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1636723868, i: 1 }),
    signature: {
      hash: 
      keyId: Long("6968661901790150660")
    }
  },
  operationTime: Timestamp({ t: 1636723868, i: 1 })
}

From the result, I can see it is giving executionTimeMillis equal 26 which is very small value. Now I am confused why there is a difference in the real query execution time and executionTimeMillis of stats. Are they the same thing or different. To add more, I run same query in the AWS lambda function. It completed in 5000 milliseconds. This is getting really confusing.

CodePudding user response：

When you run db.test.find({}).explain("executionStats") you get back a single document. The execution time will include time spent fetching from disk/cache, but these documents never actually leave the server.

When you run db.test.find({}).toArray() there is a lot going on behind the scene that you don't see.

The anatomy of that request goes something like:

driver sends "find" request to the server
mongod takes global, database, and collection locks
mongod creates a new cursor to retrieve the documents
As the documents are returned from the storage engine, mongod builds a response object
When the response object reaches the BSON object size limit of 16MB, the mongod releases the locks and sends the document to the driver
driver receives the response object containing multiple documents
driver iterates those returned documents, pushing each onto an array
when the response object is exhausted, sends a getMore request to the server
mongod re-acquires the locks and resumes the cursor
repeat steps 4-9 until the cursor is exhausted
return the final array to the calling process

If you have 100k documents with an average size of around 1k, that will need at least 6 round trips.

The time reported by the explain will include steps 2,3 and 4. If there are any changes in cache, memory, disk, or CPU utilization between the explain and the actual run, those steps will not take the same amount of time.

The actual execution also puts the server-side execution on hold for steps 5,6,7, and 8, so if the network or client-side array processing are anything less than instantaneous (i.e. in the real world), the real execution will take longer.

If you want to see the specifics, you could watch the network packets on the client and/or server to see exactly when each batch was requested and completed.

You might also adjust the profiler settings such as slowms so that you get a log entry for each batch that will detail locks used, and maybe time reading from disk.

CodePudding user response：

MongoDB is using the LRU(Least Recent First) algorithm to discard least recent data from memory especially when you have less resources. So what seems to happen is that data is located on storage and it is not used for a long time it will take some time for the VM to load it in memory and process it especially when you do not use indexes , so you run the query and it process the data for > 1 min time , but when you run immediately afterwards the explain("executionStats") the data is already in memory and estimation take less time to process it. If you restart the service or flush the caches and execute the explain("executionStats") before the actual query it will show you the > 1 min time( afcourse here it depends on the backend storage utilisation )

Also M10 and M20 cluster tiers support development environments and low-traffic applications , they are not intended for production and high performance activities.