Home > database >  Mongo: Find documents that have 0 associated documents, less expensive
Mongo: Find documents that have 0 associated documents, less expensive

Time:10-25

I have Orders and Shops.

db={
  "orders": [
    {
      "_id": 1,
      "shop": 1,
      "price": 11
    },
    {
      "_id": 2,
      "shop": 2,
      "price": 101
    },
    
  ],
  "shops": [
    {
      "_id": 1,
      
    },
    {
      "_id": 2,
      
    },
    {
      "_id": 3,
      
    },
    
  ],
  
}

I want to find which shops have 0 orders.

I did it like so

db.shops.aggregate([
  {
    $lookup: {
      from: "orders",
      let: {
        shop: "$_id"
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: [
                "$shop",
                "$$shop"
              ]
            },
            
          },
          
        },
        
      ],
      as: "orders",
      
    },
    
  },
  {
    $project: {
      user: "$user",
      orderCnt: {
        $size: "$orders"
      }
    }
  },
  {
    $match: {
      orderCnt: {
        $eq: 0
      }
    }
  },
  
])

This works. I've tried it several other ways too. But it's always very expensive!

How can I more efficiently find shops that have 0 orders? With 1000's of orders and 1000's of shops this takes a really long time.

mongo playgound

CodePudding user response:

As you mentioned with the current approach you're essentially using $lookup on every single document only to "not use" the results, there is clear (immense) overhead here and this could be well optimized.

I would choose one of the following two solutions:

  1. (The option I recommend less but requires no preprocessing) Create an index on the shop field in the order collection if it doesn't exist. Now first fetch all shop _ids from the order collection, then use it in a find query on the shop collection:
const shopIds = await mongo.getCollection('orders').distinct('shop');
const shopsWithNoOrders = await mongo.getCollection('shops').find({ _id: {$nin: shopIds }});

This approach will be much much faster than your current approach, however it does have some issues with it, The biggest issue with it is the $nin operator:

The inequality operator $nin is not very selective since it often matches a large portion of the index. As a result, in many cases, a $nin query with an index may perform no better than a $nin query that must scan all documents in a collection. See also Query Selectivity.

Basically performance could still be improved.

  1. My recommended solution that requires some preprocessing, This require changing the shop schema and adding a new field "activeOrders", essentially we will maintain a counter on each shop for how many orders it currently has.

this means that for each order insert and deletion you will need to update the shops active orders, this should not be too hard to maintain and is a very common access pattern however without knowing your app it's hard to guarantee this will be easy to implement.

Once you have the activeOrders field maintained you can build an index on it and just use a simple query:

const shopsWithNoOrders = await mongo.getCollection('shops').find({ activeOrders: 0 });
  • Related