I have Order
s and Shop
s.
db={
"orders": [
{
"_id": 1,
"shop": 1,
"price": 11
},
{
"_id": 2,
"shop": 2,
"price": 101
},
],
"shops": [
{
"_id": 1,
},
{
"_id": 2,
},
{
"_id": 3,
},
],
}
I want to find which shops have 0 orders.
I did it like so
db.shops.aggregate([
{
$lookup: {
from: "orders",
let: {
shop: "$_id"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$shop",
"$$shop"
]
},
},
},
],
as: "orders",
},
},
{
$project: {
user: "$user",
orderCnt: {
$size: "$orders"
}
}
},
{
$match: {
orderCnt: {
$eq: 0
}
}
},
])
This works. I've tried it several other ways too. But it's always very expensive!
How can I more efficiently find shops that have 0 orders? With 1000's of orders and 1000's of shops this takes a really long time.
CodePudding user response:
As you mentioned with the current approach you're essentially using $lookup
on every single document only to "not use" the results, there is clear (immense) overhead here and this could be well optimized.
I would choose one of the following two solutions:
- (The option I recommend less but requires no preprocessing) Create an index on the
shop
field in theorder
collection if it doesn't exist. Now first fetch allshop
_ids
from theorder
collection, then use it in afind
query on theshop
collection:
const shopIds = await mongo.getCollection('orders').distinct('shop');
const shopsWithNoOrders = await mongo.getCollection('shops').find({ _id: {$nin: shopIds }});
This approach will be much much faster than your current approach, however it does have some issues with it, The biggest issue with it is the $nin
operator:
The inequality operator $nin is not very selective since it often matches a large portion of the index. As a result, in many cases, a $nin query with an index may perform no better than a $nin query that must scan all documents in a collection. See also Query Selectivity.
Basically performance could still be improved.
- My recommended solution that requires some preprocessing, This require changing the shop schema and adding a new field "
activeOrders
", essentially we will maintain a counter on each shop for how many orders it currently has.
this means that for each order insert and deletion you will need to update the shops active orders, this should not be too hard to maintain and is a very common access pattern however without knowing your app it's hard to guarantee this will be easy to implement.
Once you have the activeOrders
field maintained you can build an index on it and just use a simple query:
const shopsWithNoOrders = await mongo.getCollection('shops').find({ activeOrders: 0 });