I have a collection of documents in this format:
{
_id: ObjectId,
items: [
{
defindex: number,
...
},
...
]
}
Certain parts of the schema not relevant are omitted, and each item defindex within the items array is guaranteed to be unique for that array. The same defindex can occur in different documents' items
fields, but will only occur once in each respective array if present.
I currently call $unwind upon the items
field, followed by $sortByCount upon items.defindex
to get a sorted list of items with the highest count.
I now want to add a new field to this final sorted list using $set called usage
, that shows the item's usage as a percentage of the initial number of total documents in the collection.
(i.e. if the item's count
is 1300, and the overall document count pre-$unwind was 2600, the usage value will be 0.5)
My initial plan was to use $facet upon the initial collection, creating a document as so:
{
total: number (achieved using $count),
documents: [{...}] (achieved using an empty $set)
}
And then calling $unwind on the documents field to add the total document count to each document. Calculating the usage value is then trivial using $set, since the total count is a field in the document itself.
This approach ran into memory issues though, since my collection is far larger than the 16MB limit.
How would I solve this?
CodePudding user response:
One way to do it is use $setWindowFields
:
db.collection.aggregate([
{
$setWindowFields: {
output: {
totalCount: {$count: {}}
}
}
},
{
$unwind: "$items"
},
{
$group: {
_id: "$items.defindex",
count: {$sum: 1},
totalCount: {$first: "$totalCount"}
}
},
{
$project: {
count: 1,
usage: {$divide: ["$count", "$totalCount"]
}
}
},
{$sort: {count: -1}}
])
As you can see here