Currently, I have a lot of Documents in MongoDB. What I need to do is find every unique field, key, and value, and count the total amount that exist for each. As an example, if we had the documents:
[{ value1: 'same', value2: 'unique'}, { value1: 'same', value2: 'unique1' }, { value3: 'new' }]
The output should be (something along the lines of this - the exact format doesn't matter)
[
{
value1: {
same: 2, // there are 2 documents with { value1: 'same' }
},
value2: {
unique: 1, // there is 1 document with { value2: 'unique' }
unique1: 1, // there is 1 document with { value2: 'unique1' }
},
value3: {
new: 1, // there is 1 document with { value3: 'new' }
},
}
]
I'm not sure of a good, performant way to do this with an Aggregation (We could do an "unwind" of all properties, but that's going to create a new Document for each property, then loop over every property, which would be extremely bad for performance)
I have two methods that seem quite close to getting a general idea to work, but I have an issue with both.
- $function global variable.
If I can loop over every Document and use a global "output" variable, I can simply update this variable for each Document, do the work in JS, and then return the variable at the end.
The "proof of concept" I tried to make for this would be to simply increment a global variable, and then return this variable as the Document.
[{
// write a custom JS function that has access to the current document
$replaceRoot: {
newRoot: {
$function: {
body: function (doc: any, TOTAL_COUNT: any) {
if (!TOTAL_COUNT) TOTAL_COUNT = 0; // set the initial value
TOTAL_COUNT ; // this doesn't truly increment the "$TOTAL_COUNT" variable :(
return { total_count: TOTAL_COUNT };
},
args: ['$$ROOT', '$TOTAL_COUNT'],
lang: 'js',
},
},
},
},
]
- Accumulator
I'm quite certain the "correct" way to do this is with an Accumulator (as those are meant to store persistent information). However, the only methods available with an Accumulator are $group, $bucket, and $bucketAuto.
To the best of my knowledge, none of these allow me to simply "return an output", similar to $project or $replaceRoot.
I'd be happy to use either an Accumulator or a Function and do the rest myself once I have a proof of concept working :)
Thank you for any help/advice at all! It's greatly appreciated :)
CodePudding user response:
You can first convert the documents into an array of k-v tuples. $unwind
the array and $group
as needed.
db.collection.aggregate([
{
"$project": {
kv: {
"$objectToArray": "$$ROOT"
}
}
},
{
"$unwind": "$kv"
},
{
"$match": {
"kv.k": {
$ne: "_id"
}
}
},
{
$group: {
_id: {
k: "$kv.k",
v: "$kv.v"
},
cnt: {
$sum: 1
}
}
}
])
Here is the Mongo Playground for your reference.