Is is possible to get the distinct values of field that is an array of strings and write the output of distinct to another collection. As shown below from src_coll to dst_coll.
src_coll
{"_id": ObjectId("61968a26c05149a23ad391f4"),"letters": ["aa", "ab", "ac", "ad", "aa", "af"] , "numbers":[11,12,13,14] }
{"_id": ObjectId("61968a26c05149a23ad391f5"),"letters": ["ab", "af", "ag", "ah", "ai", "aj"] , "numbers":[15,16,17,18] }
{"_id": ObjectId("61968a26c05149a23ad391f6"),"letters": ["ac", "ad", "ae", "af", "ag", "ah"] , "numbers":[16,17,18,19] }
{"_id": ObjectId("61968a26c05149a23ad391f7"),"letters": ["ae", "af", "ag", "ah", "ai", "aj"] , "numbers":[17,18,19,20] }
dst_coll
{"_id": ObjectId("61968a26c05149a23ad391f8"),"all_letters": ["aa", "ab", "ac", "ad", "ae", "af", "ag", "ah", "ai", "aj"] }
I have seen the answer using distinct:
db.src_coll.distinct('letters')
and using aggregate (if collection is huge, because i was getting error Executor error during distinct command :: caused by :: distinct too big, 16mb cap
). I used:
db.src_coll.aggregate([ { $group: { _id: "$letters" } }, { $count: "letters_count" }], { allowDiskUse: true })
I do no know how to write the output of distinct
or aggregate as show in dst_coll.
I appreciate your time to reply. Thanks
CodePudding user response:
Here's my solution for it but I'm not sure if it's the optimal way.
Algorithm:
- Unwind all the array
- Group by letters which will give only unique results
- Group them again to get a single result
Use the $out
stage to write the result to another collection:
Aggregation pipeline:
db.collection.aggregate([
{
$project: {
letters: 1,
_id: 0
}
},
{
$unwind: "$letters"
},
{
$group: {
_id: "$letters"
}
},
{
$group: {
_id: null,
allLetters: {
"$addToSet": "$_id"
}
}
},
{
$out: "your-collection-name"
}
])
Kindly see the docs for $out
stage yourself.
See the solution on mongodb playground: Query
CodePudding user response:
I am assuming you are trying to create a single document containing all the distinct values in letters field across all documents in src_col. You can create a collection based on aggregation output using either $out or $merge. But $out would replace your collection if it already exists.
The unwinding array here would run out of memory in which case you will have to use { allowDiskUse: true }
option.
db.collection.aggregate([
{
$unwind: "$letters"
},
{
$group: {
_id: null,
all_letters: {
"$addToSet": "$letters"
}
}
},
{
$merge: {
into: "dst_coll"
}
}
])