Home > Net >  Find the distinct values in array of field, count them and write to another collection as array of s
Find the distinct values in array of field, count them and write to another collection as array of s

Time:11-20

Is is possible to get the distinct values of field that is an array of strings and write the output of distinct to another collection. As shown below from src_coll to dst_coll.

src_coll

{"_id": ObjectId("61968a26c05149a23ad391f4"),"letters": ["aa", "ab", "ac", "ad", "aa", "af"] , "numbers":[11,12,13,14] }
{"_id": ObjectId("61968a26c05149a23ad391f5"),"letters": ["ab", "af", "ag", "ah", "ai", "aj"] , "numbers":[15,16,17,18] }
{"_id": ObjectId("61968a26c05149a23ad391f6"),"letters": ["ac", "ad", "ae", "af", "ag", "ah"] , "numbers":[16,17,18,19] }
{"_id": ObjectId("61968a26c05149a23ad391f7"),"letters": ["ae", "af", "ag", "ah", "ai", "aj"] , "numbers":[17,18,19,20] }

dst_coll

{"_id": ObjectId("61968a26c05149a23ad391f8"),"all_letters": ["aa", "ab", "ac", "ad", "ae", "af", "ag", "ah", "ai", "aj"] }

I have seen the answer using distinct:

db.src_coll.distinct('letters') and using aggregate (if collection is huge, because i was getting error Executor error during distinct command :: caused by :: distinct too big, 16mb cap). I used:

db.src_coll.aggregate([ { $group: { _id: "$letters" } }, { $count: "letters_count" }], { allowDiskUse: true })
I do no know how to write the output of distinct or aggregate as show in dst_coll.

I appreciate your time to reply. Thanks

CodePudding user response:

Here's my solution for it but I'm not sure if it's the optimal way.

Algorithm:

  1. Unwind all the array
  2. Group by letters which will give only unique results
  3. Group them again to get a single result

Use the $out stage to write the result to another collection:

Aggregation pipeline:

db.collection.aggregate([
  {
    $project: {
      letters: 1,
      _id: 0
    }
  },
  {
    $unwind: "$letters"
  },
  {
    $group: {
      _id: "$letters"
    }
  },
  {
    $group: {
      _id: null,
      allLetters: {
        "$addToSet": "$_id"
      }
    }
  },
  {
    $out: "your-collection-name"
  }
])

Kindly see the docs for $out stage yourself.

See the solution on mongodb playground: Query

CodePudding user response:

I am assuming you are trying to create a single document containing all the distinct values in letters field across all documents in src_col. You can create a collection based on aggregation output using either $out or $merge. But $out would replace your collection if it already exists. The unwinding array here would run out of memory in which case you will have to use { allowDiskUse: true } option.

db.collection.aggregate([
  {
    $unwind: "$letters"
  },
  {
    $group: {
      _id: null,
      all_letters: {
        "$addToSet": "$letters"
      }
    }
  },
  {
    $merge: {
      into: "dst_coll"
    }
  }
])

Demo at mongoplayground

  • Related