How to retrieve whole data from $$ROOT after using $unwind in MongoDB aggregation

I am trying to group data using the MongoDB aggregation framework. I don't know the DB structure in advance, but an email column will always be there. The email column can be saved under any name, such as gmail, mail, etc. An example test DB is shown below. How can I group by the email column if I don't know the name it has in the DB?

    {
      "name": "stuv",
      "email": "[email protected]",
      "phone": "12345678990"
    },
    {
      "name": "pqr",
      "email": "[email protected]",
      "phone": "9876543210"
    },
    {
      "name": "xyz",
      "email": "[email protected]",
      "phone": "2345678901"
    },
    {
      "name": "abc",
      "email": "[email protected]",
      "phone": "4567890123"
    }

From the above DB, I want to group the data by the email column, whose name is unknown. So I convert each document to key/value pairs, unwind them, group by value, match the group key against an email regex, and push the data into dups, as shown below.


    db.test.aggregate([
      {$project: {_id: 0}},
      {$project: {data: {$objectToArray: "$$ROOT"}}},
      {$unwind: "$data"},
      {
        $group: {
          _id: "$data.v",
          dups: {$push: "$data"},
          count: {"$sum": 1}
        }
      },
      {$match: {"_id": /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/}}
    ])

With the above pipeline, after the unwind I am not getting all the fields (name, phone, email) inside dups. Instead I get only the grouped field, i.e. the email column, as a separate key and value, as shown in the output below. I want output like email: [email protected] together with all the other fields, as shown in the expected output.

MY OUTPUT:

 "result": [
        {
            "_id": "[email protected]",
            "dups": [
                {
                    "k": "email",
                    "v": "[email protected]"
                },
                {
                    "k": "email",
                    "v": "[email protected]"
                }
            ],
            "count": 2
        },
        {
            "_id": "[email protected]",
            "dups": [
                {
                    "k": "email",
                    "v": "[email protected]"
                },
                {
                    "k": "email",
                    "v": "[email protected]"
                }
            ],
            "count": 2
        }
    ]

EXPECTED OUTPUT

"result": [
        {
            "_id": {
                "email": "[email protected]"
            },
            "dups": [
                {
                    "_id": "62af2c14e50225b659ab68a4",
                    "name": "pqr",
                    "email": "[email protected]",
                    "phone": "9876543210"
                },
                {
                    "_id": "62af2c14e50225b659ab68a6",
                    "name": "abc",
                    "email": "[email protected]",
                    "phone": "4567890123"
                }
            ],
            "count": 2
        },
        {
            "_id": {
                "email": "[email protected]"
            },
            "dups": [
                {
                    "_id": "62af2c14e50225b659ab68a3",
                    "name": "stuv",
                    "email": "[email protected]",
                    "phone": "12345678990"
                },
                {
                    "_id": "62af2c14e50225b659ab68a5",
                    "name": "xyz",
                    "email": "[email protected]",
                    "phone": "2345678901"
                }
            ],
            "count": 2
        }
    ]

Kindly help me in resolving this issue.

CodePudding user response:

EDIT: One option is to keep the current document and add to it the array of its keys and values, here as mailData. Then we can extract the email value using $regexMatch and $reduce, and use it as the _id in the $group step.

db.collection.aggregate([
  {$project: {mailData: {$objectToArray: "$$ROOT"}, data: "$$ROOT"}},
  {
    $set: {
      mailData: {
        $first: {
          $reduce: {
            input: "$mailData",
            initialValue: [],
            in: {
              $concatArrays: [
                "$$value",
                {
                  $cond: [
                    {
                      $regexMatch: {
                        input: {$toString: "$$this.v"},
                        regex: /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/
                      }
                    },
                    ["$$this.v"],
                    []
                  ]
                }
              ]
            }
          }
        }
      }
    }
  },
  {
    $group: {
      _id: "$mailData",
      dups: {$push: "$data"},
      count: {"$sum": 1}
    }
  }
])

See how it works on the playground example
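
To make the idea behind the first option easier to follow, here is a rough in-memory sketch of the same logic in plain JavaScript. It is only an illustration, not a replacement for the pipeline: the sample addresses and the mixed field names (email, mail, gmail) are made up, and the function name groupByEmailValue is hypothetical. Object.entries stands in for $objectToArray, and the whole document is pushed into dups, which is what keeping "$$ROOT" as data achieves.

```javascript
// Hypothetical sample documents; the field that holds the address differs per document.
const docs = [
  { name: "stuv", email: "first@example.com", phone: "12345678990" },
  { name: "pqr", mail: "second@example.com", phone: "9876543210" },
  { name: "xyz", gmail: "first@example.com", phone: "2345678901" },
  { name: "abc", email: "second@example.com", phone: "4567890123" }
];

// Email pattern from the answer, with the + quantifiers written out.
const emailRegex = /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/;

function groupByEmailValue(documents) {
  const groups = new Map();
  for (const doc of documents) {
    // Object.entries plays the role of $objectToArray: [[key, value], ...].
    const hit = Object.entries(doc).find(([, v]) => emailRegex.test(String(v)));
    // Simplification: documents without an email-like value are skipped here.
    if (!hit) continue;
    const email = hit[1];
    if (!groups.has(email)) {
      groups.set(email, { _id: email, dups: [], count: 0 });
    }
    const group = groups.get(email);
    group.dups.push(doc); // the whole document, like $push: "$data"
    group.count += 1;
  }
  return [...groups.values()];
}

const result = groupByEmailValue(docs);
console.log(JSON.stringify(result, null, 2));
```

Unlike this sketch, the pipeline does not skip documents without an email-like value; they should end up together in a group whose _id is null, so an extra $match may be wanted if that matters.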

Another option is similar to your attempt, but using $regexMatch and group afterwards:

db.collection.aggregate([
  {$project: {mailData: {$objectToArray: "$$ROOT"}, data: "$$ROOT"}},
  {$unwind: "$mailData"},
  {
    $match: {
      $expr: {
        $eq: [
          {$regexMatch: {
              input: {$toString: "$mailData.v"},
              regex: /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/
            }
          }, true
        ]
      }
    }
  },
  {
    $group: {
      _id: "$mailData.v",
      dups: {$push: "$data"},
      count: {"$sum": 1}
    }
  }
])

See how it works on the playground example - unwind
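
Both options above group on the plain email string. If the _id should also carry the field name, as in the expected output ({"email": ...}), a variant along the following lines might work. This is an untested sketch: it assumes MongoDB 4.2 or later (for $set and $regexMatch) and flat documents whose values can all be passed to $toString. The names kv, pipeline and emailRegex are only illustrative.

```javascript
// Email pattern from the answer, with the + quantifiers written out.
const emailRegex = /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/;

const pipeline = [
  // Keep the original document next to a key/value view of it.
  { $project: { data: "$$ROOT", kv: { $objectToArray: "$$ROOT" } } },
  // Keep only the key/value pairs whose value looks like an email address.
  {
    $set: {
      kv: {
        $filter: {
          input: "$kv",
          cond: {
            $regexMatch: { input: { $toString: "$$this.v" }, regex: emailRegex }
          }
        }
      }
    }
  },
  // Drop documents that have no email-like value at all.
  { $match: { "kv.0": { $exists: true } } },
  // Turn the first matching pair back into { <fieldName>: <value> } and group on it.
  {
    $group: {
      _id: { $arrayToObject: { $slice: ["$kv", 1] } },
      dups: { $push: "$data" },
      count: { $sum: 1 }
    }
  }
];

// In the shell this would be run as: db.test.aggregate(pipeline)
```

Note that with this shape the same address stored under two different field names (say email and mail) lands in two separate groups, because the field name is part of the _id.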