Home > Software engineering >  MongoDB pipeline lookup with group losing fields
MongoDB pipeline lookup with group losing fields

Time:05-09

I only have 2 years exp with SQL databases and 0 with noSQL database. I am trying to write a pipeline using mongoDB compass aggregate pipeline tool that performs a look up, group, sum, and sort. I am using MongoDB compass to try and accomplish this. Also, please share any resources that makes learning this easier, I've not had much like finding good and easy to understand examples online with using compass to accomplish these tasks. Thank you.

Example question I am trying to solve is:

What customer placed the highest number of orders?

Example Data is:

Customer Collection:

{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f53" },  
  "Id": "1",  
  "FirstName": "Maria",  
  "LastName": "Anders",  
  "City": "Berlin",  
  "Country": "Germany",  
  "Phone": "030-0074321"}, 
{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f54" },  
  "Id": "2",  
  "FirstName": "Ana",  
  "LastName": "Trujillo",  
  "City": "México D.F.",  
  "Country": "Mexico",  
  "Phone": "(5) 555-4729" }

Order Collection:

{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b501f" },  
  "Id": "1",  
  "OrderDate": "2012-07-04 00:00:00.000",  
  "OrderNumber": "542378",  
  "CustomerId": "85",  
  "TotalAmount": "440.00" },
{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b5020" },  
  "Id": "2",  
  "OrderDate": "2012-07-05 00:00:00.000",  
  "OrderNumber": "542379",  
  "CustomerId": "79",  
  "TotalAmount": "1863.40" }

I have spend all day looking at youtube videos and MongoDB documentation but I am failing to comprehend a few things. One, at the time I do a $group function I lose all the fields not associated with the group and I would like to keep a few fields. I would like to have it return the name of the customer with the highest order.

The pipeline I was using that gets me part of the way is the following:

[{
   $lookup: {
      from: 'Customer',
      localField: 'CustomerId',
      foreignField: 'Id',
      as: 'CustomerInfo'
 }}, {
   $project: {
      CustomerId: 1,
      CustomerInfo: 1
 }}, {
   $group: {
      _id: '$CustomerInfo.Id',
      CustomerOrderNumber: {
   $sum: 1
  }
 }}, {
   $sort: {
      CustomerOrderNumber: -1
 }}]

Example data this returns in order: enter image description here

enter image description here

enter image description here

Apologies for the bad formatting, still trying to get the hang of posting questions that are easy to understand and useful.

CodePudding user response:

In $group stage, it only returns documents with _id and CustomerOrderNumber fields, so CustomerInfo field was missing.

  1. $lookup
  2. $project - From 1st stage, CustomerInfo returns as an array, hence getting the first document as a document field instead of an array field.
  3. $group - Group by CustomerId, sum the documents as CustomerOrderNumber, and take the first document as CustomerInfo.
  4. $project - Decorate the output documents.
  5. $setWindowsFields - With $denseRank to rank the document position by CustomerOrderNumber (DESC). If there are documents with same CustomerOrderNumber, the ranking will treat them as same rank/position.
  6. $match - Select documents with denseRankHighestOrder is 1 (highest).
db.Order.aggregate([
  {
    $lookup: {
      from: "Customer",
      localField: "CustomerId",
      foreignField: "Id",
      as: "CustomerInfo"
    }
  },
  {
    $project: {
      CustomerId: 1,
      CustomerInfo: {
        $first: "$CustomerInfo"
      }
    }
  },
  {
    $group: {
      _id: "$CustomerInfo.Id",
      CustomerOrderNumber: {
        $sum: 1
      },
      CustomerInfo: {
        $first: "$CustomerInfo"
      }
    }
  },
  {
    $project: {
      _id: 0,
      CustomerId: "$_id",
      CustomerOrderNumber: 1,
      CustomerName: {
        $concat: [
          "$CustomerInfo.FirstName",
          " ",
          "$CustomerInfo.LastName"
        ]
      }
    }
  },
  {
    $setWindowFields: {
      sortBy: {
        CustomerOrderNumber: -1
      },
      output: {
        denseRankHighestOrder: {
          $denseRank: {}
        }
      }
    }
  },
  {
    $match: {
      denseRankHighestOrder: 1
    }
  }
])

Sample Mongo Playground


Note:

$sort stage able to sort the document by CustomerOrderNumber. But if you try to limit the documents such as "SELECT TOP n", the output result may be incorrect when there are multiple documents with the same CustomerOrderNumber/rank.

Example: SELECT TOP 1 Customer who has the highest CustomerOrderNumber but there are 3 customers who have the highest CustomerOrderNumber.

  • Related