Home > Software design >  Using `$unwind` in MongoDB doubts of different `$sum` outputs
Using `$unwind` in MongoDB doubts of different `$sum` outputs

Time:02-16

I´ve a doubt using $unwind...

First of all, this is my collection "Provincias.json" that stores info about Spanish regions (short example):

 {
  _id: ObjectId("62095151245a59e4bda99e9a"),
  Nombre: 'Soria',
  CA: 'Castilla y León',
  Superficie: 10306,
  Datos: [
   { Anyo: 2015, Valor: 91780 },
   { Anyo: 2014, Valor: 92630 },
   { Anyo: 2013, Valor: 93575 },
   { Anyo: 2012, Valor: 94463 },
   ..... ]

Where CA is the state name, Nombre the county name, Superficie county surface and Datos the population each year.

Well I´m trying to make a "query" that returns the info grouped by CA with total surface and the difference between max and min population densities.

My expression is as follows:

db.ProvinciasPEC1.aggregate([
   {$unwind: "$Datos"},
   {$group: {
       "_id": "$CA", 
       "totSuperficie": {$sum:"$Superficie"},
       "PoblacionMaxima": {$max: "$Datos.Valor"},
       "PoblacionMinima": {$min: "$Datos.Valor"}}}, 
        {$project: {
           "_id": 1,"totSuperficie": 1,
           "totPoblacion": 1,
           "diffDensidad": { 
               $divide: [ 
                 {$subtract: ["$PoblacionMaxima", "$PoblacionMinima"]} , "$totSuperficie"
               ] 
            }
         }
      },
      {$sort: {"diffDensidad":-1}},
      {$out:"CAM_Densidad"}
    ])

I returns me somethin like this:

enter image description here

But I don´t know why the total Surface is wrong.

Using the following expression I get States with proper total surface:

db.ProvinciasPEC1.aggregate([{$group: {"_id": "$CA", "totSuperficie": {$sum:"$Superficie"}}},{$out:"CAM_Densidad"}])

That returns:

enter image description here

Please, any hint to figure out what I´m doing wrong? I think is related to $unwind, but not sure,

Thanks a lot!

CodePudding user response:

Here's one way to do it. Determine the min/max Valor per document, and then $group by CA.

db.collection.aggregate([
  {
    "$set": {
      "maxV": {
        "$max": "$Datos.Valor"
      },
      "minV": {
        "$min": "$Datos.Valor"
      }
    }
  },
  {
    "$group": {
      "_id": "CA",
      "totMaxV": {
        "$max": "$maxV"
      },
      "totMinV": {
        "$min": "$minV"
      },
      "totSf": {
        "$sum": "$Superficie"
      }
    }
  },
  {
    "$project": {
      "diffDensidad": {
        "$divide": [
          {
            "$subtract": [
              "$totMaxV",
              "$totMinV"
            ]
          },
          "$totSf"
        ]
      },
      "CA": "$_id",
      "_id": 0
    }
  }
])

Try it on mongoplayground.net.

CodePudding user response:

Maybe something like this:

 db.collection.aggregate([
  {
   $unwind: "$Datos"
  },
  {
   $group: {
   "_id": "$CA",
   "totSuperficie": {
    $first: "$Superficie"
   },
   "PoblacionMaxima": {
    $max: "$Datos.Valor"
   },
   "PoblacionMinima": {
    $min: "$Datos.Valor"
   }
   }
  },
  {
   $project: {
    "_id": 1,
    "totSuperficie": 1,
    "totPoblacion": 1,
    "diffDensidad": {
      $divide: [
      {
        $subtract: [
          "$PoblacionMaxima",
          "$PoblacionMinima"
        ]
       },
      "$totSuperficie"
        ]
      }
    }
  },
  {
    $sort: {
     "diffDensidad": -1
   }
  },
  {
    $out: "CAM_Densidad"
  }
])

explained:

  1. Unwind the Datos
  2. Group the per CA (but preserve the Superficie since it is the same value for all unwinded Datos so if you sum them you will get different sum depending on number of Datos per CA )
  3. Project
  4. Sort by diff
  5. Out to the output collection

playground

  • Related