Windowing function in MongoDB

I have a collection that is made up of companies. Each company has a "number_of_employees" field as well as an "offices" array of subdocuments, each of which includes "state_code" and "country_code". For example:

{
    '_id': ObjectId('52cdef7c4bab8bd675297da5'),
    'name': 'Technorati',
    'number_of_employees': 35,
    'offices': [
        {'description': '',
        'address1': '360 Post St. Ste. 1100',
        'address2': '',
        'zip_code': '94108',
        'city': 'San Francisco',
        'state_code': 'CA',
        'country_code': 'USA',
        'latitude': 37.779558,
        'longitude': -122.393041}
    ]
}

I'm trying to get the number of employees per state across all companies. My latest attempt looks like:

db.research.aggregate([
    { "$match": {"offices.country_code": "USA" } },
    { "$unwind": "$offices" },
    { "$project": { "_id": 1, "number_of_employees": 1, "offices.state_code": 1 } }
    ])

But now I'm stuck on how to do the $group. Because number_of_employees is at the company level and not the office level, I want to split it evenly across the offices. For example, if Technorati has 5 offices in 5 different states, then each state would be allocated 7 employees.
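
To make the intended allocation concrete, here is a minimal sketch in plain JavaScript that runs against made-up data rather than the real collection. Only the field names come from the collection; the office lists, the second company, and the employeesPerState helper are hypothetical and exist purely for illustration.

```javascript
// Hypothetical sample data shaped like the documents in the collection.
const companies = [
  {
    name: "Technorati",
    number_of_employees: 35,
    offices: [
      { state_code: "CA", country_code: "USA" },
      { state_code: "NY", country_code: "USA" },
      { state_code: "TX", country_code: "USA" },
      { state_code: "WA", country_code: "USA" },
      { state_code: "IL", country_code: "USA" },
    ],
  },
  {
    name: "ExampleCo", // made-up company with one non-US office
    number_of_employees: 10,
    offices: [
      { state_code: "CA", country_code: "USA" },
      { state_code: null, country_code: "GBR" },
    ],
  },
];

// Split each company's head count evenly across all of its offices,
// then add up the shares of the US offices by state.
function employeesPerState(docs) {
  const totals = {};
  for (const company of docs) {
    const share = company.number_of_employees / company.offices.length;
    for (const office of company.offices) {
      if (office.country_code !== "USA") continue; // skip non-US offices
      totals[office.state_code] = (totals[office.state_code] || 0) + share;
    }
  }
  return totals;
}

const perState = employeesPerState(companies);
console.log(perState);
```

With this sample, each of Technorati's five states receives 35 / 5 = 7 employees, and CA additionally receives ExampleCo's share of 10 / 2 = 5.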

In SQL I could do this easily enough using a windowed function to get average employees across offices by company and then summing those while grouping by state. I can't seem to find any clear examples of similar functionality in MongoDB though.

Note, this is for a school assignment, so the use of third-party libraries isn't feasible. Also, I'm hoping that this can all be done in a simple snippet of code, possibly even one call. I could certainly create new intermediate collections or do this in Python and process data there, but that's probably outside of the scope of the homework.

Anything to point me in the right direction would be greatly appreciated!

CodePudding user response:

You are on the right track. Derive an extra field, numOfEmpPerOffice, by dividing number_of_employees by the number of offices ($divide with $size), then $sum that field when you $group by state.

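db.research.aggregate([
  // Keep only companies that have at least one US office.
  { $match: { "offices.country_code": "USA" } },
  // Split the company head count evenly across all of its offices.
  {
    $addFields: {
      numOfEmpPerOffice: {
        $divide: ["$number_of_employees", { $size: "$offices" }]
      }
    }
  },
  // Produce one document per office.
  { $unwind: "$offices" },
  // Drop the non-US offices of the companies matched above.
  { $match: { "offices.country_code": "USA" } },
  // Sum the per-office shares by state.
  {
    $group: {
      _id: "$offices.state_code",
      totalEmp: { $sum: "$numOfEmpPerOffice" }
    }
  }
])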

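
Since the question asks about window functions specifically: MongoDB 5.0 added a $setWindowFields stage, which is the closest analogue to a SQL window function. The following is only a sketch, assuming a server at version 5.0 or later and the db.research collection from the question. The field name officeCount is made up for illustration, and the pipeline has not been run against the real data.

```javascript
// Sketch only: assumes MongoDB 5.0+, where $setWindowFields is available.
const pipeline = [
  // Keep only companies that have at least one US office.
  { $match: { "offices.country_code": "USA" } },
  // Produce one document per office.
  { $unwind: "$offices" },
  // Window over all offices of the same company and count them.
  // With no explicit window, the whole partition is assumed to be used.
  {
    $setWindowFields: {
      partitionBy: "$_id",
      output: { officeCount: { $sum: 1 } }, // officeCount is a hypothetical name
    },
  },
  // Only now drop non-US offices, so the count above still includes them.
  { $match: { "offices.country_code": "USA" } },
  // Sum each office's even share of the company head count by state.
  {
    $group: {
      _id: "$offices.state_code",
      totalEmp: { $sum: { $divide: ["$number_of_employees", "$officeCount"] } },
    },
  },
];

// In mongosh this would be run as: db.research.aggregate(pipeline)
```

The $size approach is simpler and also works on older servers, so this variant is mainly of interest if you want the SQL-style formulation.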