Home > Software design >  MongoDB Aggregation pipeline: group array elements by a field without knowing the field value
MongoDB Aggregation pipeline: group array elements by a field without knowing the field value

Time:12-02

I have a collection of football matches shaped this way:

{
  _id: ObjectId("SomeUniqueMatchID"),
  players: [
    { team: "Arsenal", name: "Saka", distanceRan: 8590},
    { team: "Arsenal", name: "Aubameyang", distanceRan: 9230}
    // ...remaining players for both teams
  ],
  timestamp: 129380193,
  // other match info
}

I want to query for the average distance ran by Aubameyang and Saka when playing together, regardless of the team they are in, as long as they are playing in the same.

I understand as far $unwind on the players field, then $match and finally using $group to calculate the average, but I have no clue what to write on the $match condition when I don't know the team before running the query.

CodePudding user response:

Query1

  • unwind players
  • match to keep only the 2 players names
  • group by match_id team and push the distances in an array
  • match and keep only arrays of size =2 (this is the key, size=2 only if those players played both in the same match and in the same team)
    (if you see on the example code id=2,id=3 are ignored because those players never played both in the same match and same team)
  • uwnind distances (here we know that all those are valid ones)
  • group by null(all collection 1 group) and average the distance (you don't care about what match, what team or which player, so only information about the avgDistance is kept)

Test code here

aggregate(
[{"$unwind": {"path": "$players"}},
  {"$match": 
    {"$expr": 
      {"$or": 
        [{"$eq": ["$players.name", "Saka"]},
          {"$eq": ["$players.name", "Aubameyang"]}]}}},
  {"$group": 
    {"_id": {"_id": "$_id", "team": "$players.team"},
      "distances": {"$push": "$players.distanceRan"}}},
  {"$match": {"$expr": {"$eq": [{"$size": "$distances"}, 2]}}},
  {"$unwind": {"path": "$distances"}},
  {"$group": {"_id": null, "avgDistanceRan": {"$avg": "$distances"}}},
  {"$unset": ["_id"]}])

Query2

  • local it uses reduce, and the fact that 1 player in 1 match, max 1 time and max in 1 team (query1 works even if this wasn't true)
  • reduce to keep those 2 distances, we add on distance array only if team is the same, and the name is one of the two
  • again keep only distance count=2 (to make sure both players in the same match on the same team)
  • unwind and group by null,average like above

Test code here

aggregate(
[{"$set": 
    {"players": 
      {"$reduce": 
        {"input": "$players",
          "initialValue": {"team": null, "distances": []},
          "in": 
          {"$cond": 
            [{"$and": 
                [{"$or": 
                    [{"$eq": ["$$this.name", "Saka"]},
                      {"$eq": ["$$this.name", "Aubameyang"]}]},
                  {"$or": 
                    [{"$eq": ["$$value.team", null]},
                      {"$eq": ["$$value.team", "$$this.team"]}]}]},
              {"team": "$$this.team",
                "distances": 
                {"$concatArrays": 
                  ["$$value.distances", ["$$this.distanceRan"]]}},
              "$$value"]}}}}},
  {"$match": {"$expr": {"$eq": [{"$size": "$players.distances"}, 2]}}},
  {"$unwind": {"path": "$players.distances"}},
  {"$group": 
    {"_id": null, "avgdistanceRan": {"$avg": "$players.distances"}}},
  {"$unset": ["_id"]}])

CodePudding user response:

I believe I understand what you are looking for, but if this is wrong, please let me know.

You should be able to use the aggregation pipeline for this, as you stated, with $match, $avg, and $group.

You may check out a live demo of the query here.

Database

If we have the following database structure...

[
  {
    _id: ObjectId("123456789101819202122232"),
    players: [
      {
        team: "Arsenal",
        name: "Saka",
        distanceRan: 8590
      },
      {
        team: "Arsenal",
        name: "Aubameyang",
        distanceRan: 9230
      }
    ],
    timestamp: 129380193,
    
  },
  {
    _id: ObjectId("123456789101819202999999"),
    players: [
      {
        team: "Arsenal",
        name: "Saka",
        distanceRan: 7777
      },
      {
        team: "NotArsenal",
        name: "Aubameyang",
        distanceRan: 9999
      }
    ],
    timestamp: 129399999,
    
  }
]

Query

We can use the following query to calculate the average.

db.collection.aggregate([
  {
    $unwind: "$players"
  },
  {
    $match: {
      $or: [
        {
          "players.name": "Saka",
          "players.team": "Arsenal"
        },
        {
          "players.name": "Aubameyang",
          "players.team": "Arsenal"
        }
      ]
    }
  },
  {
    $group: {
      _id: "$_id",
      averageDistanceRan: {
        $avg: "$players.distanceRan"
      },
      "players": {
        $push: "$players"
      }
    }
  },
  {
    "$match": {
      "$expr": {
        "$eq": [
          {
            "$size": "$players"
          },
          2
        ]
      }
    }
  },
  {
    $project: {
      _id: "$_id",
      averageDistanceRan: 1
    }
  }
])

Result

Which will give us...

[
  {
    "_id": ObjectId("123456789101819202122232"),
    "averageDistanceRan": 8910
  }
]
  • Related