I have a collection of football matches shaped this way:
{
_id: ObjectId("SomeUniqueMatchID"),
players: [
{ team: "Arsenal", name: "Saka", distanceRan: 8590},
{ team: "Arsenal", name: "Aubameyang", distanceRan: 9230}
// ...remaining players for both teams
],
timestamp: 129380193,
// other match info
}
I want to query for the average distance ran by Aubameyang and Saka when playing together, regardless of the team they are in, as long as they are playing in the same.
I understand as far $unwind
on the players field, then $match
and finally using $group
to calculate the average, but I have no clue what to write on the $match condition when I don't know the team before running the query.
CodePudding user response:
Query1
- unwind players
- match to keep only the 2 players names
- group by
match_id
team
and push the distances in an array - match and keep only arrays of size =2
(this is the key, size=2 only if those players played both in the same match and in the same team)
(if you see on the example code id=2,id=3 are ignored because those players never played both in the same match and same team) - uwnind distances (here we know that all those are valid ones)
- group by
null
(all collection 1 group) and average the distance (you don't care about what match, what team or which player, so only information about the avgDistance is kept)
aggregate(
[{"$unwind": {"path": "$players"}},
{"$match":
{"$expr":
{"$or":
[{"$eq": ["$players.name", "Saka"]},
{"$eq": ["$players.name", "Aubameyang"]}]}}},
{"$group":
{"_id": {"_id": "$_id", "team": "$players.team"},
"distances": {"$push": "$players.distanceRan"}}},
{"$match": {"$expr": {"$eq": [{"$size": "$distances"}, 2]}}},
{"$unwind": {"path": "$distances"}},
{"$group": {"_id": null, "avgDistanceRan": {"$avg": "$distances"}}},
{"$unset": ["_id"]}])
Query2
- local it uses reduce, and the fact that 1 player in 1 match, max 1 time and max in 1 team (query1 works even if this wasn't true)
- reduce to keep those 2 distances, we add on distance array only if team is the same, and the name is one of the two
- again keep only distance count=2 (to make sure both players in the same match on the same team)
- unwind and group by null,average like above
aggregate(
[{"$set":
{"players":
{"$reduce":
{"input": "$players",
"initialValue": {"team": null, "distances": []},
"in":
{"$cond":
[{"$and":
[{"$or":
[{"$eq": ["$$this.name", "Saka"]},
{"$eq": ["$$this.name", "Aubameyang"]}]},
{"$or":
[{"$eq": ["$$value.team", null]},
{"$eq": ["$$value.team", "$$this.team"]}]}]},
{"team": "$$this.team",
"distances":
{"$concatArrays":
["$$value.distances", ["$$this.distanceRan"]]}},
"$$value"]}}}}},
{"$match": {"$expr": {"$eq": [{"$size": "$players.distances"}, 2]}}},
{"$unwind": {"path": "$players.distances"}},
{"$group":
{"_id": null, "avgdistanceRan": {"$avg": "$players.distances"}}},
{"$unset": ["_id"]}])
CodePudding user response:
I believe I understand what you are looking for, but if this is wrong, please let me know.
You should be able to use the aggregation pipeline for this, as you stated, with $match, $avg, and $group.
You may check out a live demo of the query here.
Database
If we have the following database structure...
[
{
_id: ObjectId("123456789101819202122232"),
players: [
{
team: "Arsenal",
name: "Saka",
distanceRan: 8590
},
{
team: "Arsenal",
name: "Aubameyang",
distanceRan: 9230
}
],
timestamp: 129380193,
},
{
_id: ObjectId("123456789101819202999999"),
players: [
{
team: "Arsenal",
name: "Saka",
distanceRan: 7777
},
{
team: "NotArsenal",
name: "Aubameyang",
distanceRan: 9999
}
],
timestamp: 129399999,
}
]
Query
We can use the following query to calculate the average.
db.collection.aggregate([
{
$unwind: "$players"
},
{
$match: {
$or: [
{
"players.name": "Saka",
"players.team": "Arsenal"
},
{
"players.name": "Aubameyang",
"players.team": "Arsenal"
}
]
}
},
{
$group: {
_id: "$_id",
averageDistanceRan: {
$avg: "$players.distanceRan"
},
"players": {
$push: "$players"
}
}
},
{
"$match": {
"$expr": {
"$eq": [
{
"$size": "$players"
},
2
]
}
}
},
{
$project: {
_id: "$_id",
averageDistanceRan: 1
}
}
])
Result
Which will give us...
[
{
"_id": ObjectId("123456789101819202122232"),
"averageDistanceRan": 8910
}
]