I am learning how to write MongoDB queries as part of my personal project. I think I am missing something small or big (I don't know), and hoping that someone could point to what is wrong with my MongoDB query. I am working on the popular zip code data set that is provided on MongoDB here: https://docs.mongodb.com/manual/tutorial/aggregation-zip-code-data-set/
An example of a document in the zipcodes collection has the following form:
{
"_id": "10280",
"city": "NEW YORK",
"state": "NY",
"pop": 5574,
"loc": [
-74.016323,
40.710537
]
}
Goal: I am trying to find the population of each city in the state of New York (NY) where the results list the postal codes that are included in each city. The output should look something like this:
{
_id : "ELMIRA",
population : <population of ELMIRA>,
postal_codes : [
<PostalCode1 in ELMIRE>,
<PostalCode2 in ELMIRE>,
<PostalCode3 in ELMIRA>
]
},
{
_id : "WHITESVILLE",
population : <population of WHITESVILLE>,
postal_codes : [
<PostalCode1 in WHITESVILLE>
]
},
My Code: Here is what I have written so far
db.getCollection('codes').aggregate([
{
$group: {
_id: {city: "$city", state:"$state"},
population: {$sum: "$pop"},
postal_codes: {$addToSet: "$_id"}
}
},
{
$match: {
"_id.state":"NY"
}
},
{
$group: {
_id: "$_id.city",
population: {$last: "$population"},
postal_codes: {$last: "$postal_codes"}
}
}
])
Problem - The issue that I see with my code is that it only lists one of the zip codes for a city in New York that has multiple zip codes, as shown below. In the U.S., a city may overlap several postal codes. How can I fix my query to include all zip codes for each city in New York instead of one of them?
{
"_id" : "OSSINING",
"population" : NumberInt(29926),
"postal_codes" : [
"10562"
]
}
{
"_id" : "COLD SPRING",
"population" : NumberInt(4904),
"postal_codes" : [
"10516"
]
}
{
"_id" : "NORTH WOODMERE",
"population" : NumberInt(19582),
"postal_codes" : [
"11581"
]
}
CodePudding user response:
You can simply use $addToset
instead of $last
to get all the postal_codes
.
One side note: you can make your code a bit more performant by reducing the intermediate result set through pushing the $match
by state
to earlier stage.
db.collection.aggregate([
{
$match: {
"state": "NY"
}
},
{
$group: {
_id: "$city",
population: {
$sum: "$pop"
},
postal_codes: {
$addToSet: "$_id"
}
}
}
])
Here is the Mongo playground for your reference.