Home > Net >  How do I summarize tags by category in mongodb
How do I summarize tags by category in mongodb

Time:05-25

I have a collection that is shaped like this:

[
    {
        _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4567"),
        tags: {
            language: [ 'en' ],
            industries: [ 'agency', 'travel' ],
            countries: [ 'ca', 'us' ],
            regions: [ 'north-america' ],
        }
    },
    {
        _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4568"),
        tags: {
            language: [ 'en', 'fr' ],
            industries: [ 'travel' ],
            countries: [ 'ca' ]
        }
    },
    {
        _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4569"),
        tags: {
            language: [ 'en' ],
            industries: [ 'agency', 'travel' ],
            countries: [ 'ca', 'us' ],
            regions: [ 'south-america' ]
        }
    },
]

and I would like to generate this as a result...

{
    //* count of all documents
    "count": 3,
    //* count of all documents that contain any slug within the given category
    "countWithCategorySlug": {
        "language": 3,
        "industries": 3,
        "countries": 3,
        "regions": 2
    },
    //* per category: count of documents that contain that slug in the givin category
    "language" {
        "en": 3,
        "fr": 1
    },
    "industries" {
        "agency": 2,
        "travel": 3,
    },
    "countries" {
        "ca": 3,
        "us": 2
    },
    "regions" {
        "north-america": 1,
        "south-america": 1
    }
}

super stuck so any help would be appreciated. :)

The number of categories is unknown and I have a code solution that queries the list of disctint categories and slugs then for each one generates a $group stage... The resultant query is excessively big and there needs to be a better way... problem is that I have absolutely no idea on how to optimize it...

CodePudding user response:

Query

  • the first part before the facet is done to seperate them and make for each value 1 document like
  [{
  "type": "language",
  "value": "en",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
  "type": "industries",
  "value": "agency",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
  "type": "industries",
  "value": "travel",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
  "type": "countries",
  "value": "ca",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
}]
  • and then facet with 3 fields and count the documents
  • and after than transformations to have data on keys like the expected output

Playmongo

ggregate(
[{"$set": {"tags": {"$objectToArray": "$tags"}}},
 {"$set": 
   {"tags": 
     {"$map": 
       {"input": "$tags",
        "in": {"type": "$$this.k", "value": "$$this.v", "_id": "$_id"}}}}},
 {"$unwind": "$tags"},
 {"$replaceRoot": {"newRoot": "$tags"}},
 {"$unwind": "$value"},
 {"$facet": 
   {"count": 
     [{"$group": {"_id": null, "count": {"$addToSet": "$_id"}}},
       {"$set": {"count": {"$size": "$count"}}}],
    "category": 
     [{"$group": {"_id": "$type", "count": {"$addToSet": "$_id"}}},
       {"$set": {"count": {"$size": "$count"}}}],
    "values": 
     [{"$group": 
         {"_id": "$value",
          "type": {"$first": "$type"},
          "values": {"$addToSet": "$_id"}}},
       {"$set": {"values": {"$size": "$values"}}},
       {"$group": 
         {"_id": "$type",
          "values": 
           {"$push": 
             {"type": "$type", "value": "$_id", "count": "$values"}}}}]}},
 {"$set": 
   {"count": 
     {"$getField": 
       {"field": "count", "input": {"$arrayElemAt": ["$count", 0]}}},
    "category": 
     {"$arrayToObject": 
       [{"$map": 
           {"input": "$category",
            "in": {"k": "$$this._id", "v": "$$this.count"}}}]},
    "values": 
     {"$arrayToObject": 
       [{"$map": 
           {"input": "$values",
            "in": 
             {"k": "$$this._id",
              "v": 
               {"$arrayToObject": 
                 [{"$map": 
                     {"input": "$$this.values",
                      "in": {"k": "$$this.value", "v": "$$this.count"}}}]}}}}]}}}])

Outputs

[{
  "count": 3,
  "category": {
    "countries": 3,
    "industries": 3,
    "regions": 2,
    "language": 3
  },
  "values": {
    "regions": {
      "south-america": 1,
      "north-america": 1
    },
    "countries": {
      "us": 2,
      "ca": 3
    },
    "language": {
      "fr": 1,
      "en": 3
    },
    "industries": {
      "agency": 2,
      "travel": 3
    }
  }
}]
  • Related