Home > Back-end >  Get all documents, and in case of equality (similarity) - later ones
Get all documents, and in case of equality (similarity) - later ones

Time:10-05

There is a parser for sports. It works in a circle. By some logic, sports events are added to the database. In each parsing cycle, a sports event can be added to the database, but it may not be added)

I need to get all sports events from the last two parsing cycles. But, if there is a sporting event in both cycles, then only from the last one. This is the problem. Sample documents:

 {
    "command1": "A",
    "command2": "B",
    "parseCount": 0
  },
  {
    "command1": "A",
    "command2": "B",
    "parseCount": 1
  },
  {
    "command1": "A",
    "command2": "B",
    "parseCount": 2
  },
  {
    "command1": "C",
    "command2": "D",
    "parseCount": 1
  },
  {
    "command1": "E",
    "command2": "F",
    "parseCount": 2
  },

As a result, I should get the last 3 documents from the list above. The document also has fields: match time and ObjectId

https://mongoplayground.net/p/-9gz4zOnudW

CodePudding user response:

If I've understood correctly you can first $sort and then get $first object into $group like this:

I've used $first: $$ROOT but you can use $first: value for each value from the object if you want.

This query:

  • First sort by parseCount to get the higher value in first position.
  • Then $group by two conditions, getting the first object (as is ordered is the higher one)
  • And use $project to get output values you want.
db.collection.aggregate([
  {
    "$sort": {
      "parseCount": -1
    }
  },
  {
    "$group": {
      "_id": {
        "command1": "$command1",
        "command2": "$command2"
      },
      "object": {
        "$first": "$$ROOT"
      }
    }
  },
  {
    "$project": {
      "_id": "$object._id",
      "command1": "$object.command1",
      "command2": "$object.command2",
      "parseCount": "$object.parseCount"
    }
  }
])

Example here

CodePudding user response:

Query

  • its like 2 queries but can become 1 with lookup
  • facet could be used and do 2 groups but would be limited to max 16mb data in collection, the bellow solution hasn't this limitation
  • finds the max parse for the collection using the lookup (mongodb auto-optimizes it, so the pipeline in lookup will run only 1 time not for each document of collection, at least this happened when i tested it in the past)
  • we keep only the last 2 parses, for example if max=3 we keep parseCount=3 and parseCount=2 , we also keep only parseCount>0, you had that filter on Playground, if you dont need it remove it.
  • group by command1,command2 and keep only the max parseCount you said that we keep only the latest if we have more than 1
  • project to restore the document structure, matchTime and _id are kept also because you said you have those also

Test code here

db.collection.aggregate([
  {
    "$lookup": {
      "from": "collection",
      "pipeline": [
        {
          "$group": {
            "_id": null,
            "maxParse": {
              "$max": "$parseCount"
            }
          }
        }
      ],
      "as": "result"
    }
  },
  {
    "$set": {
      "maxParses": {
        "$let": {
          "vars": {
            "v0": {
              "$arrayElemAt": [
                "$result",
                0
              ]
            }
          },
          "in": "$$v0.maxParse"
        }
      }
    }
  },
  {
    "$unset": [
      "result"
    ]
  },
  {
    "$match": {
      "$expr": {
        "$and": [
          {
            "$gt": [
              "$parseCount",
              0
            ]
          },
          {
            "$gte": [
              "$parseCount",
              {
                "$subtract": [
                  "$maxParses",
                  1
                ]
              }
            ]
          }
        ]
      }
    }
  },
  {
    "$group": {
      "_id": {
        "command1": "$command1",
        "command2": "$command2"
      },
      "maxParseCount": {
        "$max": {
          "parseCount": "$parseCount",
          "matchTime": "$matchTime",
          "id": "$_id"
        }
      }
    }
  },
  {
    "$replaceRoot": {
      "newRoot": {
        "$mergeObjects": [
          "$_id",
          "$maxParseCount",
          "$$ROOT"
        ]
      }
    }
  },
  {
    "$project": {
      "command1": 1,
      "command2": 1,
      "parseCount": 1,
      "matchTime": 1,
      "_id": "$id"
    }
  }
])

To explain what query does based on your data

  • will find maxParse=2 (lookup does this)
  • first document will filtered out, because 2,1 parses are only kept and it has 0
  • the rest one will be grouped by command1,command2 "A","B" has 2 documents(_id=2,_id=3) but only _id=3 will pass because it has the max parseCount
[
  {
    "_id": 1,
    "command1": "A",
    "command2": "B",
    "parseCount": 0,
    "matchTime": 1
  },
  {
    "_id": 2,
    "command1": "A",
    "command2": "B",
    "parseCount": 1,
    "matchTime": 2
  },
  {
    "_id": 3,
    "command1": "A",
    "command2": "B",
    "parseCount": 2,
    "matchTime": 3
  },
  {
    "_id": 4,
    "command1": "C",
    "command2": "D",
    "parseCount": 1,
    "matchTime": 4
  },
  {
    "_id": 5,
    "command1": "E",
    "command2": "F",
    "parseCount": 2,
    "matchTime": 5
  }
]

Results

[
  {
    "_id": 3,
    "command1": "A",
    "command2": "B",
    "matchTime": 3,
    "parseCount": 2
  },
  {
    "_id": 4,
    "command1": "C",
    "command2": "D",
    "matchTime": 4,
    "parseCount": 1
  },
  {
    "_id": 5,
    "command1": "E",
    "command2": "F",
    "matchTime": 5,
    "parseCount": 2
  }
]
  • Related