How to fetch documents based on the count of the elements from pymongo / mongodb

Time:10-10

I have 4 conditions which I want to use to fetch my documents:

  1. The date (createdAt) MUST be between from_date and to_date
  2. imageKey MUST be included in IDS
  3. EITHER at least 1 "updatedCoordinates.status" should be in ["ORIGINALLY_UNDETECTED","CHANGED"]
  4. OR the length of updatedCoordinates should NOT be equal to the length of predictedCoordinates

I have created a query for the first 2 conditions. I need to know whether my 3rd clause satisfies the "at least 1" condition, and how to correctly implement the 4th.

from datetime import datetime

from_date = datetime(2022, 9, 25, 0, 1, 1, 0)
to_date = datetime(2022, 9, 28, 23, 59, 59, 0)

QUERY = {"$and":[
    {'imageKey':{"$in":IDS}},
    {"createdAt": {"$gte": from_date, "$lt": to_date}},
    {
        "$or":[
            {"updatedCoordinates.status": {"$in":["ORIGINALLY_UNDETECTED","CHANGED"]}},
            {"$ne":[{"$size":"updatedCoordinates"},{"$size":"predictedCoordinates"}]}
        ]
    }]}

my_collection.count_documents(QUERY)

I could apply the 3rd and 4th conditions in Python after fetching all the data, like this:

def is_valid(box):
    if len(box["updatedCoordinates"]) != len(box["predictedCoordinates"]): return True
    
    for up in box["updatedCoordinates"]:
        if up["status"] in ["ORIGINALLY_UNDETECTED","CHANGED"]: return True
    return False

But that would be a lot of overhead.

Here is what my data looks like:

{
 'createdAt': datetime.datetime(2012, 8, 12, 10, 1, 47, 693000),
 'imageKey': '12345678901234567890123456789012',
 'updatedCoordinates': [{
   'xmin': 102.6942138671875,
   'xmax': 1020.8214111328125,
   'ymin': 748.9141235351562,
   'ymax': 962.6471557617188,
   'status': 'ORIGINALLY_UNDETECTED',
   },
  {
   'xmin': 84.90644836425781,
   'xmax': 1061.1834716796875,
   'ymin': 1089.3341064453125,
   'ymax': 1381.4041748046875,
   'status': 'CHANGED',
   },
  {
   'xmin': 108.43966674804688,
   'xmax': 1022.5031127929688,
   'ymin': 307.7042236328125,
   'ymax': 546.1324462890625,
   'status': 'UNCHANGED',
   },
  ],
 'predictedCoordinates': [{
   'xmin': 139.69284057617188,
   'xmax': 1008.458251953125,
   'ymin': 733.7783203125,
   'ymax': 951.54638671875,
   'status': 'UNCHANGED',
   },
  {
   'xmin': 155.5401153564453,
   'xmax': 989.7386474609375,
   'ymin': 1175.1036376953125,
   'ymax': 1389.05322265625,
   'status': 'UNCHANGED',
   },
  ]}
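
As a quick local sanity check, conditions 3 and 4 can be evaluated separately against a trimmed copy of the sample document above. This is only a sketch: `sample` keeps just the `status` fields, and the helper names `has_wanted_status` and `sizes_differ` are made up for illustration.

```python
WANTED = {"ORIGINALLY_UNDETECTED", "CHANGED"}

def has_wanted_status(doc):
    # Condition 3: at least one updated box carries a wanted status.
    return any(box["status"] in WANTED for box in doc["updatedCoordinates"])

def sizes_differ(doc):
    # Condition 4: the two arrays have different lengths.
    return len(doc["updatedCoordinates"]) != len(doc["predictedCoordinates"])

# Trimmed copy of the sample document (coordinates omitted).
sample = {
    "updatedCoordinates": [
        {"status": "ORIGINALLY_UNDETECTED"},
        {"status": "CHANGED"},
        {"status": "UNCHANGED"},
    ],
    "predictedCoordinates": [
        {"status": "UNCHANGED"},
        {"status": "UNCHANGED"},
    ],
}

print(has_wanted_status(sample), sizes_differ(sample))  # True True
```

The sample satisfies both conditions, so a correct server-side query should return it whenever conditions 1 and 2 also hold.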

CodePudding user response:

For condition 4, use the $expr operator (available since MongoDB 3.6) to build an aggregation expression that compares the sizes of the updatedCoordinates and predictedCoordinates arrays.


QUERY = {"$and":[
    {'imageKey':{"$in":IDS}},
    {"createdAt": {"$gte": from_date, "$lt": to_date}},
    {
        "$or":[
            {"updatedCoordinates.status": {"$in":["ORIGINALLY_UNDETECTED","CHANGED"]}},
            {"$expr": {"$ne": [{"$size":"$updatedCoordinates"},{"$size":"$predictedCoordinates"}]}}
        ]
    }]}
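
A minimal sketch of the same filter wrapped in a function, so the dict can be built and inspected without a database connection. The name `build_query` is hypothetical. It assumes both fields are always arrays, since `$size` raises an error when a field is missing. The dotted-path `$in` clause matches when at least one element of updatedCoordinates has a listed status, which covers the "at least 1" question for condition 3.

```python
from datetime import datetime

def build_query(ids, from_date, to_date):
    """Return a filter dict usable with find() or count_documents()."""
    return {
        "$and": [
            {"imageKey": {"$in": list(ids)}},
            {"createdAt": {"$gte": from_date, "$lt": to_date}},
            {
                "$or": [
                    # True when any array element has one of these statuses.
                    {"updatedCoordinates.status": {
                        "$in": ["ORIGINALLY_UNDETECTED", "CHANGED"]}},
                    # $expr allows comparing two fields of the same document.
                    {"$expr": {"$ne": [
                        {"$size": "$updatedCoordinates"},
                        {"$size": "$predictedCoordinates"},
                    ]}},
                ]
            },
        ]
    }

query = build_query(
    ["12345678901234567890123456789012"],
    datetime(2022, 9, 25, 0, 1, 1),
    datetime(2022, 9, 28, 23, 59, 59),
)
# With a live connection: my_collection.count_documents(query)
```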

CodePudding user response:

Since your 4th condition compares two fields of the same document, you need $expr. One option is an aggregation pipeline that puts every condition inside $expr. For example:

db.collection.aggregate([
  {$match: {
      $expr: {
        $and: [
          {$in: ["$imageKey", IDS]},
          {$gte: ["$createdAt", from_date]},
          {$lt: ["$createdAt", to_date]},
          {$or: [
              {$gt: [
                  { $size: {
                    $setIntersection: [
                      "$updatedCoordinates.status",
                      ["ORIGINALLY_UNDETECTED", "CHANGED"]
                    ]
                  }},
                  0
                ]
              },
              {$ne: [{$size: "$updatedCoordinates"}, {$size: "$predictedCoordinates"}]}
          ]}
        ]
      }
    }
  }
])
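
The pipeline above is written in mongo shell syntax. In pymongo the operator names have to be quoted strings. The following is a sketch of one possible translation: `build_pipeline` is a hypothetical helper name, and the trailing `$count` stage is an addition made here to mirror the count_documents call from the question.

```python
from datetime import datetime

def build_pipeline(ids, from_date, to_date):
    """Return an aggregation pipeline as a list of stage dicts."""
    wanted = ["ORIGINALLY_UNDETECTED", "CHANGED"]
    match = {"$match": {"$expr": {"$and": [
        {"$in": ["$imageKey", ids]},
        {"$gte": ["$createdAt", from_date]},
        {"$lt": ["$createdAt", to_date]},
        {"$or": [
            # At least one status in updatedCoordinates is a wanted one.
            {"$gt": [
                {"$size": {"$setIntersection": [
                    "$updatedCoordinates.status", wanted]}},
                0,
            ]},
            # The two arrays differ in length.
            {"$ne": [
                {"$size": "$updatedCoordinates"},
                {"$size": "$predictedCoordinates"},
            ]},
        ]},
    ]}}}
    # Assumption: only the number of matching documents is needed.
    return [match, {"$count": "matched"}]

pipeline = build_pipeline(
    ["12345678901234567890123456789012"],
    datetime(2022, 9, 25, 0, 1, 1),
    datetime(2022, 9, 28, 23, 59, 59),
)
# With a live connection: list(my_collection.aggregate(pipeline))
```

Drop the `$count` stage to get the matching documents themselves instead of a count.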

See how it works on the playground example
