I have 4 conditions which I want to use to fetch my documents:
- data MUST be between from_date and to_date
- imageKey MUST be included in IDS
- EITHER (at least 1 of) "updatedCoordinates.status" should be in ["ORIGINALLY_UNDETECTED","CHANGED"]
- OR the length of updatedCoordinates should NOT be equal to the length of predictedCoordinates
I have created a query for the first 2. I need to know whether my 3rd satisfies the "at least 1" condition, and how to correctly implement the 4th.
from datetime import datetime
from_date = datetime(2022, 9, 25, 0, 1, 1, 0)
to_date = datetime(2022, 9, 28, 23, 59, 59, 0)
QUERY = {"$and":[
    {'imageKey':{"$in":IDS}},
    {"createdAt": {"$gte": from_date, "$lt": to_date}},
    {
        "$or":[
            {"updatedCoordinates.status": {"$in":["ORIGINALLY_UNDETECTED","CHANGED"]}},
            {"$ne":[{"$size":"updatedCoordinates"},{"$size":"predictedCoordinates"}]}
        ]
    }]}
my_collection.count_documents(QUERY)
I could use a Python function for the 3rd and 4th conditions after fetching all the data, like:
def is_valid(box):
    if len(box["updatedCoordinates"]) != len(box["predictedCoordinates"]): return True
    for up in box["updatedCoordinates"]:
        if up["status"] in ["ORIGINALLY_UNDETECTED","CHANGED"]: return True
    return False
But that would add a lot of overhead.
Here is what my data looks like:
{
    'createdAt': datetime.datetime(2012, 8, 12, 10, 1, 47, 693000),
    'imageKey': '12345678901234567890123456789012',
    'updatedCoordinates': [{
        'xmin': 102.6942138671875,
        'xmax': 1020.8214111328125,
        'ymin': 748.9141235351562,
        'ymax': 962.6471557617188,
        'status': 'ORIGINALLY_UNDETECTED',
    },
    {
        'xmin': 84.90644836425781,
        'xmax': 1061.1834716796875,
        'ymin': 1089.3341064453125,
        'ymax': 1381.4041748046875,
        'status': 'CHANGED',
    },
    {
        'xmin': 108.43966674804688,
        'xmax': 1022.5031127929688,
        'ymin': 307.7042236328125,
        'ymax': 546.1324462890625,
        'status': 'UNCHANGED',
    },
    ],
    'predictedCoordinates': [{
        'xmin': 139.69284057617188,
        'xmax': 1008.458251953125,
        'ymin': 733.7783203125,
        'ymax': 951.54638671875,
        'status': 'UNCHANGED',
    },
    {
        'xmin': 155.5401153564453,
        'xmax': 989.7386474609375,
        'ymin': 1175.1036376953125,
        'ymax': 1389.05322265625,
        'status': 'UNCHANGED',
    },
    ]}
CodePudding user response:
For condition 4, use the $expr operator (available since MongoDB 3.6), which lets a query use aggregation expressions, to compare the sizes of the updatedCoordinates and predictedCoordinates arrays.
QUERY = {"$and":[
    {'imageKey':{"$in":IDS}},
    {"createdAt": {"$gte": from_date, "$lt": to_date}},
    {
        "$or":[
            {"updatedCoordinates.status": {"$in":["ORIGINALLY_UNDETECTED","CHANGED"]}},
            # In Python the operator name must be a quoted string: "$ne", not $ne
            {"$expr": {"$ne": [{"$size":"$updatedCoordinates"},{"$size":"$predictedCoordinates"}]}}
        ]
    }]}
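As a minimal sketch of how that filter could be assembled in Python: the helper name build_query and the sample arguments below are hypothetical, not from the question. It relies on two things: every operator must be a quoted string key in a Python dict, and top-level keys in a filter are ANDed implicitly, so the explicit "$and" wrapper is optional. It also assumes both fields are always arrays, since the $size expression raises an error otherwise.

```python
from datetime import datetime

def build_query(ids, from_date, to_date):
    """Assemble the filter; hypothetical helper, every operator is a quoted key."""
    wanted = ["ORIGINALLY_UNDETECTED", "CHANGED"]
    return {
        # Conditions 1 and 2: top-level keys are ANDed implicitly.
        "imageKey": {"$in": list(ids)},
        "createdAt": {"$gte": from_date, "$lt": to_date},
        # Conditions 3 and 4: at least one of the two must hold.
        "$or": [
            {"updatedCoordinates.status": {"$in": wanted}},
            {"$expr": {"$ne": [
                {"$size": "$updatedCoordinates"},
                {"$size": "$predictedCoordinates"},
            ]}},
        ],
    }

query = build_query(
    ["12345678901234567890123456789012"],
    datetime(2022, 9, 25, 0, 1, 1),
    datetime(2022, 9, 28, 23, 59, 59),
)
# With a live connection this could be passed on as:
#   my_collection.count_documents(query)
```

On condition 3: a dotted path such as "updatedCoordinates.status" with "$in" matches when any element of the array has one of the listed statuses, which is the "at least 1" behaviour asked about.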
CodePudding user response:
Since your 4th condition compares two fields of the same document, you need aggregation expressions ($expr), for example inside an aggregation pipeline:
db.collection.aggregate([
  {$match: {
    $expr: {
      $and: [
        {$in: ["$imageKey", IDS]},
        {$gte: ["$createdAt", from_date]},
        {$lt: ["$createdAt", to_date]},
        {$or: [
          {$gt: [
            { $size: {
              $setIntersection: [
                "$updatedCoordinates.status",
                ["ORIGINALLY_UNDETECTED", "CHANGED"]
              ]
            }},
            0
          ]
          },
          {$ne: [{$size: "$updatedCoordinates"}, {$size: "$predictedCoordinates"}]}
        ]}
      ]
    }
  }
  }
])
See how it works on the playground example
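The pipeline above is written in mongo shell syntax, where keys may be unquoted. A rough Python translation, assuming pymongo, might look like the sketch below. The helper name build_pipeline and the output field name "matching" are hypothetical, and the trailing $count stage is an addition so the result mirrors count_documents.

```python
from datetime import datetime

def build_pipeline(ids, from_date, to_date):
    """Build the aggregation pipeline as plain Python data (hypothetical helper)."""
    wanted = ["ORIGINALLY_UNDETECTED", "CHANGED"]
    # Condition 3: at least one status in updatedCoordinates is a wanted one.
    has_wanted_status = {"$gt": [
        {"$size": {"$setIntersection": ["$updatedCoordinates.status", wanted]}},
        0,
    ]}
    # Condition 4: the two arrays have different lengths.
    sizes_differ = {"$ne": [
        {"$size": "$updatedCoordinates"},
        {"$size": "$predictedCoordinates"},
    ]}
    match = {"$match": {"$expr": {"$and": [
        {"$in": ["$imageKey", list(ids)]},
        {"$gte": ["$createdAt", from_date]},
        {"$lt": ["$createdAt", to_date]},
        {"$or": [has_wanted_status, sizes_differ]},
    ]}}}
    # Added stage: emit a single document holding the number of matches.
    return [match, {"$count": "matching"}]

pipeline = build_pipeline(
    ["12345678901234567890123456789012"],
    datetime(2022, 9, 25, 0, 1, 1),
    datetime(2022, 9, 28, 23, 59, 59),
)
# With a live connection: result = list(my_collection.aggregate(pipeline))
```

If no document matches, the $count stage produces no output document, so the resulting list would be empty rather than containing a zero.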