Example documents:
{ "hash": [1, 2, 7] }
{ "hash": [3, 2, 1] }
Given an input hash array of the same size, I want to find the documents whose arrays have the highest intersection with it, where order matters: an element only counts if it matches the input element at the same position.
For example, if the input is [1, 2, 3], the first hash would have a score of 2 (its first two elements match), while the second would have a score of 1 (only its second element matches the input's second element).
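The scoring rule itself fits in a few lines of Python. `positional_score` is a hypothetical helper name used only for illustration, and it assumes both arrays have the same length, as the question states.

```python
def positional_score(stored, query):
    """Count positions where stored[i] == query[i] (order-sensitive match)."""
    # zip pairs elements by index; each True counts as 1 in sum()
    return sum(a == b for a, b in zip(stored, query))


query = [1, 2, 3]
print(positional_score([1, 2, 7], query))  # 2
print(positional_score([3, 2, 1], query))  # 1
```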
I know that for an ordinary, order-insensitive intersection I can use:
arr = [1, 2, 3]
collection.aggregate([
    {"$addFields": {
        "weight": {
            "$size": {"$setIntersection": ["$hash", arr]}
        }
    }},
    {"$sort": {"weight": -1}}
])
But I'm stumped on how to make the comparison take element order into account.
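To make the difference concrete, here is a plain-Python approximation of what the `$setIntersection` pipeline computes (the size of the set intersection) next to the order-sensitive count. The function names are hypothetical, and this only mimics the pipeline's logic in memory; it is not MongoDB itself.

```python
def set_weight(stored, query):
    # Mirrors {"$size": {"$setIntersection": [...]}}: duplicates collapse, order is ignored
    return len(set(stored) & set(query))


def positional_weight(stored, query):
    # Order-sensitive: only equal elements at the same index count
    return sum(a == b for a, b in zip(stored, query))


query = [1, 2, 3]
for stored in ([1, 2, 7], [3, 2, 1]):
    print(stored, set_weight(stored, query), positional_weight(stored, query))
# [1, 2, 7] 2 2
# [3, 2, 1] 3 1
```

The second document is where the two measures disagree: as a set it contains every input element, but only one of them sits in the right position.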
Answer:
I'm not sure how to do this in the query itself, but you could always query everything and do the scoring in Python:
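def get_intersection(arr):
    """
    Return the document whose "hash" array matches arr at the most
    positions (order matters), together with that score.
    Assumes `collection` is the PyMongo collection from the question.
    """
    best_doc, best_score = None, 0
    for doc in collection.find():
        # count indexes where the stored element equals the input element
        score = sum(a == b for a, b in zip(doc["hash"], arr))
        if best_doc is None or score > best_score:
            best_doc, best_score = doc, score
    return best_doc, best_score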
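If you want the full ranked list of documents rather than only the best one, the same client-side idea can sort every document by its score. This sketch works on a plain list of dicts standing in for `collection.find()` results; `rank_by_position` is a hypothetical name, and pulling every document to the client is only practical for small collections.

```python
def rank_by_position(docs, arr):
    """Return (score, doc) pairs sorted by descending order-sensitive score."""
    scored = [
        # count indexes where the stored "hash" matches arr
        (sum(a == b for a, b in zip(doc["hash"], arr)), doc)
        for doc in docs
    ]
    # sort on the score only, so dicts are never compared on ties
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


docs = [{"hash": [1, 2, 7]}, {"hash": [3, 2, 1]}]
for score, doc in rank_by_position(docs, [1, 2, 3]):
    print(score, doc["hash"])
# 2 [1, 2, 7]
# 1 [3, 2, 1]
```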
Answer:
Yes, this can be done in an aggregation pipeline using:
- an $addFields stage to add a "score" field
- $range to generate the indexes from 0 up to (but not including) the size of the test array
- $reduce over that array of indexes
- $arrayElemAt to pick the element at each index from both arrays
- $cond with $eq and $add to keep a running total of matching elements
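Taken together, those steps are a fold over the indexes. In Python terms they behave roughly like the following `functools.reduce` sketch, where `value` plays the role of `$$value` and `i` the role of `$$this`. It is an in-memory analogy with a hypothetical function name, not the server's implementation, and it assumes both arrays have the same length (unlike `$arrayElemAt`, which yields a missing value past the end, `stored[i]` would raise `IndexError`).

```python
from functools import reduce


def reduce_score(stored, test):
    # $range: [0, len(test)) -> indexes 0 .. len(test) - 1
    indexes = range(len(test))
    # $reduce with initialValue 0; the lambda plays the role of the "in" expression
    return reduce(
        lambda value, i: value + 1 if stored[i] == test[i] else value,  # $cond/$eq/$add
        indexes,
        0,
    )


print(reduce_score([1, 2, 7], [1, 2, 3]))  # 2
print(reduce_score([3, 2, 1], [1, 2, 3]))  # 1
```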
db.collection.aggregate([
  {$addFields: {
    score: {
      $reduce: {
        input: {$range: [0, {$size: [[1, 2, 3]]}]},
        initialValue: 0,
        in: {
          $cond: {
            if: {$eq: [
              {$arrayElemAt: ["$hash", "$$this"]},
              {$arrayElemAt: [[1, 2, 3], "$$this"]}
            ]},
            then: {$add: ["$$value", 1]},
            else: "$$value"
          }
        }
      }
    }
  }},
  // highest score first
  {$sort: {score: -1}}
])
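In real code, replace both instances of [1,2,3] with the variable holding the test array.
Keep the extra brackets around it in $size, i.e. write {$size: [arr]}: when the argument is an array literal, $size treats a bare array as its argument list, so {$size: arr} would be misread.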
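Since the question uses Python, the same pipeline can be built as a list of dicts and passed to PyMongo's `collection.aggregate`. `build_score_pipeline` is a hypothetical helper. It computes the range length client-side with `len(arr)` instead of `$size`, which should be equivalent here because `arr` is known before the query is sent, and it sidesteps the bracket issue with `$size`. The `$sort` stage puts the best matches first.

```python
def build_score_pipeline(arr):
    """Build an aggregation pipeline scoring "hash" against arr by position."""
    return [
        {"$addFields": {
            "score": {
                "$reduce": {
                    # indexes 0 .. len(arr) - 1
                    "input": {"$range": [0, len(arr)]},
                    "initialValue": 0,
                    "in": {
                        "$cond": {
                            "if": {"$eq": [
                                {"$arrayElemAt": ["$hash", "$$this"]},
                                {"$arrayElemAt": [arr, "$$this"]},
                            ]},
                            "then": {"$add": ["$$value", 1]},
                            "else": "$$value",
                        }
                    },
                }
            }
        }},
        # highest positional score first
        {"$sort": {"score": -1}},
    ]


pipeline = build_score_pipeline([1, 2, 3])
# With a PyMongo collection (assumed to exist), run it as:
# results = list(collection.aggregate(pipeline))
```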