Home > Blockchain >  pymongo: array comparison based on order
pymongo: array comparison based on order

Time:11-23

Example document:

{ "hash": [1, 2, 7] }
{ "hash": [3, 2, 1] }

Given a hash array as input, of same size, I want to find the arrays that have the highest intersection, accounting for order.

So if the input is [1, 2, 3], then the first hash would have a score of 2, while the second would have a score of 1 (because only the second element matches the input's second element).

I know that for a normal intersection I can use

arr = [1, 2, 3]
collection.aggregate([{
    "$addFields": {
        "weight": {
            "$size": { "$setIntersection": [ "$hash", arr ] }
        }
    }
},
{ "$sort": { "weight": -1 } }])

But I'm stumped on comparing the order.

CodePudding user response:

I'm not sure how to do this in the query itself, but you could always just query everything and do the filtering in Python:

def get_intersection(arr):
    """
    find score of array with highest intersection, where order matters
    """
    best = 0
    for hash in collection.find():
        score = sum(hash["hash"][i] == arr[i] for i in range(len(arr)))
        best = max(best, score)
    return best

CodePudding user response:

Yes, this can be done with an aggregation pipeline with:

  • $addFields stage to add a "score" field
  • $range to generate the indexes from 0 to the size of the test array
  • $reduce over the array of indexes
  • $cond with $eq and $add to keep a running total of matching elements
db.collection.aggregate([
  {$addFields: {
      score: {
        $reduce: {
          input: {$range: [0,{$size: [[1,2,3]]}]},
          initialValue: 0,
          in: {
            $cond: {
              if: {$eq: [
                      {$arrayElemAt: ["$hash", "$$this"]},
                      {$arrayElemAt: [[1,2,3], "$$this"]}
                  ]},
              then: {$add: [ "$$value", 1]},
              else: "$$value"
            }
          }
        }
      }
    }
  }
])

If you try this with real code, replace both instances of [1,2,3] with the variable holding the test array.

Playground

  • Related