I have an array of hashes:
array = [
  {foo: 1, bar1: 2, bar2: 3, bar3: 4},
  {foo: 2, bar1: 3, bar2: 4, bar3: 5},
  {foo: 3, bar1: 4, bar2: 5, bar4: 6},
  # etc.
]
I want to eliminate some redundant results from this array. Specifically, I want to eliminate any results where foo, bar1, and bar2 are identical across multiple objects, which can easily be done like so:
array.uniq! { |object| [object[:foo], object[:bar1], object[:bar2]] }
However, there is an additional edge case where I must also eliminate one of the following objects, which I don't know how to solve:
{foo: 1, bar1: 3, bar2: 2, ...}
{foo: 1, bar1: 2, bar2: 3, ...}
Specifically, bar1 and bar2 may be swapped in some of the data, and I only want to keep results where those two, taken together as an unordered pair, are unique (2, 3 should be treated as a duplicate of 3, 2).
CodePudding user response:
After fully writing up this question I realized I had an answer, but I'm not sure how ideal it is. I simply combined the two interchangeable values into a single array and then sorted it, which guarantees that the result will always be identical even if the two values are switched:
array.uniq! { |object| [object[:foo], [object[:bar1], object[:bar2]].sort] }
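For illustration, here is a minimal sketch of that approach on made-up sample data. It assumes the hashes use symbol keys, and it uses the non-destructive uniq so the input stays intact; the records and variable names are hypothetical.

```ruby
# Hypothetical sample records; the second one has bar1 and bar2 swapped.
records = [
  { foo: 1, bar1: 2, bar2: 3, bar3: 4 },
  { foo: 1, bar1: 3, bar2: 2, bar3: 9 },
  { foo: 2, bar1: 3, bar2: 4, bar3: 5 },
]

# The block's return value is the uniqueness key. Sorting the pair
# makes [2, 3] and [3, 2] produce the same key.
unique = records.uniq { |h| [h[:foo], [h[:bar1], h[:bar2]].sort] }

unique.each { |h| puts h.inspect }
```

uniq keeps the first record it sees for each key, so the first and third records survive here.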
I'd love to know if anyone has better solutions.
Also, unsurprisingly, inserting a uniq! call into a large sorting operation is causing some performance issues, so I'm exploring ways to further optimize it by adding additional filters etc. This is all for a cache for an API endpoint.
CodePudding user response:
Since you have special equality rules, it seems like the most performant solution would be to override the Object#hash and Object#eql? methods, as these are what Array#uniq uses. If you have millions of records this may well be necessary for adequate performance.
require 'pp'
class MyHash < Hash
  def hash
    # Note that the XOR operator is commutative, so the three values
    # can be in any order and still output the same hash.
    self[:foo].hash ^ self[:bar1].hash ^ self[:bar2].hash
  end

  def eql?(other)
    # I think this is a bit ugly, and welcome suggestions for better
    # performance and readability.
    # foo must match, and bar1/bar2 must match either directly or swapped.
    # The outer parentheses matter because && binds tighter than ||.
    self[:foo] == other[:foo] && (
      (self[:bar1] == other[:bar1] && self[:bar2] == other[:bar2]) ||
      (self[:bar1] == other[:bar2] && self[:bar2] == other[:bar1])
    )
  end
end
a = MyHash[foo: 10, bar1: 2, bar2: 3, ignored: 'a']
b = MyHash[foo: 10, bar1: 3, bar2: 2, ignored: 'b']
c = MyHash[foo: 20, bar1: 2, bar2: 3, ignored: 'c']
d = MyHash[foo: 20, bar1: 3, bar2: 2, ignored: 'd']
e = MyHash[foo: 2, bar1: 20, bar2: 3, ignored: 'e']
f = MyHash[foo: 3, bar1: 2, bar2: 20, ignored: 'f']
puts a.hash #=> 3556565295874809176
puts b.hash #=> 3556565295874809176
puts c.hash #=> 2914353897173641784
puts d.hash #=> 2914353897173641784
puts e.hash #=> 2914353897173641784
puts f.hash #=> 2914353897173641784
array = [a, b, c, d, e, f]
pp array      #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
              #    {:foo=>10, :bar1=>3, :bar2=>2, :ignored=>"b"},
              #    {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
              #    {:foo=>20, :bar1=>3, :bar2=>2, :ignored=>"d"},
              #    {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
              #    {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
pp array.uniq #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
              #    {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
              #    {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
              #    {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
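One detail worth checking in an eql? of this shape: in Ruby, && binds tighter than ||, so `a && b || c` is evaluated as `(a && b) || c`. The sketch below compares an ungrouped and a grouped version of the comparison. The helper names ungrouped_match? and grouped_match? and the sample hashes are hypothetical and only serve to show the difference.

```ruby
# Hypothetical helpers for illustration only.

# Ungrouped: parsed as (foo_equal && direct_match) || swapped_match,
# so the foo check is skipped whenever the bars match in swapped order.
def ungrouped_match?(x, y)
  x[:foo] == y[:foo] && (
    x[:bar1] == y[:bar1] && x[:bar2] == y[:bar2]
  ) || (
    x[:bar1] == y[:bar2] && x[:bar2] == y[:bar1]
  )
end

# Grouped: foo must match AND the bars must match directly or swapped.
def grouped_match?(x, y)
  x[:foo] == y[:foo] && (
    (x[:bar1] == y[:bar1] && x[:bar2] == y[:bar2]) ||
    (x[:bar1] == y[:bar2] && x[:bar2] == y[:bar1])
  )
end

p1 = { foo: 1,  bar1: 2, bar2: 3 }
p2 = { foo: 99, bar1: 3, bar2: 2 } # different foo, swapped bars

puts ungrouped_match?(p1, p2) # true  (foo is ignored on the swapped branch)
puts grouped_match?(p1, p2)   # false
```

With the ungrouped form, two records that differ in foo but have bar1 and bar2 swapped compare as equal, which is not the intended rule.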
If you just have thousands of records then the solution you proposed should be completely fine.
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
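One caveat with the sort-based key: if bar1 and bar2 can hold values that do not compare against each other (for example nil alongside an integer), sort raises an ArgumentError. A possible variation, offered as a sketch rather than a recommendation, is to use a Set as the unordered pair, since sets with the same members are eql? regardless of insertion order. The sample records below are made up.

```ruby
require 'set'

# Hypothetical sample records, including nil values that cannot be sorted
# against integers.
records = [
  { foo: 1, bar1: 2,   bar2: 3 },
  { foo: 1, bar1: 3,   bar2: 2 },   # same pair as above, swapped
  { foo: 1, bar1: nil, bar2: 3 },
  { foo: 1, bar1: 3,   bar2: nil }, # same pair as above, swapped
]

# A Set ignores member order, so Set[2, 3] and Set[3, 2] yield the same key.
unique = records.uniq { |h| [h[:foo], Set[h[:bar1], h[:bar2]]] }

unique.each { |h| puts h.inspect }
```

This keeps the first and third records. Building a Set per record costs more than sorting a two-element array, so it is probably only worth it when the values are not mutually comparable.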