I'm developing an algorithm in Ruby with the following properties:
- It works on two objects of type Set, where each element is an Array, where all elements are of type String
- Each Array involved has the same number of elements
- No two arrays happen to be have the same content (when comparing with
==
) - The algorithm involves many operations of moving an array from one Set to the other (or back), storing references to certain Arrays, and testing whether or not that reference is part of the Array
- There is no duplication of the Arrays; all Arrays keep their object ID during all the time.
A native implementation would do something like this (to give you the idea); in practice, the arrays here have longer strings and more elements:
# Set up all Arrays involved
master=[
%w(a b c d),
%w(a b c x),
%w(u v w y),
# .... and so on
]
# Create initial sets.
x=Set.new
y=Set.new
# ....
x.add(master[0])
x.add(master[2])
y.add(master[1])
# ....
# Operating on the sets.
i=1
# ...
arr=master[i]
# Move element arr from y to x, if it is in y
if(y.member?(arr)
y.delete(arr)
x.add(arr)
end
# Do something with the sets
x.each { |arr| puts arr.pretty_print }
This would indeed work, simply because the arrays are all different in content. However, testing for membership means that y.member?(arr)
tests that we don't have already an object with the same array content like arr
in our Set, while it would be sufficient to verify to test that we don't have already an element with the same object_id in our Set, so I'm worried about performance. From my understanding, finding the the object id of an object is cheap, and since it is just a number, maintaining a set of numbers is more performant than maintaining a set of arrays of strings.
Therefore I could try to define my two sets as sets of object_id, and membership test would be faster. However when iterating over a Set, using the object_id to find the array itself is expensive (I would have to search ObjectSpace
).
Another possibility would be to not maintain the set of arrays, but the set of indexes into my master array. My code would then be, for example,
x.add(0) # instead of x.add(master[0])
and iterating over a Set would be, i.e.
x.each { |i| puts master[i].pretty_print }
I wonder whether there is a better way - for instance that we can somehow "teach" Set.new to use object identity for maintaining its members, instead of equality.
CodePudding user response:
I think you’re looking for Set#compare_by_identity
, which makes the set use the object’s identity (i.e. object ID) of its contents.
x = Set.new
x.compare_by_identity