I'm looking at Ruby WeakRef, and it seems that the way the API is written has an implied race condition, though it seems very unlikely to hit.
The basic usage implied by the API is:
require 'weakref'
obj = Object.new
foo = WeakRef.new(obj)
# Later on:
if foo.weakref_alive?
  puts "I can allegedly use #{foo.to_s} now"
end
# Or even:
obj2 = foo.__getobj__ if foo.weakref_alive?
The problem is that we have no control over when garbage collection happens. As an example, consider another thread that regularly calls GC.start.
If garbage collection happens between the weakref_alive? check and the use of the object, we end up hitting the RefError exception.
(I would actually expect that any large application that uses weakref - particularly those that are multithreaded - would hit RefErrors occasionally due to this)
I'm surprised there's no way to safely get the object in an atomic way if the object is available at the moment we check it.
So the question is first, am I overconcerned? Is there some reason we don't have to ever worry about a GC happening if we fetch the object right away after checking it (as in the second example)? And if not, then that gives us the second question, of the best way to safely work with weakrefs.
Right now I've added an 'obj' method to the class as one way to deal with it:
require 'weakref'
class WeakRef
  def obj
    __getobj__
  rescue RefError
    nil
  end
end
But unnecessary 'rescue' statements kind of bug me. I suppose we could also:
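require 'weakref'
class WeakRef
  def obj
    # GC.disable returns true if GC was already disabled before this call
    was_disabled = GC.disable
    weakref_alive? ? __getobj__ : nil
  ensure
    # Only re-enable GC if this method was the one that disabled it
    GC.enable unless was_disabled
  end
end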
But I'm skeptical that it's low-cost to just disable and re-enable garbage collection, and I'm even less sure that this is a completely atomic operation.
Any advice from ruby GC experts?
Answer:
First, please note the intended use of a WeakRef object, namely to stand in for the original object. The WeakRef object implements the full duck-typed interface of the referenced object by forwarding all messages sent to it. As such, the WeakRef object is intended to be used directly in place of the original object (if it is still available).
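As a small illustration of this forwarding, here is a minimal sketch. The variable names are only for the example, and a strong reference to the string is deliberately kept around so that the target cannot be collected while the code runs.

```ruby
require 'weakref'

# Keep a strong reference in `original` so the target stays alive
# for the duration of this example.
original = "hello"
ref = WeakRef.new(original)

# Messages sent to the WeakRef are forwarded to the referenced String.
upcased = ref.upcase
length  = ref.length
same    = (ref == "hello")  # == is delegated to the target as well

puts upcased
puts length
puts same
```

Each of these calls can raise WeakRef::RefError once the target has been collected, which is why the strong reference matters here.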
While you may get a reference to the original object (if it is still available) with WeakRef#__getobj__, this is intended to be a special use case and more of an implementation detail of the message delegation. If you do this, however, you can check whether the referenced object is still available with WeakRef#weakref_alive?. As you have noticed, there is an (at least theoretical) opportunity for a race condition, depending on the Ruby implementation you use.
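To be sure that you handle such race conditions gracefully, you can indeed rescue the RefError if it occurs. You can optimize the common case a bit by checking weakref_alive? first, so that the exception is only raised if the object is collected between the check and the access: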
require 'weakref'
obj = Object.new
foo = WeakRef.new(obj)
begin
  obj2 = foo.__getobj__ if foo.weakref_alive?
rescue WeakRef::RefError
  # RefError is defined inside WeakRef, so it needs the namespace here
  obj2 = nil
end
You can use the same pattern for any other message sent to your weak reference (which then gets forwarded to the referenced object), e.g.
begin
  foo.to_s if foo.weakref_alive?
rescue WeakRef::RefError
  # do nothing, as foo is a dangling reference to a garbage-collected object
end
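If you use this pattern in many places, one way to avoid repeating the begin/rescue is a small helper that first takes a strong reference and only then runs your code. The following is a minimal sketch; with_weak_target is a hypothetical name chosen for this example and is not part of WeakRef.

```ruby
require 'weakref'

# Hypothetical helper (not part of WeakRef): yields the target of `ref`
# to the block, or returns nil if the target was already collected.
def with_weak_target(ref)
  target = ref.__getobj__   # raises WeakRef::RefError on a dangling reference
rescue WeakRef::RefError
  nil
else
  # `target` is a strong reference, so the object stays alive in the block.
  # Errors raised by the block itself are not rescued here.
  yield target
end

obj = Object.new
foo = WeakRef.new(obj)
result = with_weak_target(foo) { |o| "alive: #{o.class}" }
puts result.inspect
```

While the block runs, the local variable holds a strong reference, so there is no second check that could race with the garbage collector.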
Depending on your use case, this may be a bit awkward, though. Also, it is sometimes necessary to have the actual object reference rather than a wrapped object (which may behave differently when asked about its specific class, e.g. in a case statement).
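To make the case statement difference concrete, here is a minimal sketch. The describe method is a hypothetical helper for this example only, and the string is kept strongly referenced so it stays alive.

```ruby
require 'weakref'

str = "hello"            # strong reference keeps the target alive here
ref = WeakRef.new(str)

# Hypothetical helper that dispatches on the class of its argument.
def describe(value)
  case value
  when String then "a String"
  else "something else"
  end
end

# `when String` calls String === value, which checks the class of the
# object it is given. For `ref` that is the WeakRef wrapper, not the String.
wrapped   = describe(ref)             # => "something else"
unwrapped = describe(ref.__getobj__)  # => "a String"

puts wrapped
puts unwrapped
```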
Here, an option could be to use ObjectSpace::WeakMap instead of WeakRef. This class is used internally by WeakRef to actually hold the weak references. Ruby actually discourages the use of this class and regards it as an internal class. However, I found it useful for implementing a more straightforward lookup than with WeakRef alone. Just be aware that the behavior in this area might subtly change, so it is a good idea to read the changelogs as you update your Ruby version.
With that out of the way, a sample lookup with ObjectSpace::WeakMap could look like this:
# The WeakMap object, which can store multiple mappings from an
# existing object to another (potentially garbage-collected) object.
# If you need multiple weak references, you can still use
# a single map.
WEAK_MAP = ObjectSpace::WeakMap.new
# Our referenced object which may or may not be garbage-collected later
obj = Object.new
# The "marker" object is the key in the map. It is used to look up the
# reference to the intended object. You always need to use the same object
# here (rather than e.g. a similar string), as the actual object_id of the
# marker is used for the lookup of the referenced object.
marker = Object.new
# Store a reference in the weak map
WEAK_MAP[marker] = obj
#########################################################
# Now do something else... #
# obj may be garbage-collected in the meantime. #
# You need to hold onto the marker object though! #
#########################################################
# Now, you can retrieve a reference to the actual
# original object (if it is still available) or nil
# if obj was already garbage-collected
obj2 = WEAK_MAP[marker]
As written above, the WeakRef class uses exactly this mechanism internally. Here, the WeakRef object uses itself as the marker, so the lookup works as long as you hold the actual WeakRef object. The simplified lookup in WeakRef#__getobj__ thus looks like this:
class WeakRef
  WEAK_MAP = ObjectSpace::WeakMap.new
  def __getobj__
    # raise needs parentheses here, otherwise this line is a syntax error
    WEAK_MAP[self] || raise(RefError, "Invalid Reference")
  end
  def weakref_alive?
    !WEAK_MAP[self].nil?
    # actually, it's this mostly equivalent code
    # WEAK_MAP.key?(self)
  end
end
You can find the implementation of the WeakRef class at https://github.com/ruby/ruby/blob/master/lib/weakref.rb - have a look, it's actually quite readable.
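Combining the two ideas, a nil-returning weak handle could be sketched as follows. NilableWeakRef and its get method are hypothetical names for this example, not something Ruby provides. The sketch assumes the target is an ordinary heap object (not nil, false, or another immediate value) and it inherits the caveats about ObjectSpace::WeakMap mentioned above.

```ruby
# Hypothetical class, not part of Ruby: a weak handle whose lookup returns
# nil instead of raising once the target has been collected.
class NilableWeakRef
  # One shared map; each handle uses itself as the key, as WeakRef does.
  MAP = ObjectSpace::WeakMap.new

  def initialize(target)
    MAP[self] = target
  end

  # Returns a strong reference to the target, or nil if it is gone.
  # This is a single lookup, so there is no separate alive check.
  def get
    MAP[self]
  end
end

obj = Object.new
handle = NilableWeakRef.new(obj)
current = handle.get
puts current.equal?(obj)
```

Unlike WeakRef, this handle does not forward messages, so callers always work with the real object or with nil.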