Ruby WeakRef has implicit race condition?

I'm looking at Ruby WeakRef, and it seems that the way the API is written has an implied race condition, though it seems very unlikely to hit.

The basic usage implied by the API is:

require 'weakref'

obj = Object.new
foo = WeakRef.new(obj)

# Later on:
if (foo.weakref_alive?)
    puts "I can allegedly use #{foo.to_s} now"
end

# Or even:
obj2 = foo.__getobj__ if foo.weakref_alive?

The problem lies in the fact that we don't have control over when garbage collection may happen. As an example, consider another thread that regularly calls GC.start.

If we have garbage collection happening between the weakref_alive? check and the usage of the object, then we will end up hitting the RefError exception.

(I would actually expect that any large application that uses weakref - particularly those that are multithreaded - would hit RefErrors occasionally due to this)

I'm surprised there's no way to safely get the object in an atomic way if the object is available at the moment we check it.

So the question is first, am I overconcerned? Is there some reason we don't have to ever worry about a GC happening if we fetch the object right away after checking it (as in the second example)? And if not, then that gives us the second question, of the best way to safely work with weakrefs.

Right now I've added an 'obj' method to the class as one way to deal with it:

require 'weakref'
class WeakRef
  def obj
    begin
      return self.__getobj__
    rescue RefError
      return nil
    end
  end
end

But unnecessary 'rescue' statements kind of bug me. I suppose we could also:

require 'weakref'
class WeakRef
  def obj
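    # GC.disable returns true if GC was already disabled before this call
    gc_was_disabled = GC.disable
    obj = self.weakref_alive? ? self.__getobj__ : nil
    # only re-enable GC if this method was the one that disabled it
    GC.enable unless gc_was_disabled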
    return obj
  end
end

But I'm skeptical that it's low-cost to just disable and re-enable the garbage collection, much less whether this is a completely atomic operation.

Any advice from ruby GC experts?

CodePudding user response:

First, please note the intended use of a WeakRef object, namely to stand in for the original object. The WeakRef object implements the full duck-typed interface of the referenced object by forwarding all messages sent to it. As such, the WeakRef object is intended to be used directly in place of the original object (if it is still available).

While you can get a reference to the original object (if it is still available) with WeakRef#__getobj__, this is intended as a special use case and is more of an implementation detail of the message delegation. If you do this, however, you can check whether the referenced object is still available with WeakRef#weakref_alive?. As you have noticed, there is (at least theoretically) the possibility of a race condition, depending on the Ruby implementation you use.

To be sure that you handle such race-conditions gracefully, you can indeed rescue the RefError if it occurs. You can just optimize the non-race-condition case a bit:

require 'weakref'

obj = Object.new
foo = WeakRef.new(obj)

begin
  obj2 = foo.__getobj__ if foo.weakref_alive?
rescue WeakRef::RefError
  # RefError is namespaced under WeakRef, so it must be qualified here
  obj2 = nil
end

You can use the same pattern for any other message sent to your weak reference (which then gets forwarded to your referenced object), e.g.

begin
  foo.to_s if foo.weakref_alive?
rescue WeakRef::RefError
  # do nothing as foo is a dangling reference to a garbage-collected object
end
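If you repeat this pattern often, it could be wrapped in a small helper. The following is only a sketch: with_weakref_target is a hypothetical name, not part of Ruby's standard library. It skips the weakref_alive? check entirely and relies on the rescue, so there is no window between a check and the use of the object.

```ruby
require 'weakref'

# Hypothetical helper (not part of the weakref library):
# yields a strong reference to the target if it is still alive,
# and returns nil without calling the block otherwise.
def with_weakref_target(ref)
  target = ref.__getobj__
rescue WeakRef::RefError
  # the target was already garbage-collected
  nil
else
  # target is a normal (strong) reference here, so the object
  # cannot be collected while the block runs
  yield target
end

obj = String.new("hello")
foo = WeakRef.new(obj)
result = with_weakref_target(foo) { |target| target.upcase }
```

Because the block runs in the else branch, a RefError raised inside the block itself is not swallowed by the helper.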

Depending on your use case, this may be a bit awkward though. Also, sometimes it is necessary to have the actual object reference rather than a wrapped object (which may behave differently when asked about its specific class, e.g. in a case statement).
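To illustrate the difference in a case statement, here is a small sketch. The describe method is a made-up helper for this example, and the string is kept in a local variable so that it cannot be collected while the example runs.

```ruby
require 'weakref'

# Keep a strong reference so the target stays alive during the example.
text = String.new("hello")
ref  = WeakRef.new(text)

# Ordinary messages are forwarded to the referenced string.
forwarded = ref.upcase

# Made-up helper: classifies a value with a case statement.
# "when String" evaluates String === value, which looks at the class
# of the object it is given, i.e. the WeakRef wrapper itself.
def describe(value)
  case value
  when String then "a String"
  else "something else"
  end
end

wrapped   = describe(ref)             # the wrapper is not a String
unwrapped = describe(ref.__getobj__)  # the actual target is
```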

Here, an option could be to use ObjectSpace::WeakMap instead of the WeakRef. This class is used internally by WeakRef to actually hold the weak references. Ruby actually discourages the use of this class and regards it as an internal class. However, I found it to be useful to implement a more straight-forward lookup than with just WeakRef. Just be aware that the behavior in this area might subtly change and it might be a good idea to read changelogs as you update your Ruby versions.

With that out of the way, a sample lookup with ObjectSpace::WeakMap could look like this:

# The WeakMap object which can store multiple maps from an
# existing object to another (potentially garbage-collected) object.
# If you need multiple weak references, you can still use
# a single map.
WEAK_MAP = ObjectSpace::WeakMap.new

# Our referenced object which may or may not be garbage-collected later
obj = Object.new

# The "marker" object is the key in the map. It is used to look up the
# reference to the intended object. You need to always use the same object
# here (rather than e.g. a similar string) as the actual object_id of the
# marker is used for the lookup of the referenced object
marker = Object.new

# Store a reference in the weak map
WEAK_MAP[marker] = obj

#########################################################
# Now do something else...                              #
# obj may be garbage-collected in the meantime.         #
# You need to hold onto the marker object though!       #
#########################################################

# Now, you can retrieve a reference to the actual
# original object (if it is still available) or nil
# if obj was already garbage-collected
obj2 = WEAK_MAP[marker]

As written above, the WeakRef class uses exactly this mechanism internally. Here, the WeakRef object uses itself as the marker, so the lookup works for as long as you hold on to the actual WeakRef object. The simplified lookup in WeakRef#__getobj__ thus looks like this:

class WeakRef
  WEAK_MAP = ObjectSpace::WeakMap.new

  def __getobj__
    # parentheses are required here: `|| raise RefError, "..."` is a syntax error
    WEAK_MAP[self] || raise(RefError, "Invalid Reference")
  end

  def weakref_alive?
    !WEAK_MAP[self].nil?
    # actually, it's this mostly equivalent code
    # WEAK_MAP.key?(self)
  end
end

You can find the implementation of the WeakRef class at https://github.com/ruby/ruby/blob/master/lib/weakref.rb - have a look, it's actually quite readable.
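Combining the two ideas, you could build a small wrapper that returns either the object or nil in a single lookup, without a separate alive check. This is only a sketch under the assumptions described above: NilableWeakRef and its get method are hypothetical names, it does not forward messages like WeakRef does, and it depends on ObjectSpace::WeakMap continuing to behave as it does today.

```ruby
# Hypothetical class, not part of Ruby's standard library.
class NilableWeakRef
  # One shared map for all instances, similar in spirit to what
  # WeakRef does internally.
  MAP = ObjectSpace::WeakMap.new

  def initialize(target)
    # The wrapper itself is the key (the "marker"), so the entry can be
    # looked up for as long as the caller holds on to the wrapper.
    MAP[self] = target
  end

  # Returns a strong reference to the target, or nil if the target
  # has already been garbage-collected.
  def get
    MAP[self]
  end
end

obj = Object.new
ref = NilableWeakRef.new(obj)

target = ref.get
if target
  # target is an ordinary reference now and can be used safely
  target.to_s
end
```

Once get has returned a non-nil value, you hold a normal reference to the object, so it stays alive for as long as you keep that reference.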