I have data in memory, especially strings, with a large number of duplicates. We sometimes hit the memory ceiling and are trying to reduce our footprint. I thought that if I froze the strings, the Ruby runtime would combine them into single objects in memory, so I expected this code to return a lower number, ideally 1, but it did not:
a = Array.new(1000) { 'foo'.dup.freeze } # create separate objects, but freeze them
sleep 5 # give the runtime some time to combine the objects
a.map(&:object_id).uniq.size # => 1000
I guess this makes sense, because if there was a reference to the duplicated object (e.g. object id #202), and all of the frozen strings are combined to use #200, then dereferencing #202 will fail. So maybe my question doesn't make sense.
I guess the best strategy for me to save memory might be to convert the strings to symbols. I am aware that they will never be garbage collected, but there would be a small enough number of them that this would not be a problem. Is there a better way?
CodePudding user response:
You basically have the right idea, but in my opinion you found a big gotcha in Ruby. You are correct that Ruby can dedup frozen strings to save memory, but in general frozen ≠ deduped!
tl;dr: the two operations have different semantics. Always use String#-@ if you want it deduped.
Recall that freeze is a method of Object, so it has to work with every class. In English, freeze means "make it so no further changes can be made to this object, and also return the same object so that I can keep calling methods on it". In particular, it would be odd if x.freeze returned a different object than x. Imagine I had two arrays that I was modifying and then decided to freeze them. Would it make sense for the interpreter to iterate through both arrays to see whether their contents are equal, and then decide to throw one of them away completely? That could be very expensive. So in general freeze does not promise this behavior and always returns the same object, just frozen.
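As a quick check of the identity guarantee described above, here is a minimal sketch. The helper name freeze_returns_self? is made up for illustration; equal? is Ruby's object-identity test.

```ruby
# Hypothetical helper: true when freeze hands back the receiver itself.
def freeze_returns_self?(obj)
  obj.freeze.equal?(obj) # equal? compares identity, not contents
end

puts freeze_returns_self?('foo'.dup)   # => true
puts freeze_returns_self?([1, 2, 3])   # => true

# Two equal strings built at runtime stay separate objects after freeze.
a = 'foo'.dup.freeze
b = 'foo'.dup.freeze
puts a == b        # => true  (same contents)
puts a.equal?(b)   # => false (different objects)
```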
Deduping works very differently because when you call -myStr you're actually saying "return the unique frozen version of this string in memory". In most cases the whole point is to get a different object than the one in myStr (so that the GC can clean up that string and only keep the frozen one).
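Applied to the example from the question, the difference can be sketched like this. It assumes Ruby 2.5 or later, where String#-@ dedups plain strings; the helper name distinct_objects is made up for illustration.

```ruby
# Hypothetical helper: counts how many distinct String objects an array holds.
def distinct_objects(strings)
  strings.map(&:object_id).uniq.size
end

# freeze alone keeps every copy as its own object.
frozen_only = Array.new(1000) { 'foo'.dup.freeze }

# Unary minus (String#-@) returns the single shared frozen copy instead.
deduped = Array.new(1000) { -('foo'.dup) }

puts distinct_objects(frozen_only)  # => 1000
puts distinct_objects(deduped)      # => 1
```

For data that is already loaded, something like strings.map! { |s| -s } would swap each element for its shared copy, letting the GC reclaim the duplicates once nothing else references them.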
Unfortunately, the distinction is muddled because if you call freeze on a string literal, Ruby will dedup it automatically! This is sensible because there's no way to get a reference to the original literal object; the fact that the interpreter is allowing x.freeze to return a different object than x doesn't matter, so we might as well save some memory. But it might also give the impression that freeze guarantees deduping, when in fact it does not.
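The literal special case can be seen side by side with the general case in a small sketch. The method names frozen_literal and frozen_runtime_copy are made up for illustration, and the behavior shown assumes a reasonably recent CRuby.

```ruby
# freeze called directly on a literal: the interpreter can return one shared object.
def frozen_literal
  'foo'.freeze
end

# freeze called on a string built at runtime: just the same object, now frozen.
def frozen_runtime_copy
  'foo'.dup.freeze
end

puts frozen_literal.equal?(frozen_literal)            # => true
puts frozen_runtime_copy.equal?(frozen_runtime_copy)  # => false
```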
This gotcha was discussed when string deduping was first introduced, so it is definitely an intentional design decision by the Ruby developers.