How to customize Ruby's Marshal load without changing format?

Time:04-03

I need to customize what happens to one of my classes on Marshal.load(). I found the marshal_load and _load methods, but so far I haven't managed to use either of them for what I need.

It's crucial for me that the resulting binary data stay exactly the same as when those methods are not defined. This rules out _dump and _load, because they exist to define a custom serialization string format.

I tried marshal_load and marshal_dump, but they change the resulting binary format as well. This can't happen.
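A minimal check of why marshal_dump/marshal_load change the output (the class names Plain and Custom are hypothetical). A plain object is written with the type tag "o" followed by its instance variables, while a class that defines marshal_dump is written with the tag "U" followed by whatever marshal_dump returns, so the byte stream cannot stay the same:

```ruby
# Two classes with identical state; only Custom defines marshal_dump/marshal_load.
class Plain
  def initialize(name)
    @name = name
  end
end

class Custom
  def initialize(name)
    @name = name
  end

  # The returned value is serialized instead of the instance variables.
  def marshal_dump
    @name
  end

  # Receives the value returned by marshal_dump when loading.
  def marshal_load(name)
    @name = name
  end
end

plain_dump  = Marshal.dump(Plain.new("x"))
custom_dump = Marshal.dump(Custom.new("x"))

# Bytes 0-1 are the format version (4.8); byte 2 is the type tag.
puts plain_dump[2]   # prints o (regular object)
puts custom_dump[2]  # prints U (user-marshalled object)
```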

EDIT: To be more specific about my use case: I have a marshalled binary file that contains 8-bit strings. I need to convert these strings to UTF-8 in marshal_load and back to 8-bit in marshal_dump. I created this repository for testing; read the comments here: https://github.com/enumag/marshal-test/blob/master/test.rb
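Why the strings must be converted back before dumping can be seen directly: Marshal stores a String's encoding next to its bytes (a UTF-8 string gets an extra encoding marker, an ASCII-8BIT one does not). This sketch assumes the "8-bit" strings are tagged ASCII-8BIT and actually contain UTF-8 bytes, so force_encoding only relabels them without touching the bytes; if they hold another encoding, String#encode would be needed instead, and that does change the bytes.

```ruby
# Same bytes, two different encoding labels.
binary = "caf\xC3\xA9".b                            # tagged ASCII-8BIT
utf8   = binary.dup.force_encoding(Encoding::UTF_8) # relabelled, bytes unchanged

binary_dump = Marshal.dump(binary)
utf8_dump   = Marshal.dump(utf8)

# The UTF-8 dump carries an encoding marker, so the output differs.
puts binary_dump == utf8_dump  # prints false

# Relabelling back to ASCII-8BIT before dumping reproduces the original bytes.
restored = utf8.dup.force_encoding(Encoding::ASCII_8BIT)
puts Marshal.dump(restored) == binary_dump  # prints true
```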

Answer:

I saw the comment below in your GitHub example, but I don't think you can avoid doing the conversion yourself before the dump and after the load:

in practice I'm Marshal loading a large structure with many nested objects so I can't do it manually

I would suggest one of the following two schemes.

The first is to create a wrapper around Marshal and use it to modify your object after loading and before dumping.

class Foo
    def initialize(attr_1, attr_2)
        @attr_1 = attr_1
        @attr_2 = attr_2
    end
end

class MyMarshaler
    def load(file_name)
        # Marshal data is binary, so open the file in "rb" mode;
        # the block form of File.open closes the file automatically
        loaded = File.open(file_name, "rb") { |source| Marshal.load(source) }
        # do whatever you want with the loaded object here
        loaded
    end

    def dump(obj, file_name)
        # do whatever you want with the object before dumping it
        manipulated = obj
        File.open(file_name, "wb") { |dest| Marshal.dump(manipulated, dest) }
    end
end

foo = Foo.new("foo-thing-1", "foo-thing-2")
foo_file_name = "marshaled.ruby_object"
marshaler = MyMarshaler.new
p foo
marshaler.dump(foo, foo_file_name)
restored_foo = marshaler.load(foo_file_name)
p restored_foo

The other option is to put the same methods in a module and extend your class with it, so that they become class methods (Bar.dump and Bar.load) wrapping Marshal's dump and load.

module MyMarshalerModule
    def load(file_name)
        # Open in binary mode; the block form closes the file automatically
        loaded = File.open(file_name, "rb") { |source| Marshal.load(source) }
        # do whatever you want with the loaded object here
        loaded
    end

    def dump(obj, file_name)
        # do whatever you want with the object before dumping it
        manipulated = obj
        File.open(file_name, "wb") { |dest| Marshal.dump(manipulated, dest) }
    end
end

class Bar
    extend MyMarshalerModule
    def initialize(attr_1, attr_2)
        @attr_1 = attr_1
        @attr_2 = attr_2
    end
end

bar = Bar.new("bar-thing-1", "bar-thing-2")
bar_file_name = "marshaled2.ruby_object"
p bar
Bar.dump(bar, bar_file_name)
restored_bar = Bar.load(bar_file_name)
p restored_bar

Answer:

I think my original answer addressed the question in your post, but in the comments you imply that you really have two questions.

This answer attempts to answer both questions.

Here is a TransMarshal class whose load and dump methods accept an optional block that receives the object being loaded or dumped. Its transform method recursively walks an object's instance variables, finds instances of a given class, and replaces them with the result of a nested block. It works on clones, so the original objects are left untouched.

class TransMarshal
    # Loads an object from file_name; if a block is given, returns the
    # block's result for the loaded object instead
    def self.load(file_name)
        loaded = File.open(file_name, "rb") { |source| Marshal.load(source) }
        block_given? ? yield(loaded) : loaded
    end

    # Dumps obj to file_name; if a block is given, dumps the block's result instead
    def self.dump(obj, file_name)
        manipulated = block_given? ? yield(obj) : obj
        File.open(file_name, "wb") { |dest| Marshal.dump(manipulated, dest) }
    end

    # Returns a copy of obj in which every instance variable (at any depth)
    # holding an instance of klass is replaced by the block's result.
    # Only instance variables are followed, and obj itself is never mutated.
    def self.transform(obj, klass, &block)
        return obj if obj.instance_variables.empty?
        cloned_obj = obj.clone
        cloned_obj.instance_variables.each do |name|
            value = transform(cloned_obj.instance_variable_get(name).clone, klass, &block)
            value = yield(value) if value.class == klass
            cloned_obj.instance_variable_set(name, value)
        end
        cloned_obj
    end
end

Example usage.

class Foo
    def initialize(attr_1, attr_2)
        @attr_1 = attr_1
        @attr_2 = attr_2
    end
end

class Bar
    def initialize(attr_1, attr_2)
        @attr_1 = attr_1
        @attr_2 = attr_2
    end
end

class FooBar
    def initialize(foo, bar, string, number)
        @foo = foo
        @bar = bar
        @string = string
        @number = number
    end
end

foo = Foo.new("foo_attr_1", "foo_attr_2")
bar = Bar.new("bar_attr_1", "bar_attr_2")
foo_bar = FooBar.new(foo, bar, "foobar", 42)

TransMarshal.dump(foo_bar, "dumped") do |obj|
    TransMarshal.transform(obj, String) {|s| "#{s}-x"}
end

raw_restored_foo_bar = TransMarshal.load("dumped")
puts "raw_restored_foo_bar = #{raw_restored_foo_bar.inspect}"

transformed_restored_foo_bar = TransMarshal.load("dumped") do |obj|
     TransMarshal.transform(obj, String) { |o| o[0..-3] }
end
puts "transformed_restored_foo_bar = #{transformed_restored_foo_bar.inspect}"
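TransMarshal.transform only follows instance variables, so Strings stored inside Arrays or Hashes are never reached, which matters for a large nested structure. A minimal sketch of an alternative, assuming the 8-bit strings are tagged ASCII-8BIT and contain UTF-8 bytes: the hypothetical helper binary_strings_to_utf8! relabels them in place and returns the list of Strings it changed, so exactly those can be relabelled back before dumping. It skips frozen Strings (including Hash keys, which Ruby stores as frozen copies) and does not visit Struct members.

```ruby
require "set"

# Hypothetical helper: walks instance variables, Array items and Hash values,
# relabelling each unfrozen ASCII-8BIT String as UTF-8 in place.
# Returns the Strings it changed; `seen` guards against cycles.
def binary_strings_to_utf8!(obj, changed = [], seen = Set.new)
  return changed if seen.include?(obj.object_id)
  seen << obj.object_id

  case obj
  when String
    if obj.encoding == Encoding::ASCII_8BIT && !obj.frozen?
      obj.force_encoding(Encoding::UTF_8) # relabel only, bytes unchanged
      changed << obj
    end
  when Array
    obj.each { |item| binary_strings_to_utf8!(item, changed, seen) }
  when Hash
    # String keys are frozen copies, so only values are visited
    obj.each_value { |value| binary_strings_to_utf8!(value, changed, seen) }
  end

  obj.instance_variables.each do |name|
    binary_strings_to_utf8!(obj.instance_variable_get(name), changed, seen)
  end
  changed
end

original = Marshal.dump({ "k" => ["caf\xC3\xA9".b, "plain".b] })

data = Marshal.load(original)
changed = binary_strings_to_utf8!(data)
# ... work with data here; its binary Strings are now tagged UTF-8 ...
changed.each { |s| s.force_encoding(Encoding::ASCII_8BIT) }

puts Marshal.dump(data) == original  # prints true
```

Recording the changed Strings, rather than relabelling every String back to ASCII-8BIT, keeps Strings that were already UTF-8 in the file from being altered on the way out.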