Object Marshalling in Ruby

farsi_mehdi

Mehdi FARSI

Posted on March 29, 2019


In this article, we're going to dive into object marshalling. We'll explain what it is, look at the Marshal module, and then go through an example. We'll then go a step deeper and compare the marshal_dump and marshal_load hooks with the lower-level _dump and self._load methods. Let's go!

What’s Object Marshalling?

When you are writing code, you might want to save an object and transmit it to another program, or reuse it in a later execution of your program. Sidekiq is a real-world example: when a Sidekiq job is enqueued in a Ruby on Rails application, a serialization of this job (which is nothing more than an object) is stored in Redis as JSON. The Sidekiq process is then able to deserialize this JSON and reconstitute the original job from it.

In computer programming, this process of serializing and deserializing an object is what we commonly call object marshalling. Now, let's look at what Ruby natively provides to handle object marshalling.

The Marshal Module

As Ruby is a fully object-oriented programming language, it provides a way to serialize and store objects through the Marshal module, which is part of its core library. Marshal serializes an object into a byte stream that can be stored and later deserialized, possibly in another Ruby process.

So, let’s serialize a string and take a closer look at the serialized object.

hello_world = 'hello world!'

serialized_string = Marshal.dump(hello_world) # => "\x04\bI\"\x11hello world!\x06:\x06ET"
serialized_string.class                       # => String

deserialized_hello_world = Marshal.load(serialized_string) # => "hello world!"

hello_world.object_id              # => 70204420126020
deserialized_hello_world.object_id # => 70204419825700

First, we call the Marshal.dump module method to serialize our string. We store the return value, which contains our serialized string, in the serialized_string variable. This byte stream can be written to a file, and the file can later be read to reconstitute the original object in another process. We then call the Marshal.load method to reconstitute the original object from the byte stream.

We can see that this freshly reconstituted string has a different object_id than the hello_world string, which means it's a different object, but it contains the same data.
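The file round trip described above can be sketched as follows. This is a minimal illustration, assuming a temporary directory and a hypothetical file name (hello_world.bin); File.binwrite and File.binread are used because the output of Marshal.dump is binary data.

```ruby
require 'tmpdir'

hello_world = 'hello world!'
restored = nil

Dir.mktmpdir do |dir|
  # hypothetical file name, used only for this illustration
  path = File.join(dir, 'hello_world.bin')

  # The dump is binary, so write and read it in binary mode.
  File.binwrite(path, Marshal.dump(hello_world))
  restored = Marshal.load(File.binread(path))
end

restored == hello_world      # => true, same data
restored.equal?(hello_world) # => false, a different object
```

The reading side could just as well live in another Ruby process, as long as it can access the file.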

Pretty cool! But how is the Marshal module able to reconstruct the string? And, what if I want to have control over which attributes to serialize and deserialize?

A Concrete Example of Object Marshalling

To answer these questions, let’s implement a marshalling strategy on a custom struct named User.

User = Struct.new(:age, :fullname, :roles)

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

The User struct defines 3 attributes: age, fullname, and roles. For this example, we have a business rule that restricts which attributes get serialized:

  • The fullname is serialized only if it contains 64 characters or fewer
  • The roles array is serialized only if it does not contain the :admin role

To do so, we can define a User#marshal_dump method that implements our custom serialization strategy. This method is called when we invoke Marshal.dump with a User instance as its argument. Let's define it:

User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    puts 'in User#marshal_dump' # log the call so we can see when the hook runs

    {}.tap do |result|
      result[:age]      = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles]    = roles unless roles.include? :admin
    end
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user) # prints 'in User#marshal_dump'
user_dump                      # => "\x04\bU:\tUser{\a:\bagei/:\rfullnameI\"\x10Mehdi Farsi\x06:\x06ET"

In the above example, we can see that our User#marshal_dump method is called when we invoke Marshal.dump(user). The user_dump variable contains the string which is the serialization of our User instance.
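The two business rules can be checked by calling marshal_dump directly on a couple of users. A minimal sketch, using the same marshal_dump rules as above and two hypothetical users chosen to exercise each rule:

```ruby
User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    {}.tap do |result|
      result[:age]      = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles]    = roles unless roles.include? :admin
    end
  end
end

# Hypothetical users: one passes both rules, one fails both
operator  = User.new(30, 'Jane Doe', [:operator])
long_name = User.new(30, 'x' * 65, [:admin])

operator.marshal_dump.keys  # => [:age, :fullname, :roles]
long_name.marshal_dump.keys # => [:age]
```

Only age is unconditional; the two other keys appear in the dump only when their rule allows it.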

Now that we have our dump, let’s deserialize it to reconstitute our user. To do so, we define a User#marshal_load method which is in charge of implementing the deserialization strategy of a User dump.

So let’s define this method.

User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    puts 'in User#marshal_dump' # log the call so we can see when the hook runs

    {}.tap do |result|
      result[:age]      = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles]    = roles unless roles.include? :admin
    end
  end

  def marshal_load(serialized_user)
    puts 'in User#marshal_load' # log the call so we can see when the hook runs

    self.age      = serialized_user[:age]
    self.fullname = serialized_user[:fullname]
    self.roles    = serialized_user[:roles] || []
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user) # prints 'in User#marshal_dump'
user_dump                      # => "\x04\bU:\tUser{\a:\bagei/:\rfullnameI\"\x10Mehdi Farsi\x06:\x06ET"

original_user = Marshal.load(user_dump)  # prints 'in User#marshal_load'
original_user                            # => #<struct User age=42, fullname="Mehdi Farsi", roles=[]>

In the above example, we can see that our User#marshal_load method is called when we invoke Marshal.load(user_dump). The original_user variable contains a struct which is a reconstitution of our user instance.

Note that original_user.roles is an empty array rather than [:admin, :operator]. Since user.roles included the :admin role at serialization time, marshal_dump left it out of user_dump, and marshal_load fell back to an empty array.
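For comparison, here is a minimal sketch of the same round trip (without the logging) for two hypothetical users: one without the :admin role, whose roles survive, and one whose fullname is longer than 64 characters, which comes back as nil.

```ruby
User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    {}.tap do |result|
      result[:age]      = age
      result[:fullname] = fullname if fullname.size <= 64
      result[:roles]    = roles unless roles.include? :admin
    end
  end

  def marshal_load(serialized_user)
    self.age      = serialized_user[:age]
    self.fullname = serialized_user[:fullname]
    self.roles    = serialized_user[:roles] || []
  end
end

# Hypothetical users, for illustration only
operator  = User.new(30, 'Jane Doe', [:operator])
long_name = User.new(30, 'x' * 65, [:operator])

copy = Marshal.load(Marshal.dump(operator))
copy.roles       # => [:operator], kept because :admin is absent
copy == operator # => true

Marshal.load(Marshal.dump(long_name)).fullname # => nil
```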

The _dump and self._load Methods

When Marshal.dump is invoked, it calls marshal_dump on the object passed as its argument. When Marshal.load is invoked, it allocates a new instance of the dumped class and calls marshal_load on it.

But what if I tell you that Marshal.dump and Marshal.load can also rely on two other methods, named _dump and self._load? Marshal.dump calls the _dump instance method on the object being dumped, and Marshal.load calls the self._load class method on that object's class.

The _dump Method

The differences between the marshal_dump and the _dump methods are:

  • you need to handle the serialization strategy at a lower level when using the _dump method: it must return a String that represents the data to serialize
  • _dump receives one Integer argument, level, which is the depth limit passed to Marshal.dump (a value of -1 means depth checking is disabled)
  • the marshal_dump method takes precedence over _dump if both are defined
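The first point is enforced by Marshal itself: if _dump returns anything other than a String, Marshal.dump raises a TypeError. A minimal sketch with a hypothetical Point struct whose _dump breaks that contract:

```ruby
# Hypothetical struct whose _dump returns an Array instead of a String
Point = Struct.new(:x, :y) do
  def _dump(_level)
    [x, y] # not a String, so Marshal.dump rejects it
  end
end

error = begin
  Marshal.dump(Point.new(1, 2))
  nil
rescue TypeError => e
  e
end

error.class # => TypeError
```

Returning [x, y].join(',') instead would satisfy Marshal, leaving the parsing work to self._load.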

Let's have a look at the following example:

User = Struct.new(:age, :fullname, :roles) do
  def _dump level
    [age, fullname].join(':')
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

Marshal.dump(user) # => "\x04\bIu:\tUser\x1342:Mehdi Farsi\x06:\x06EF"

In the User#_dump method, we have to build and return the serialized form ourselves: a String that represents the data we want to keep. Here, the roles are simply left out.
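A dump produced by _dump can only be loaded if the class also provides a self._load class method. A minimal sketch, assuming the same User struct as above with only _dump defined, showing that Marshal.load raises a TypeError in that case:

```ruby
User = Struct.new(:age, :fullname, :roles) do
  def _dump(level)
    [age, fullname].join(':')
  end
end

user_dump = Marshal.dump(User.new(42, 'Mehdi Farsi', [:admin, :operator]))

load_error = begin
  Marshal.load(user_dump)
  nil
rescue TypeError => e
  e # User has no self._load to rebuild the object from the string
end

load_error.class # => TypeError
```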

In the following example, we define both User#marshal_dump and User#_dump, each returning a different string, to see which method is called:

User = Struct.new(:age, :fullname, :roles) do
  def marshal_dump
    'in User#marshal_dump'
  end

  def _dump level
    'in User#_dump'
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user)
user_dump.include?('in User#marshal_dump') # => true
user_dump.include?('in User#_dump')        # => false

We can see that only User#marshal_dump is called, even though both methods are defined.

The self._load Method

Now, let's look at the marshal_load and _load methods.

The differences between the marshal_load and the _load methods are:

  • You need to handle the deserialization strategy at a lower level when using the _load method: you are in charge of instantiating the original object.
  • The marshal_load method takes the deserialized object (the value returned by marshal_dump) as an argument, whereas the self._load method takes the serialized String returned by _dump.
  • The marshal_load method is an instance method, whereas self._load is a class method.

Let’s take a look at the following example:

User = Struct.new(:age, :fullname, :roles) do
  def _dump(level)
    [age, fullname].join(':')
  end

  def self._load(serialized_user)
    # split on the first ':' only, so a fullname containing ':' stays intact
    age, fullname = serialized_user.split(':', 2)

    # the dump only holds strings, so convert age back to an Integer;
    # roles were not serialized, so start with an empty array
    new(age.to_i, fullname, [])
  end
end

user = User.new(42, 'Mehdi Farsi', [:admin, :operator])

user_dump = Marshal.dump(user)
user_dump # => "\x04\bIu:\tUser\x1342:Mehdi Farsi\x06:\x06EF"

original_user = Marshal.load(user_dump)
original_user # => #<struct User age=42, fullname="Mehdi Farsi", roles=[]>

In the User._load method:

  • we deserialize the string returned by the User#_dump method
  • we instantiate a new User by passing the deserialized information

We can see that we are in charge of allocating and instantiating the object used to reconstitute our original user.

So, when Marshal.load is coupled with marshal_load, Marshal.load takes care of allocating the reconstituted object. It then calls marshal_load on this freshly allocated object, passing the deserialized data (the value returned by marshal_dump) as its argument.

In contrast, when Marshal.load is coupled with _load, it lets the self._load class method be in charge of:

  • deserializing the data returned by the _dump method
  • instantiating the reconstituted original object
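One practical consequence of this split: with marshal_load, Marshal.load allocates the object without running initialize, while a self._load that calls new does run it. A minimal sketch with two hypothetical classes that count their initialize calls:

```ruby
# Hypothetical class restored through marshal_dump / marshal_load
class ViaMarshalLoad
  @initialize_calls = 0
  class << self
    attr_accessor :initialize_calls
  end

  attr_reader :value

  def initialize(value)
    self.class.initialize_calls += 1
    @value = value
  end

  def marshal_dump
    @value
  end

  # Called on an object that Marshal.load allocated, not initialized
  def marshal_load(value)
    @value = value
  end
end

# Hypothetical class restored through _dump / self._load
class ViaLoad
  @initialize_calls = 0
  class << self
    attr_accessor :initialize_calls
  end

  attr_reader :value

  def initialize(value)
    self.class.initialize_calls += 1
    @value = value
  end

  def _dump(_level)
    @value.to_s
  end

  # We build the object ourselves, here through new (so initialize runs)
  def self._load(serialized)
    new(serialized.to_i)
  end
end

a = Marshal.load(Marshal.dump(ViaMarshalLoad.new(7)))
b = Marshal.load(Marshal.dump(ViaLoad.new(7)))

[a.value, ViaMarshalLoad.initialize_calls] # => [7, 1]
[b.value, ViaLoad.initialize_calls]        # => [7, 2]
```

This matters when initialize performs validation or side effects: with marshal_load, that logic is skipped on load unless marshal_load repeats it.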

Conclusion

Depending on your needs, you can implement a higher- or lower-level serialization/deserialization strategy. To do so, use the Marshal module together with the appropriate hook methods: marshal_dump and marshal_load, or _dump and self._load.

Voilà!

Guest author Mehdi Farsi is the founder of www.rubycademy.com, which will offer cool courses to learn Ruby and Ruby on Rails when it launches soon.
