Optimizing Data Storage in Ruby on Rails: A Seamless Migration from Database to AWS S3

Phil Smy

Posted on August 1, 2023

Our application has a table dedicated to storing SENT email messages. This feature enables us to present a record of sent items to users and track changes to email templates. Currently, we store all this data in our database, which accounts for the largest share of our data storage.

I recently realized that this data, which is seldom accessed, occupies a significant amount of storage and contributes to database slowdowns. It became clear that a more efficient solution was required.

After exploring multiple options, I landed on a straightforward resolution: migrate the data from the database to an S3 storage system.

Below is the execution process of this solution.

Our Existing Email Storage Class

Our application initially had a class like this:

# Table name: mail_queue_items
#
# id :integer
# body :text(16777215) <- this is where all the data goes!
# completed_at :datetime
# delivery_code :string(255)
# error :string(255)
# mail_to :string(255)
# msg_reference :string(255)
# run_at :datetime
# status :string(255)
# subject_line :string(255)
# tracked :boolean
# tracked_at :datetime
# created_at :datetime
# updated_at :datetime
class MailQueueItem < ApplicationRecord
end

The data (the body column) is stored in a compressed, encoded format in our database:

sig { params(data: String).void }
def body=(data)
  if data[/^HTML/]
    super(CompressedColumn.new.dump(data))
  else
    super(data)
  end
end

sig { returns(String) }
def body
  CompressedColumn.new.load(read_attribute(:body))
end
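
The CompressedColumn class itself isn't shown here. Purely as a stand-in, a minimal version built on Zlib and Base64 (an assumption about its internals — the real one may differ) would provide the dump/load interface used above:

require "zlib"
require "base64"

class CompressedColumn
  # Compress and encode the raw body before it is persisted
  def dump(data)
    Base64.encode64(Zlib::Deflate.deflate(data))
  end

  # Decode and decompress a stored value back into the original body
  def load(stored)
    return stored if stored.nil? || stored.empty?

    Zlib::Inflate.inflate(Base64.decode64(stored))
  rescue Zlib::DataError
    stored # value was never compressed; return it untouched
  end
end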

Storing in S3

This process turned out to be simpler than anticipated. Since we were already using S3 (like everyone else on the planet!), the transition was pretty straightforward.
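
All of the S3 calls below come from the official aws-sdk-s3 gem (or the larger aws-sdk bundle that includes it), so the only prerequisite beyond an AWS account is having it in the Gemfile:

gem "aws-sdk-s3"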

  1. Create a new bucket
    For our needs, we didn’t require any complex setup, so we just created a new bucket using the AWS console. It’s worth noting that you might want to enable versioning for added safety in case of accidental overwrites. (A scripted equivalent is sketched below.)
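
If you prefer to script it, the same thing (including versioning) can be done with the SDK. A hypothetical one-off snippet, assuming the same S3_* environment variables used later in the post:

require "aws-sdk-s3"

client = Aws::S3::Client.new(
  region:      ENV.fetch("S3_REGION", nil),
  credentials: Aws::Credentials.new(ENV.fetch("S3_ACCESS_KEY", nil), ENV.fetch("S3_SECRET_ACCESS_KEY", nil))
)

# Create the bucket
client.create_bucket(bucket: "my-great-bucket-name")

# Optional: turn on versioning for protection against accidental overwrites
client.put_bucket_versioning(
  bucket:                   "my-great-bucket-name",
  versioning_configuration: { status: "Enabled" }
)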

  2. Connect to it

s3 = Aws::S3::Resource.new(
  region:      ENV.fetch("S3_REGION", nil),
  credentials: Aws::Credentials.new(ENV.fetch("S3_ACCESS_KEY", nil), ENV.fetch("S3_SECRET_ACCESS_KEY", nil))
)
  3. Create and save the object with the data
bucket_name = "my-great-bucket-name"
file_name = "#{tenant_id}-#{mqi_id}" # This should be a unique identifier for each document
obj = s3.bucket(bucket_name).object(file_name)

# Upload the file
obj.put(body: mqi.body)
That’s it! No step 4.
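
(Optional) If you want to sanity-check the upload, the same resource handle can read the object straight back — reusing the s3, bucket_name, file_name and mqi variables from above:

round_trip = s3.bucket(bucket_name).object(file_name).get.body.read
puts round_trip == mqi.body # => true if the upload matches the original body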

In-class Implementation

Integrating the process into our existing methods resulted in minimal changes to the calling code.

sig { params(data: String).void }
def body=(data)
  if data[/^HTML/]
    obj = s3_connector.bucket(ENV.fetch("MQI_BUCKET_NAME", nil)).object(s3_file_name)
    compressed_encoded_data = CompressedColumn.new.dump(data)
    obj.put(body: compressed_encoded_data)
  else
    super(data)
  end
end

sig { returns(String) }
def body
  download = s3_connector.bucket(ENV.fetch("MQI_BUCKET_NAME", nil)).object(s3_file_name)
  compressed_encoded_data = download.get.body.read

  CompressedColumn.new.load(compressed_encoded_data)
end

sig { returns(Aws::S3::Resource) }
def s3_connector
  Aws::S3::Resource.new(
    region:      ENV.fetch("S3_REGION", nil),
    credentials: Aws::Credentials.new(ENV.fetch("S3_ACCESS_KEY", nil), ENV.fetch("S3_SECRET_ACCESS_KEY", nil))
  )
end

sig { returns(String) }
def s3_file_name
  "#{tenant_id}-#{mqi_id}" # This should be a unique identifier for each document
end
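
With those methods in place, the calling code doesn’t change at all. A quick console check (assuming MQI_BUCKET_NAME and the S3_* variables are set) might look like:

mqi = MailQueueItem.first
mqi.body = "HTML<p>Hello from S3!</p>" # compressed and written to the S3 object
mqi.body                               # fetched back from S3 and decompressed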

Next Steps: Building a Migrator

It should be quite straightforward to create a migrator that transfers the existing data to S3. The task involves reading each body out of the database, writing it to S3, and then rebuilding the table to reclaim the space. Doing so could recover a significant chunk of the 500GB currently occupied — an exciting win for our application’s storage usage and efficiency.
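
Nothing is built yet, but a first pass could be as simple as a rake task that walks the table in batches, copies each raw (already compressed) body value to S3, and clears the column afterwards. A rough, untested sketch with illustrative names:

namespace :mail_queue_items do
  desc "Backfill existing email bodies from the database into S3"
  task migrate_bodies_to_s3: :environment do
    bucket = ENV.fetch("MQI_BUCKET_NAME", nil)

    MailQueueItem.where.not(body: nil).find_each do |mqi|
      raw = mqi.read_attribute(:body) # raw column value, bypassing the S3-backed getter
      next if raw.blank?

      # Store the value as-is so the new getter can decode it exactly as before
      mqi.s3_connector.bucket(bucket).object(mqi.s3_file_name).put(body: raw)

      # Clear the column only after the upload succeeds, reclaiming the space
      mqi.update_column(:body, nil)
    end
  end
end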

An added bonus of the smaller data size: cheaper backups and much smaller, faster database dumps!

Reflections

The process of migrating email data from our database to S3 was a relatively simple yet highly effective solution to combat the issues of excessive data storage and potential slowdowns. This not only improved the efficiency of our database but also underscored the value of exploring straightforward solutions to complex challenges.

By implementing minor adjustments to our existing code and leveraging the capabilities of S3, we were able to establish a more streamlined and robust system for storing SENT email messages.

This strategy showcases the potential for continuous improvement and optimization within any software system. It’s always worth investigating whether there’s a more efficient way of handling large data sets in your applications!

As usual, I am writing this here mainly to cement it in my own brain. But I couldn’t find any examples of this online, so hopefully this benefits someone else!

You can find me on Twitter where I talk about Ruby on Rails, my company Zonmaster, and life in general. If you’re looking for help with your Rails project, drop me a note on Twitter or LinkedIn.
