Optimizing Data Storage in Ruby on Rails: A Seamless Migration from Database to AWS S3
Phil Smy
Posted on August 1, 2023
Our application has a table dedicated to storing SENT email messages. This feature enables us to present a record of sent items to users and track changes to email templates. Currently, we store all this data in our database, which accounts for the largest share of our data storage.
I recently realized that this data, which is seldom accessed, occupies a significant amount of storage and contributes to database slowdowns. It became clear that a more efficient solution was required.
After exploring multiple options, I landed on a straightforward resolution: migrate the data from the database to an S3 storage system.
Below is the execution process of this solution.
Our Existing Email Storage Class
Our application initially had a class like this:
# Table name: mail_queue_items
#
# id :integer
# body :text(16777215) <- this is where all the data goes!
# completed_at :datetime
# delivery_code :string(255)
# error :string(255)
# mail_to :string(255)
# msg_reference :string(255)
# run_at :datetime
# status :string(255)
# subject_line :string(255)
# tracked :boolean
# tracked_at :datetime
# created_at :datetime
# updated_at :datetime
class MailQueueItem < ApplicationRecord
end
The data (the body column) is stored in a compressed, encoded format in our database:
sig { params(data: String).void }
def body=(data)
  if data[/^HTML/]
    super(CompressedColumn.new.dump(data))
  else
    super(data)
  end
end

sig { returns(String) }
def body
  CompressedColumn.new.load(read_attribute(:body))
end
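The CompressedColumn class isn't shown here, but as a rough idea of what it does, here is a minimal sketch assuming Zlib compression plus Base64 encoding so the result is safe to keep in a text column (the real class may well differ):

require "zlib"
require "base64"

# Hypothetical serializer: compresses and encodes a string for storage,
# and reverses the process on load.
class CompressedColumn
  def dump(data)
    Base64.strict_encode64(Zlib::Deflate.deflate(data))
  end

  def load(stored)
    return stored if stored.nil? || stored.empty?

    Zlib::Inflate.inflate(Base64.strict_decode64(stored))
  end
end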
Storing in S3
This process turned out to be simpler than anticipated. Since we were already using S3 (like everyone else on the planet!), the transition was pretty straightforward.
- Create a new bucket

For our needs, we didn’t require any complex setup, so we just created a new bucket using the AWS console. It’s worth noting that you might want to enable versioning for added safety in case of accidental overwrites (there’s a short sketch of that after these steps).

- Connect to it
# The aws-sdk-s3 gem provides the client (in a Rails app, Bundler requires it for you)
require "aws-sdk-s3"

s3 = Aws::S3::Resource.new(
  region: ENV.fetch("S3_REGION", nil),
  credentials: Aws::Credentials.new(ENV.fetch("S3_ACCESS_KEY", nil), ENV.fetch("S3_SECRET_ACCESS_KEY", nil))
)
- Create and save the object with the data
bucket_name = "my-great-bucket-name"
file_name = "#{tenant_id}-#{mqi_id}" # This should be a unique identifier for each document
obj = s3.bucket(bucket_name).object(file_name)

# Upload the file (mqi here is the MailQueueItem record whose body we're offloading)
obj.put(body: mqi.body)
- That’s it! No step 4.
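If you do want versioning on the bucket, here is a minimal sketch of enabling it through the SDK rather than the console; the bucket name is just a placeholder:

require "aws-sdk-s3"

client = Aws::S3::Client.new(
  region: ENV.fetch("S3_REGION", nil),
  credentials: Aws::Credentials.new(ENV.fetch("S3_ACCESS_KEY", nil), ENV.fetch("S3_SECRET_ACCESS_KEY", nil))
)

# Turn on versioning so an accidental overwrite keeps the previous object version
client.put_bucket_versioning(
  bucket: "my-great-bucket-name",
  versioning_configuration: { status: "Enabled" }
)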
In-class Implementation
Integrating the process into our existing methods resulted in minimal changes to the calling code.
sig { params(data: String).void }
def body=(data)
  if data[/^HTML/]
    obj = s3_connector.bucket(ENV.fetch("MQI_BUCKET_NAME", nil)).object(s3_file_name)
    compressed_encoded_data = CompressedColumn.new.dump(data)
    obj.put(body: compressed_encoded_data)
  else
    super(data)
  end
end

sig { returns(String) }
def body
  download = s3_connector.bucket(ENV.fetch("MQI_BUCKET_NAME", nil)).object(s3_file_name)
  compressed_encoded_data = download.get.body.read
  CompressedColumn.new.load(compressed_encoded_data)
end
sig { returns(Aws::S3::Resource) }
def s3_connector
  Aws::S3::Resource.new(
    region: ENV.fetch("S3_REGION", nil),
    credentials: Aws::Credentials.new(ENV.fetch("S3_ACCESS_KEY", nil), ENV.fetch("S3_SECRET_ACCESS_KEY", nil))
  )
end

sig { returns(String) }
def s3_file_name
  "#{tenant_id}-#{mqi_id}" # This should be a unique identifier for each document
end
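Because the accessors keep the same signatures, the calling code reads exactly as it did before. A hypothetical usage (the id and body content are placeholders):

mqi = MailQueueItem.find(42) # placeholder id

# The setter transparently compresses and uploads the body to S3...
mqi.body = "HTML<html><body>Hello!</body></html>"

# ...and the getter downloads and decompresses it again on read
puts mqi.body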
Next Steps: Building a Migrator
It should be quite straightforward to create a migrator that transfers the data from our current structure to S3. The task involves reading each existing row, moving its body to S3, and then rebuilding the table so the freed space is actually reclaimed. By doing so, we could likely recover a significant amount of the 500GB the table currently occupies. This offers exciting potential for optimizing our application’s storage usage and efficiency.
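We haven’t written that migrator yet, but a rough sketch, assuming a one-off rake task that walks the table in batches (the task name and batch size are just placeholders):

# lib/tasks/migrate_mail_bodies.rake -- hypothetical task
namespace :mail_queue_items do
  desc "Move stored email bodies from the database to S3"
  task migrate_bodies_to_s3: :environment do
    MailQueueItem.where.not(body: nil).find_each(batch_size: 500) do |mqi|
      # Read the raw column value so we keep the already compressed, encoded form
      raw = mqi.read_attribute(:body)
      next if raw.blank?

      # Reuse the model's own connection and naming scheme
      obj = mqi.s3_connector.bucket(ENV.fetch("MQI_BUCKET_NAME", nil)).object(mqi.s3_file_name)
      obj.put(body: raw)

      # Clear the column; reclaiming the disk space still needs a table rebuild afterwards
      mqi.update_column(:body, nil)
    end
  end
end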
An added bonus of the reduced data size is cheaper backups and much smaller, faster database dumps!
Reflections
The process of migrating email data from our database to S3 was a relatively simple yet highly effective solution to combat the issues of excessive data storage and potential slowdowns. This not only improved the efficiency of our database but also underscored the value of exploring straightforward solutions to complex challenges.
By implementing minor adjustments to our existing code and leveraging the capabilities of S3, we were able to establish a more streamlined and robust system for storing SENT email messages.
This strategy showcases the potential for continuous improvement and optimization within any software system. It’s always worth investigating whether there’s a more efficient way of handling large data sets in your applications!
As usual, I am writing this here mainly to cement it in my own brain. But I couldn’t find any examples of this online, so hopefully this benefits someone else!
You can find me on Twitter where I talk about Ruby on Rails, my company Zonmaster, and life in general. If you’re looking for help with your Rails project, drop me a note on Twitter or LinkedIn.