21 tips for building Rails applications
Life Christian
Posted on November 23, 2020
This list is a collection of things I wish I knew when I was starting out as a Ruby on Rails developer. While I won't go in depth, this hopefully gives a good overview of things to look out for and some useful performance tips for anyone building their applications in Rails.
This is a long list, so better get ready!
1. Use `includes` and `eager_load` to eliminate n + 1 queries.
Every experienced Rails developer should know this, but it is still worth mentioning. When querying a list that spans multiple tables, use `includes`. For example, `Product.all.map { |product| product.brand.name }` returns a list of brand names, but it runs one query for the products plus one more query per product to fetch its brand. To get around this, change it to `Product.includes(:brand).map { |product| product.brand.name }`, which tells Rails to eager load the brands up front. While the initial query is heavier, a small, fixed number of queries is almost always better than one query per record.
Use the bullet gem in your development environment to help find those nasty optimization leaks. I highly recommend including this gem at the start of any Rails project to catch these problems early.
Read more about it here:
https://scoutapm.com/blog/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where
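A minimal sketch of the difference, assuming a hypothetical `Product` model that `belongs_to :brand`. The method names are made up for illustration, and the relation is passed in as an argument (for example `Product.all`) so the snippet loads without a Rails app.

```ruby
# Assumes a hypothetical Product model that `belongs_to :brand`.
# `products` is any ActiveRecord relation, e.g. Product.all.

# N + 1: one query for the products, then one more per product
# when `product.brand` is first touched inside the block.
def brand_names_lazy(products)
  products.map { |product| product.brand.name }
end

# Eager loaded: `includes` fetches all referenced brands up front
# (typically one query for products and one for brands).
def brand_names_eager(products)
  products.includes(:brand).map { |product| product.brand.name }
end
```

Both methods return the same list; watching the development log while calling each one shows the difference in the number of queries.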
2. Do batch inserts and updates
Similar to the tip above, do not loop with multiple SQL statements when creating and updating records. I had a use case where we had to insert or update thousands of records every few minutes and doing that inside a loop was just painful.
If you're on Rails 6, you get built-in support for bulk inserts through three methods: `insert_all`, `insert_all!` and `upsert_all`. If you're using an older version of Rails, you can use the activerecord-import gem, which serves the same purpose.
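As a rough sketch of what this can look like on Rails 6: the helper below sends rows in batches through `upsert_all`. The helper name, the batch size, and the `sku` column with its unique index are all assumptions for illustration.

```ruby
# Assumes a hypothetical `sku` column backed by a unique index, and that
# every hash in `rows` has the same keys. `model` is an ActiveRecord class
# such as Product. Note: the bulk methods skip validations and callbacks,
# and `unique_by` is only supported on PostgreSQL and SQLite.
def bulk_upsert(model, rows, batch_size: 1_000)
  rows.each_slice(batch_size) do |batch|
    # One multi-row statement per batch instead of one statement per record.
    model.upsert_all(batch, unique_by: :sku)
  end
end
```

Calling `bulk_upsert(Product, rows)` with 2,500 rows would issue three statements instead of 2,500.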
3. Know the differences between common Ruby methods
While in most cases they won't make or break your application, it is still important to understand the differences between these common methods. They are a source of confusion for many Rails developers because Rails has kept changing their behavior. 🤷‍♂️ Thankfully, things have stabilized since Rails 5.1.
Present vs Exists vs Any
- `present?`
  - Loads all the records (if not already loaded) and checks if there is at least one record present.
  - This is usually very slow if the records are not yet loaded.
  - Produces a `SELECT *`.
- `exists?`
  - Will always query the database to check if at least one record is present.
  - This is the quickest when you only need to check for existence, but it can also be the slowest by one whole SQL call if the records are already loaded.
  - Produces a `SELECT 1 LIMIT 1`.
- `any?`
  - Produces a `SELECT COUNT(*)` in Rails 4.2 and `SELECT 1 LIMIT 1` in Rails 5.1 and up.
  - Behaves like `present?` if records are already loaded.
  - Behaves like `exists?` if the records aren't loaded.
  - Still slower than `exists?` in Rails 5.0 and below. For Rails 5.1 and up, `any?` generates the same query as `exists?`.
Rule of thumb:
- When only checking for existence, always use `exists?`.
- If the records are already loaded or you will be using the result, use `present?`.
- In Rails 4.2 and below, there is not much of a use case for `any?` if you understand when to use `exists?` or `present?`. However, for newer versions of Rails, it is generally recommended to use `any?`.
Blank? vs Empty?
- `blank?`
  - Negation of `present?`.
  - Same rule applies as `present?`.
- `empty?`
  - Negation of `any?`, with the same query behavior.
  - Produces a `SELECT COUNT(*)` in Rails 4.2 and `SELECT 1 LIMIT 1` in Rails 5.1 and up.
Rule of thumb:
- In Rails 4.2 and below, negate `exists?` if you want to check that no records exist. For newer versions, you can use `empty?`.
- As with `present?`, use `blank?` only if the records are already loaded.
Count vs Size vs Length
- `length`
  - Much like `present?`, loads the association into memory if not yet loaded.
- `count`
  - Will always generate a `COUNT` SQL query. Similar behavior to `exists?`.
- `size`
  - Like `any?`, behaves like `length` if the records are already loaded. Otherwise, it falls back to a `COUNT` query.
Rule of thumb:
- It is generally recommended to use `size`.
- If the records are already loaded or you need to load the association into memory, use `length`.
Caveat: Since `count` always generates a SQL query, its result might not always match `size` or `length`, which can read from records already in memory. Best to double check!
Read more about it here:
https://www.speedshop.co/2019/01/10/three-activerecord-mistakes.html
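To make the rules of thumb concrete, here is a small sketch. The method names and the `state` column are hypothetical, and `orders` stands for any ActiveRecord relation such as `user.orders`.

```ruby
# `orders` is any ActiveRecord relation, e.g. user.orders.
# The `state` column is a hypothetical example.

# Only a yes/no answer is needed: let the database check, load nothing.
def pending_orders?(orders)
  orders.where(state: "pending").exists?
end

# The records are used afterwards: load them once, then rely on
# in-memory checks (`present?`, `length`) that issue no extra queries.
def pending_summary(orders)
  pending = orders.where(state: "pending").load
  return "no pending orders" unless pending.present?

  "#{pending.length} pending, first id: #{pending.first.id}"
end
```

The first method never instantiates a record, while the second pays for loading once and then reuses the loaded records.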
5. Spend time tuning your Puma config
Puma is the default web server for Rails applications. By default, Puma runs a single worker process and uses threads to serve multiple requests. Spend some time tuning Puma to run in clustered mode with multiple workers and multiple threads. Every app is different, so do some load testing (remember to do it in a production-like environment) to find the ideal number of workers and threads.
Most people recommend 1 worker per CPU core, and while this is a good starting point, do not treat it as a hard rule. You can easily go up to three or four times the number of your CPU cores and see significant performance gains. However, this increases CPU and memory usage, so watch both closely during load testing. Rails apps in general are known to consume a lot of memory, so make sure you don't run out.
No one size fits all but aim to have at least 3 workers per server or container with 3 to 5 threads. If you cannot have 3 workers because of CPU or memory constraints, consider moving to a more powerful instance or dyno type. You can also learn more about tuning your web server config by listening to this great talk by Nate Berkopec.
Also consider taking a look at Puma's v5 experimental features! I had great success using `wait_for_less_busy_worker` and `nakayoshi_fork`.
Read more about it here:
https://www.speedshop.co/2017/10/12/appserver.html
https://github.com/puma/puma/blob/master/5.0-Upgrade.md
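As a rough starting point, the advice in this tip could translate into a `config/puma.rb` like the sketch below. The environment variable names and default values are assumptions to replace with numbers from your own load tests, and the last two settings are the experimental Puma 5 options mentioned above, so check that your Puma version still supports them.

```ruby
# config/puma.rb (sketch; tune the numbers with load testing)

# Cluster mode: several worker processes, each running a pool of threads.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 3))

max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

# Load the app before forking so workers can share memory (copy-on-write).
preload_app!

# Experimental in Puma 5: a busy worker waits briefly (in seconds) before
# accepting a new connection, so idle workers tend to pick it up first.
wait_for_less_busy_worker 0.005

# Experimental in Puma 5: runs GC and compaction before forking workers
# to reduce memory usage.
nakayoshi_fork
```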
6. Use a fast JSON serializer
I would stay away from the still popular active_model_serializers gem. Use the fast_jsonapi serializer (originally from Netflix) or something similar. Also use the oj gem, which offers faster JSON processing. This can make a significant difference when processing large amounts of JSON.
7. Use a CDN in front of your load balancer
If you're using AWS, point your Route 53 record to a CloudFront distribution that sits in front of your Application Load Balancer (ALB). CloudFront then routes traffic to your application through the nearest edge location in the AWS global network. Chances are, there is an edge location far closer to the user than the ALB. Your mileage may vary, but I saw an 80-100ms latency improvement from simply pointing our DNS to CloudFront, which then points to the ALB. Note that this was before I had implemented any kind of caching whatsoever.
Before you do this, make sure to customize CloudFront's error page to point to your application's error page. You do not want your customers to see an ugly CloudFront error. You can also configure AWS WAF to protect your app against malicious attacks.
8. Use HTTP caching
Once you have a CDN in front of your ALB, you can add HTTP caching at the CDN. This will be fast, since everything is cached at the edge. Be warned, however: this is a bit complicated to set up and there are a lot of factors to consider. It can also be a source of headaches if you accidentally cache the wrong things. The rewards can be worth it, but proceed with caution.
Read more about it here:
https://medium.com/rubyinside/https-medium-com-wintermeyer-caching-in-ruby-on-rails-5-2-d72e1ddf848c
https://devcenter.heroku.com/articles/http-caching-ruby-rails#conditional-cache-headers
9. Use Redis for fragment caching
By default, Rails caches data in the file store. This isn't the best solution if you are running a production site with multiple web servers that each have their own cache store. Thankfully, beginning with version 5.2, Rails has built-in support for caching via Redis.
Redis stores the cache in memory and this does wonders in making any app load faster. By identifying parts of your app that do not change frequently, your site would be able to serve thousands of customers without purchasing additional CPU and memory for both your web servers and your database.
Also consider using the hiredis gem to make Redis load cached content even faster. Additionally, you can also use ElastiCache in production to reduce the burden in managing your Redis cluster.
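A minimal sketch of switching the cache store, assuming Rails 5.2 or newer, the redis gem in your Gemfile, and a `REDIS_URL` environment variable (the variable name is an assumption):

```ruby
# config/environments/production.rb
Rails.application.configure do
  # Shared cache for all web servers instead of a per-server file store.
  config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }
end
```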
10. Use soft deletes
I would generally advise against doing hard deletes. A hard delete removes the row from the database completely and is final. There is no "undo" unless you keep backups. Just in general, it would be helpful to know when records are deleted. Check out the popular discard gem that adds an elegant way to do soft deletes.
11. Switch to Pagy for pagination
The other popular pagination gems, Kaminari and will_paginate, consume more memory than Pagy. Pagy is the new kid on the block and is generally more efficient.
12. Use jemalloc or set MALLOC_ARENA_MAX=2
Set `MALLOC_ARENA_MAX=2` and check if the memory consumption of your app improves. I was personally surprised by how this relatively simple change reduced our app's memory footprint by around 10%.
You could also try running Ruby with jemalloc, which has been shown to reduce memory consumption. I personally do not have experience running Ruby with jemalloc, but there are several articles that recommend the switch.
Read more about it here:
https://medium.com/rubyinside/how-we-halved-our-memory-consumption-in-rails-with-jemalloc-86afa4e54aa3
https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
13. Use the database whenever possible
Every good Rails developer should know their way around SQL. After all, ActiveRecord is just an abstraction on top of SQL. Do not use Ruby's `.map`, `.select` or `.group_by` to filter data when ActiveRecord can do it in the database. SQL was built for querying and filtering data, so it is also a lot faster and more efficient than doing the same work in Ruby. Always use the right tool for the job.
# bad
orders.select { |order| order.state == "pending" }
# good
orders.where(state: "pending")
For complex queries, you may have no choice but to use raw SQL via `ActiveRecord::Base.connection.execute(sql_string)`. While raw SQL can be faster, you are still better off using ActiveRecord as much as possible, since maintaining long lines of SQL is unbearable in the long run. Only use raw SQL when ActiveRecord is preventing you from getting the job done.
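The same idea applies to `.group_by`. As a sketch, with hypothetical method names and a hypothetical `state` column, where `orders` is any ActiveRecord relation:

```ruby
# `orders` is any ActiveRecord relation; `state` is a hypothetical column.

# Ruby does the work: every order row is loaded and instantiated first.
def order_counts_in_ruby(orders)
  orders.group_by(&:state).transform_values(&:size)
end

# The database does the work: a single GROUP BY query returns a hash
# of counts keyed by state, without instantiating any order.
def order_counts_in_sql(orders)
  orders.group(:state).count
end
```

Both return a hash of counts per state, but only the first one has to pull every order into memory.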
14. Be careful with Rails migrations
In general, only generate migrations that are additive and backwards compatible with your existing application. Also, when adding indexes, always add `disable_ddl_transaction!` together with `algorithm: :concurrently` to prevent unexpected downtime when deploying to production. This is a long topic by itself, so take a look at the guidelines of the strong_migrations gem. I don't usually put the gem in my app anymore, but it is helpful to read its general rules for achieving zero-downtime migrations.
15. Know your indexes
A good rule of thumb is to always index the foreign keys of your tables. You can also use the lol_dba gem to check if you have indexes missing. Aside from simple B-Tree indexes, be familiar with other types of indexes. My experience is mainly on using Postgres but I imagine it's relatively the same with other SQL engines. Here's a quick run through:
- Multi-column indexes
  - Can be significantly faster than separate single-column indexes, so they are worth considering. However, Postgres can combine two separate indexes, so performance may vary.
  - The order of the columns in the index matters. From the Postgres docs:
...if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns, you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns. You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone.
- Partial Indexes
  - An index with a `where` clause.
  - Can be used to enforce a partial unique constraint.
    - Example: Just to illustrate, let's say you want to limit the number of active orders per user. If, for some reason, you want to enforce one active order per user at a time, you can easily do this with a partial unique index.
# Ensures one active order per user at a time.
# Adds the index concurrently to prevent downtime.
disable_ddl_transaction!

def change
  add_index :orders,
            %i[user_id active],
            where: "active = true",
            unique: true,
            algorithm: :concurrently
end
  - This type of index is also useful when accessing a table with a particular condition that makes the resulting filtered records significantly smaller.
    - Example: You have an orders table with states `pending` and `complete`. Over time, orders with a complete state would grow since orders are marked as complete every day. However, our app's queries would naturally be more interested in finding pending orders today. Since there's a huge disparity between thousands of pending orders and potentially millions of complete orders, a partial index would be beneficial to reduce the size of the index and significantly improve performance.
# Speeds up the pending orders query
disable_ddl_transaction!

def change
  add_index :orders,
            :state,
            where: "state = 'pending'",
            algorithm: :concurrently
end
- GIN Index
  - Commonly used to speed up `LIKE` queries. While I would generally advise against using your database to do full text search types of queries, a GIN index can be very helpful when the situation arises.
disable_ddl_transaction!

def change
  # LIKE lookups need trigram support, so this uses pg_trgm (not btree_gin).
  enable_extension "pg_trgm"
  add_index :orders,
            :sample_column,
            using: :gin,
            opclass: :gin_trgm_ops,
            algorithm: :concurrently
end
Read more about it here:
https://towardsdatascience.com/how-gin-indices-can-make-your-postgres-queries-15x-faster-af7a195a3fc5
16. Default values and unique/null constraints
Always think about the default values of your database columns. Should it be `nil`? Or should it be `0`? `nil` should have a separate meaning from `0` or an empty string `""`. If it doesn't matter, then it is probably best to specify a default value and a `not null` constraint. This will save you some headaches later on.
For booleans, almost always specify a default value and a `not null` constraint. Nobody wants three possible values for a boolean column: `true`, `false`, and `nil` 😅
def change
  add_column :orders,
             :active,
             :boolean,
             null: false,
             default: false
end
17. Use Elasticsearch for your searching needs
With Elasticsearch, having a powerful, production-ready search engine is very doable nowadays. Use the elasticsearch-rails gem in conjunction with searchkick to manage Elasticsearch's syntax and quirks. Of course, this is another tool to manage, so only consider it if you have complex search use cases. It is not terribly difficult to set up, but it adds another layer of complexity to your app. You may also opt for a managed service like Amazon Elasticsearch Service to reduce the burden of maintaining the cluster.
For something simpler, you can always use the popular ransack gem. The gem uses the database to do the heavy lifting, so only use it for relatively simple searches. Be careful when you want to use pattern matching (the `LIKE` operator). There was one time we accidentally confused the `_eq` Ransack predicate with `_matches` (which uses `LIKE`), leaving us wondering why our queries were running so slow. 😅
18. Consider using count estimates
When counting thousands of records, consider using Postgres count estimates when you do not need an exact count. The count estimate query would be extremely fast and is usually close enough.
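One way to get such an estimate is to read the row count that Postgres tracks in its `pg_class` catalog. The sketch below assumes a Postgres database; the helper name is hypothetical, and `connection` would be something like `ActiveRecord::Base.connection`.

```ruby
# `connection` is an ActiveRecord connection to a Postgres database.
# Reads the planner's row estimate from pg_class, which is refreshed by
# VACUUM and ANALYZE, so it can lag behind the true count. The lookup is
# by table name only and ignores schemas.
def estimated_row_count(connection, table_name)
  sql = <<~SQL
    SELECT reltuples::bigint
    FROM pg_class
    WHERE relname = #{connection.quote(table_name)}
  SQL
  # Treat a missing or never-analyzed table as 0 rather than nil or -1.
  [connection.select_value(sql).to_i, 0].max
end
```

For example, `estimated_row_count(ActiveRecord::Base.connection, "orders")` avoids the full scan that `Order.count` can trigger on a large table.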
19. Make sense of your queries by using EXPLAIN ANALYZE
Understand how your query is executed by using `EXPLAIN ANALYZE`. Unfortunately, ActiveRecord does not have native support for it as of Rails 6, but you can use the activerecord-explain-analyze gem. The `EXPLAIN ANALYZE` output can be daunting to read for complex queries. Use this nifty tool from depesz to make sense of your explain statements.
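If you would rather not add a gem, a small helper can do something similar by hand. This is only a sketch assuming a Postgres database; the helper name is hypothetical, `connection` would be something like `ActiveRecord::Base.connection`, and `relation` is any ActiveRecord relation.

```ruby
# Caution: ANALYZE really executes the query, so only use this on SELECTs.
def explain_analyze(connection, relation)
  result = connection.execute("EXPLAIN ANALYZE #{relation.to_sql}")
  # The pg adapter returns one row per plan line under the "QUERY PLAN" key.
  result.map { |row| row["QUERY PLAN"] }.join("\n")
end
```

For example, `puts explain_analyze(ActiveRecord::Base.connection, Order.where(state: "pending"))` prints a plan that can be pasted into a plan visualizer.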
20. Get database insights by running a few queries
There are SQL queries that you can run to get a much better understanding of the current state of your database. Take a look at the different SQL commands in the rails_pg_extras gem. Run a few of its rake tasks to gain an understanding of your index_usage, locks, outliers, and more.
21. Use pg_repack to reduce table and index bloat
Do you have a database that has been running for years? Does it feel like queries are getting slower and slower? Is it against a table with frequent updates and deletes? Run `rake pg_extras:bloat` from the rails_pg_extras gem and check if your tables or indexes are suffering from bloat.
Bloat grows when performing thousands of updates or deletes against the database. Internally, Postgres does not actually remove a row when we delete a record; it only marks the row as unavailable to all future transactions. I am oversimplifying, but essentially the row still exists on disk and is just no longer visible. You can imagine that this grows over time, especially for large databases. This is the bloat that slows down your queries.
Running `VACUUM FULL` against a table will remove bloat completely. However, this command acquires an `ACCESS EXCLUSIVE` lock on the table, which means no reads or writes for the duration of the command. Yikes. This would cause downtime in any production environment. Instead, install pg_repack, a tool that lets you eliminate bloat while holding an exclusive lock only briefly instead of for the entire rebuild.
Read more about it here:
https://medium.com/compass-true-north/dealing-with-significant-postgres-database-bloat-what-are-your-options-a6c1814a03a5
Wrapping up
That's it! That was quite a list, so congratulations on reaching the end. I am sure there are still things that I did not cover, but then again there will never be a complete list. Let me know if you have any more tips in the comment section! Also, for anyone who is just starting their web development journey, don't feel overwhelmed! Take things in one at a time and you'll be done before you know it. ✨