George Gavrilchik
Posted on May 27, 2024
Introduction
In modern web development, efficiently managing data within a database is crucial for ensuring optimal application performance and data integrity. Rails provides robust tools for interacting with databases. However, there are scenarios where leveraging the power of the database itself can yield significant benefits. This is where database triggers come into play.
Database triggers are special procedures that are automatically executed in response to certain events on a particular table or view in a database. These events can include actions such as insertions, updates, or deletions. Triggers are powerful tools for enforcing rules, maintaining audit trails, and ensuring data consistency without the need for manual intervention. By encapsulating logic within the database, triggers can help offload processing from the application layer, reduce network overhead, and ensure that critical operations are performed reliably.
When to Use Them
While database triggers can be incredibly useful, their application should be carefully considered. Triggers can be highly effective for specific use cases such as data denormalization. Data denormalization involves storing redundant data to improve read performance. For example, in a denormalized database, you might maintain a summary table that aggregates data from multiple related tables. A common scenario is updating a total_sales column in a customers table whenever a new order is inserted into the orders table. Instead of recalculating the total sales every time it is needed, a trigger can automatically update the total whenever an order is added, modified, or deleted, thereby reducing the complexity and number of queries required to fetch this data. This use of triggers helps maintain data integrity while significantly improving query performance.
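To make this concrete, here is a minimal sketch of what such a trigger could look like in PL/pgSQL. The schema is hypothetical: it assumes customers(id, total_sales) and orders(customer_id, amount) tables, with column names chosen for illustration only:

CREATE OR REPLACE FUNCTION update_total_sales()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE customers SET total_sales = total_sales + NEW.amount
    WHERE id = NEW.customer_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE customers SET total_sales = total_sales - OLD.amount
    WHERE id = OLD.customer_id;
  ELSIF TG_OP = 'UPDATE' THEN
    -- Assumes the order stays with the same customer; reassigning an
    -- order to a different customer would need extra handling
    UPDATE customers SET total_sales = total_sales - OLD.amount + NEW.amount
    WHERE id = NEW.customer_id;
  END IF;
  RETURN NULL; -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_total_sales_on_orders
AFTER INSERT OR UPDATE OF amount OR DELETE ON orders
FOR EACH ROW
EXECUTE FUNCTION update_total_sales();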
Another practical case for denormalization using database triggers is adding a reference to another table during an insert operation to simplify queries. For instance, consider a scenario where you have a companies table, each company has many branches, each branch has many employees, and each employee has many sales. If you frequently need to get aggregated sales by company, you may benefit from adding a company_id column to the sales table and populating it on insert.
This simplifies your queries by allowing direct access to the company information without needing to join multiple tables. For example, instead of joining the sales, employees, branches, and companies tables to get the total sales for a company, you can query the sales table directly using the company_id. This not only makes your queries more straightforward but also improves performance by reducing the complexity and number of joins required. This approach ensures that your data remains consistent and that the company_id is always correctly populated, making your data aggregation tasks more efficient.
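As a sketch, such a trigger could resolve the company through the intermediate tables at insert time. It assumes sales has employee_id and company_id columns, employees has a branch_id column, and branches has a company_id column, which follow from the associations described above but are otherwise hypothetical names:

CREATE OR REPLACE FUNCTION set_company_id()
RETURNS TRIGGER AS $$
BEGIN
  -- Resolve the company once, at insert time, through the intermediate tables
  SELECT branches.company_id INTO NEW.company_id
  FROM employees
  JOIN branches ON branches.id = employees.branch_id
  WHERE employees.id = NEW.employee_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_company_id_on_sales_on_insert
BEFORE INSERT ON sales
FOR EACH ROW
EXECUTE FUNCTION set_company_id();

-- Aggregation then becomes a single-table query, e.g. (assuming an amount column):
-- SELECT company_id, SUM(amount) FROM sales GROUP BY company_id;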
When Not to Use Them
While database triggers can offer significant advantages in certain scenarios, there are situations where their use can complicate the architecture and maintenance of an application. For example, consider a scenario where we decide to create a record in the invoices table every time a record is created in the sales table. Implementing this logic with a trigger would require embedding business rules directly within the database layer, which can violate the separation of concerns principle.
By placing business logic in triggers, you risk making your application harder to maintain and understand. Business logic is typically best handled within the application code, where it is more visible, easier to test, and simpler to modify as requirements evolve. When business rules are hidden within database triggers, it becomes challenging for developers to trace the flow of data and understand the application's behavior, leading to potential bugs and increased maintenance overhead. Additionally, debugging issues that arise from triggers can be more complex, as the logic is executed automatically by the database in response to events, rather than being explicitly called within the application code.
In this example, the preferred approach would be to handle the creation of invoice records within the application's service layer. This ensures that the business logic remains centralized, making the application more modular, maintainable, and easier to understand. By keeping business rules within the application code, developers can leverage standard development tools and practices, such as unit testing and version control, to ensure the integrity and reliability of the application.
Let's Start
Let's create an example trigger in the Rails environment. I suspect you have come to read this post expecting to see one. As an example, I've chosen to create a URL shortener that works the following way: for each long URL, we create a record in the database with the corresponding short alphanumeric code. For the short code, we will use the first 5 characters of the long URL's MD5 hash. If the short code is already taken by another URL, we will take the first 6 characters, and so on, until we find a free code or reach the limit of 32 characters, which is the length of an MD5 hash. By design, if the same URL is added again, it will generate the same short code, and instead of creating a new record, it will update the updated_at field of the existing record. This approach ensures that we avoid duplicate entries and maintain the uniqueness of short codes efficiently. Now, let's dive into the details of how to implement this in a Rails application with a database trigger.
class CreateUrls < ActiveRecord::Migration[7.1]
  def up
    create_table :urls do |t|
      t.string :long
      t.string :short

      t.timestamps
    end
    add_index :urls, :short, unique: true
    add_index :urls, :long, unique: true

    # Create the trigger from the SQL file stored in db/triggers
    path = "#{Rails.root}/db/triggers/add_unique_short_to_urls_on_insert.sql"
    sql = File.read(path)
    ActiveRecord::Base.connection.execute sql
  end

  def down
    drop_table :urls
    # Dropping the table removes the trigger, but the function must be dropped explicitly
    ActiveRecord::Base.connection.execute 'DROP FUNCTION IF EXISTS add_unique_short'
  end
end
Like any other database change, triggers are added to Rails using migrations. Here is the migration to create a table, add indices, and execute the SQL code from the file we'll be storing in the triggers folder. It is important to add a down method to delete your trigger when you reverse the migration. I've also put the SQL code for the trigger into a separate folder. Not only does this allow us to use the code highlighting and autocomplete provided by your IDE, but it also indicates to other developers (and your future self) that there are database triggers in the app, making it easy to get a full list of them. Giving them obvious names is also important to ensure clarity and ease of maintenance.
I don't think it is superfluous to clarify, for folks who might not be familiar with Rails, how migrations work. In Rails, we create migration classes in the db/migrate folder that inherit from ActiveRecord::Migration[version], for example ActiveRecord::Migration[7.1]. This is a good example of metaprogramming in Ruby: instead of inheriting directly from the ActiveRecord::Migration class, we call its [] (square brackets) class method, which dynamically creates a class with the features available in the specified Rails version and returns it for us to inherit from. This allows us to keep using migrations created in previous versions of Rails even after upgrading. A migration class typically has two methods, up and down, which apply and revert the changes to the database, respectively. Alternatively, it might have a single change method if the changes can be automatically reversed by Rails. The name of the file containing the class starts with a 14-digit timestamp, and Rails maintains a table with one column containing these timestamps for executed migrations. This ensures that migrations are run only once and in the order they were created when you type rails db:migrate. This system ensures smooth database schema changes and consistency across different environments. It's important to mention that the SQL code we will write in the next section is compiled and executed by the database engine, so if we need to make changes, we have to revert the migration, change the code, and run it again. You can read more about migrations here: Rails Migrations Guide.
SQL Code
Now let's dive into the SQL code. Since I am using PostgreSQL, I will be writing code in PL/pgSQL. Here is the code:
CREATE OR REPLACE FUNCTION add_unique_short()
RETURNS TRIGGER AS $$
DECLARE
  l int := 5;
  hash varchar(32);
  found boolean;
BEGIN
  hash := MD5(NEW.long); -- Generate the hash based on the 'long' URL

  -- Check for uniqueness of the short URL, extending the prefix while taken
  LOOP
    SELECT EXISTS(SELECT 1 FROM "urls" WHERE "urls"."short" = LEFT(hash, l)) INTO found;
    EXIT WHEN NOT found OR l >= 32;
    l := l + 1;
  END LOOP;

  -- Handle the case where a unique short hash is not found
  IF l >= 32 AND found THEN
    RAISE EXCEPTION 'Unable to generate a unique short URL';
  END IF;

  -- Set the short URL
  NEW.short := LEFT(hash, l);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS add_unique_short_to_urls_on_insert ON urls;

CREATE TRIGGER add_unique_short_to_urls_on_insert
BEFORE INSERT OR UPDATE OF long ON urls
FOR EACH ROW
EXECUTE FUNCTION add_unique_short();
The code is straightforward, but let's highlight a few details about how PL/pgSQL works. In PL/pgSQL, all variables must be declared in the DECLARE section at the beginning of the function, and the language is strictly typed, requiring each variable to have a defined data type. Additionally, a function must be created before a trigger can reference it, ensuring that the database recognizes the function when the trigger is set up.
Since this logic operates at the data layer, making queries within the loop is not as expensive as it would be if performed from Rails: there is no network round trip per iteration. It is particularly efficient because we have a unique index on the short column. The database might not need to scan the entire table for the EXISTS query; it can simply check the index tree, which significantly reduces the overhead and improves performance.
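If you want to verify this, you can ask PostgreSQL for the query plan of the uniqueness probe ('abc12' is just a sample value); with the unique index in place, the plan should typically use an index scan rather than a sequential scan over the table:

EXPLAIN SELECT EXISTS(
  SELECT 1 FROM "urls" WHERE "urls"."short" = 'abc12'
);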
Alternatives and Debugging
While PL/pgSQL is a powerful language for writing stored procedures and triggers in PostgreSQL, other database engines use different languages for similar tasks. For example, MySQL uses a language called SQL/PSM (Persistent Stored Modules), and Oracle uses PL/SQL (Procedural Language/SQL). Microsoft SQL Server employs Transact-SQL (T-SQL), which extends SQL with procedural programming capabilities. Each of these languages has its own syntax and features tailored to the specific database engine, allowing developers to write complex logic directly within the database.
For debugging and testing SQL code, it's often more efficient to use a database management application like DBeaver. These tools provide a graphical interface where you can execute SQL code directly, inspect the results, and interact with your database more intuitively. This can be particularly useful when working with stored procedures and triggers, as you can immediately see the impact of your code and make adjustments as needed.
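For example, once the function and trigger are created, you can exercise them directly from DBeaver or psql with a plain INSERT (the URL is just a sample value):

INSERT INTO urls (long, created_at, updated_at)
VALUES ('https://dev.to', NOW(), NOW())
RETURNING short; -- a 5-character prefix of MD5('https://dev.to'), or longer on collision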
Creating a URL in Rails
Now let's try to create a URL. Assuming the Url model is already created, we expect that calling Url.create(long: 'https://dev.to') will return a Url object with both long and short attributes populated. However, by default this won't happen, because Rails expects that after a record is created only the ID and timestamps can change, so it doesn't reload other attributes. To make this work, I will redefine the save method to ensure it returns the short attribute as well. Additionally, we need to handle the case when the long URL already exists in the database. By creating a unique index in the migration, we have essentially added another database-side check that rejects an insert if the value is not unique. In Rails, I used the upsert method to handle this scenario and specified what to update if a duplicate is found. This ensures that if the long URL already exists, the updated_at timestamp is updated instead of creating a new record.
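For reference, the upsert call in the model below compiles to SQL roughly like this (simplified; Rails quotes identifiers and binds the actual attribute values):

INSERT INTO urls (long, created_at, updated_at)
VALUES ('https://dev.to', NOW(), NOW())
ON CONFLICT (long) DO UPDATE SET updated_at = excluded.updated_at
RETURNING id, short, created_at, updated_at;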
require 'uri'

class Url < ApplicationRecord
  before_validation :set_timestamps
  validate :long_is_a_valid_url

  def save(*_args)
    return false unless valid?

    # Insert the row, or touch updated_at if the long URL already exists;
    # RETURNING gives us back the short code generated by the trigger
    result = self.class.upsert(attributes.except(*%w[id short]),
                               returning: %i[id short created_at updated_at],
                               unique_by: :long,
                               record_timestamps: false,
                               update_only: [:updated_at])
    update_attributes_from_upsert_result(result)
  rescue StandardError => e
    errors.add(:base, e.message)
    false
  end

  private

  def update_attributes_from_upsert_result(result)
    return false unless result&.first

    assign_attributes(result.first)
    true
  end

  def set_timestamps
    self.updated_at = Time.now
    self.created_at ||= Time.now
  end

  def long_is_a_valid_url
    uri = URI.parse(long)
    uri.is_a?(URI::HTTPS) && uri.host.present? || raise
  rescue StandardError
    errors.add(:long, 'Should be a valid and secure URL')
  end
end
Conclusion
Now, everything should be working smoothly. You can find a working example here, and the source code is available in this GitHub Repository. If you have any suggestions or feedback, please feel free to leave a comment.