Pavel Loginov
Posted on October 30, 2024
In this article, I’ll briefly go over some best practices that help keep projects organized, simplify database maintenance, and prevent common pitfalls when working with Alembic and SQLAlchemy. These techniques have saved me from trouble more than once. Here’s what we’ll cover:
- Naming Conventions
- Sorting Migrations by Date
- Table, Column, and Migration Comments
- Data Handling in Migrations without Models
- Migration Testing (Stairway Test)
- Service for Running Migrations
- Using Mixins for Models
1. Naming Conventions
SQLAlchemy allows you to set up a naming convention that’s automatically applied to all tables and constraints when generating migrations. This saves you from manually naming indexes, foreign keys, and other constraints, which makes the database structure predictable and consistent.
To set this up in a new project, add a convention to the base class so that Alembic will automatically use the desired naming format. Here’s an example of a convention that works well in most cases:
from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase

convention = {
    'all_column_names': lambda constraint, table: '_'.join(
        [column.name for column in constraint.columns.values()]
    ),
    'ix': 'ix__%(table_name)s__%(all_column_names)s',
    'uq': 'uq__%(table_name)s__%(all_column_names)s',
    'ck': 'ck__%(table_name)s__%(constraint_name)s',
    'fk': 'fk__%(table_name)s__%(all_column_names)s__%(referred_table_name)s',
    'pk': 'pk__%(table_name)s',
}


class BaseModel(DeclarativeBase):
    metadata = MetaData(naming_convention=convention)
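As a quick check of what this convention produces, here is a minimal sketch that applies the same dictionary to two plain `Table` objects and collects the generated names. The `author` and `book` tables and the `constraint_names` helper are hypothetical and exist only for this illustration. The names are resolved when each constraint is attached to its table, so no database connection is needed.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, UniqueConstraint,
)


def all_column_names(constraint, table):
    """Join the names of all columns covered by a constraint or index."""
    return '_'.join(column.name for column in constraint.columns.values())


convention = {
    'all_column_names': all_column_names,
    'ix': 'ix__%(table_name)s__%(all_column_names)s',
    'uq': 'uq__%(table_name)s__%(all_column_names)s',
    'ck': 'ck__%(table_name)s__%(constraint_name)s',
    'fk': 'fk__%(table_name)s__%(all_column_names)s__%(referred_table_name)s',
    'pk': 'pk__%(table_name)s',
}

metadata = MetaData(naming_convention=convention)

# Hypothetical tables, used only to show the generated names.
author = Table('author', metadata, Column('id', Integer, primary_key=True))
book = Table(
    'book',
    metadata,
    Column('id', Integer, primary_key=True),
    Column('title', String(100), index=True),
    Column('author_id', Integer, ForeignKey('author.id')),
    UniqueConstraint('title', 'author_id'),
)


def constraint_names(table):
    """Return the sorted names of all constraints and indexes on a table."""
    names = {str(constraint.name) for constraint in table.constraints}
    names |= {str(index.name) for index in table.indexes}
    return sorted(names)


for name in constraint_names(book):
    print(name)
```

Because every name follows the same pattern, a later autogenerated migration can refer to a constraint by a predictable name instead of a database-generated one.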
2. Sorting Migrations by Date
Alembic migration filenames typically start with a revision tag, which can make the order of migrations in the directory appear random. Sometimes it’s useful to keep them sorted chronologically.
Alembic allows customizing the migration filename template with the file_template setting in the alembic.ini file. Here are two convenient naming formats for keeping migrations organized:
- Based on date:
file_template = %%(year)d-%%(month).2d-%%(day).2d_%%(rev)s_%%(slug)s
- Based on Unix timestamp:
file_template = %%(epoch)d_%%(rev)s_%%(slug)s
Using date or Unix timestamps in filenames keeps migrations organized, making navigation easier. I prefer using Unix timestamps, and an example will be provided in the next section.
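To make the doubled percent signs less mysterious, here is a small sketch of how such a template turns into a filename. It mirrors, in simplified form, what happens to the setting: the INI parser un-escapes `%%` to `%`, and the result is filled in with ordinary `%` string formatting. The `render_filename` helper and the sample revision values are assumptions made for this illustration, not part of Alembic's API.

```python
import configparser
from datetime import datetime, timezone

INI_TEXT = """
[alembic]
file_template = %%(epoch)d_%%(rev)s_%%(slug)s
"""


def render_filename(ini_text, rev, slug, created):
    """Expand a file_template value the way %-formatting would (simplified)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # ConfigParser interpolation turns '%%' into a single '%'.
    template = parser.get('alembic', 'file_template')
    tokens = {
        'rev': rev,
        'slug': slug,
        'epoch': int(created.timestamp()),
        'year': created.year,
        'month': created.month,
        'day': created.day,
    }
    return template % tokens + '.py'


created_at = datetime.fromtimestamp(1728372261, tz=timezone.utc)
print(render_filename(INI_TEXT, 'c0a05e0cd317', 'add_integration_service', created_at))
```

Swapping the template line for the date-based variant yields a name that starts with the year, month, and day instead of the epoch value.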
3. Comments for Tables and Migrations
For those working in a team, commenting attributes is a good practice. With SQLAlchemy models, consider adding comments directly to columns and tables instead of relying on docstrings. This way, comments are available both in the code and the database, making it easier for DBAs or analysts to understand table and field purposes.
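import uuid

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column

# BaseModel and IntegrationServiceModel are defined elsewhere in the project.


class Event(BaseModel):
    __tablename__ = 'event'  # table name assumed for this example
    __table_args__ = {'comment': 'System (service) event'}

    id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True),
        primary_key=True,
        comment='Event ID - PK',
    )
    service_id: Mapped[int] = mapped_column(
        sa.Integer,
        sa.ForeignKey(
            f'{IntegrationServiceModel.__tablename__}.id',
            ondelete='CASCADE',
        ),
        nullable=False,
        comment='FK to integration service that owns the event',
    )
    name: Mapped[str] = mapped_column(
        sa.String(256), nullable=False, comment='Event name'
    )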
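To see how these comments reach the database, the following sketch compiles the comment DDL for the PostgreSQL dialect without connecting to anything. It uses a simplified, hypothetical `event` table defined with `Table` rather than the model above, and `comment_statements` is a helper name introduced only for this example.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import SetColumnComment, SetTableComment

metadata = sa.MetaData()

# Simplified stand-in for the Event model, for illustration only.
event = sa.Table(
    'event',
    metadata,
    sa.Column('id', sa.Integer, primary_key=True, comment='Event ID - PK'),
    sa.Column('name', sa.String(256), nullable=False, comment='Event name'),
    comment='System (service) event',
)


def comment_statements(table):
    """Compile the COMMENT ON statements for a table and its commented columns."""
    dialect = postgresql.dialect()
    statements = [str(SetTableComment(table).compile(dialect=dialect))]
    for col in table.columns:
        if col.comment is not None:
            statements.append(str(SetColumnComment(col).compile(dialect=dialect)))
    return statements


for statement in comment_statements(event):
    print(statement)
```

On backends that support comments, these statements are what make the descriptions visible to anyone inspecting the schema directly.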
It’s also helpful to add comments to migrations to make them easier to find in the file system. A comment can be added with -m <comment> when generating the migration. The comment will appear in the docstring and the filename, which makes it much easier to locate the required migration.
1728372261_c0a05e0cd317_add_integration_service.py
1728372272_a1b4c9df789d_add_user.py
1728372283_f32d57aa1234_update_order_status.py
1728372294_9c8e7ab45e11_create_payment.py
1728372305_bef657cd9342_remove_old_column_from_users.py
4. Avoid Using Models in Migrations
Models are often used for data manipulations, such as transferring data from one table to another or modifying column values. However, a migration that imports an ORM model always sees the model's current definition, not the one that existed when the migration was written. Once the model changes, the old migration can break when executed, because the database schema at that point in the migration chain no longer matches the current model.
Migrations should be static and independent of the current state of models to ensure correct execution regardless of code changes. Below are two ways to avoid using models for data manipulations.
- Use raw SQL for data manipulation:
from alembic import op


def upgrade():
    op.execute(
        "UPDATE user_account SET email = CONCAT(username, '@example.com') WHERE email IS NULL;"
    )


def downgrade():
    op.execute(
        "UPDATE user_account SET email = NULL WHERE email LIKE '%@example.com';"
    )
- Define Tables Directly in the Migration: If you want to use SQLAlchemy for data manipulations, you can manually define tables directly in the migration. This ensures a static schema at the time of migration execution and will not depend on changes in the models.
from alembic import op
from sqlalchemy import table, column, String


def upgrade():
    # Define the user_account table to work with data
    user_account = table(
        'user_account',
        column('id'),
        column('username', String),
        column('email', String)
    )
    # Get a connection to the database
    conn = op.get_bind()
    # Select all users without an email; fetch the rows up front
    # so the result is not left open while the updates run
    users = conn.execute(
        user_account.select().where(user_account.c.email.is_(None))
    ).fetchall()
    # Update email for each user
    for user in users:
        conn.execute(
            user_account.update().where(
                user_account.c.id == user.id
            ).values(
                email=f"{user.username}@example.com"
            )
        )


def downgrade():
    user_account = table(
        'user_account',
        column('id'),
        column('email', String)
    )
    conn = op.get_bind()
    # Remove the emails generated in the upgrade
    conn.execute(
        user_account.update().where(
            user_account.c.email.like('%@example.com')
        ).values(email=None)
    )
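When the change can be expressed as a single statement, the row-by-row loop is not strictly needed: the same lightweight `table()` construct can drive one set-based UPDATE. The sketch below assumes the `user_account` layout used in this section and runs against an in-memory SQLite database purely for demonstration. `fill_missing_emails` and `demo` are hypothetical helper names; in a real migration the connection would come from `op.get_bind()`.

```python
import sqlalchemy as sa
from sqlalchemy import Integer, String, column, table

# Static table description, independent of any ORM model.
user_account = table(
    'user_account',
    column('id', Integer),
    column('username', String),
    column('email', String),
)


def fill_missing_emails(conn):
    """Set email to '<username>@example.com' where it is NULL, in one UPDATE."""
    conn.execute(
        user_account.update()
        .where(user_account.c.email.is_(None))
        .values(email=user_account.c.username + '@example.com')
    )


def demo():
    """Run the update against a throwaway in-memory SQLite database."""
    engine = sa.create_engine('sqlite://')
    with engine.begin() as conn:
        conn.execute(sa.text(
            'CREATE TABLE user_account '
            '(id INTEGER PRIMARY KEY, username TEXT, email TEXT)'
        ))
        conn.execute(
            user_account.insert(),
            [
                {'id': 1, 'username': 'alice', 'email': None},
                {'id': 2, 'username': 'bob', 'email': 'bob@corp.test'},
            ],
        )
        fill_missing_emails(conn)
        rows = conn.execute(
            sa.select(user_account.c.username, user_account.c.email)
            .order_by(user_account.c.id)
        ).all()
    return [tuple(row) for row in rows]


print(demo())
```

A single statement avoids one round trip per row, which matters on large tables; the loop remains the better fit when each row needs logic that is awkward to express in SQL.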
5. Stairway Test for Migration Testing
The Stairway Test involves progressively testing upgrade/downgrade migrations step-by-step to ensure the entire migration chain works correctly. This ensures each migration can successfully create a new database from scratch and downgrade without issues. Adding this test to CI is invaluable for teams, saving time and frustration.
Integrating the test into your project can be done easily and quickly. You can find a code example in this repository. It also includes other valuable migration tests that may be helpful.
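For readers who want a feel for the idea before opening a full implementation, here is a minimal sketch of one common form of the test: upgrade to a revision, downgrade it, then upgrade again, for every revision in order. It is not the code from the repository mentioned above. The helper names `stairway_steps` and `run_stairway` are my own, and the sketch assumes a linear history without merge revisions.

```python
def stairway_steps(revisions):
    """Yield (action, target) pairs for an upgrade -> downgrade -> upgrade walk.

    `revisions` is an iterable of (revision, down_revision) pairs ordered
    from the oldest migration to the newest.
    """
    for revision, down_revision in revisions:
        yield ('upgrade', revision)
        # The first migration has no parent; '-1' steps back one revision.
        yield ('downgrade', down_revision or '-1')
        yield ('upgrade', revision)


def run_stairway(config):
    """Run the stairway walk against the database configured in `config`."""
    # Imported lazily so the planning helper above has no Alembic dependency.
    from alembic import command
    from alembic.script import ScriptDirectory

    script = ScriptDirectory.from_config(config)
    # walk_revisions() yields the newest revision first, so reverse the list.
    ordered = reversed(list(script.walk_revisions('base', 'heads')))
    pairs = [(rev.revision, rev.down_revision) for rev in ordered]
    actions = {'upgrade': command.upgrade, 'downgrade': command.downgrade}
    for action, target in stairway_steps(pairs):
        actions[action](config, target)


# Planning only, no database required:
for step in stairway_steps([('a1', None), ('b2', 'a1')]):
    print(step)
```

In a test suite, `run_stairway` would typically be called with an `alembic.config.Config` pointing at a disposable database, so that a broken downgrade fails in CI rather than in production.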
6. Migration Service
Another option is a separate service that runs the migrations. This is just one way to carry them out, and it fits well for local development and development-like environments. The conditional depends_on feature of Docker Compose is relevant here. We take the application image with Alembic and run it in a separate container, with a dependency on the database under the condition that migrations start only when the database is ready to handle requests (service_healthy). Additionally, a conditional depends_on (service_completed_successfully) can be added for the application, ensuring it starts only after migrations have completed successfully.
db:
  image: postgres:15
  ...
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
    interval: 10s
    start_period: 10s

app_migrations:
  image: <app-image>
  command: [
    "python",
    "-m",
    "alembic",
    "-c",
    "<path>/alembic.ini",
    "upgrade",
    "head"
  ]
  depends_on:
    db:
      condition: service_healthy

app:
  ...
  depends_on:
    app_migrations:
      condition: service_completed_successfully
The depends_on condition ensures migrations run only after the database is fully ready and that the application starts after migrations are completed.
7. Mixins for Models
While this may be an obvious point, it’s important not to overlook it. Using mixins is a convenient way to avoid code duplication. Mixins are classes that contain frequently used fields and methods, which can be integrated into any models where they're needed. For instance, we often need created_at and updated_at fields to track the creation and update times of records. It can also be useful to use an id based on UUID to standardize primary keys. All of this can be encapsulated in mixins.
import uuid

from sqlalchemy import Column, DateTime, func
from sqlalchemy.dialects.postgresql import UUID


class TimestampMixin:
    created_at = Column(
        DateTime,
        server_default=func.now(),
        nullable=False,
        comment="Record creation time"
    )
    updated_at = Column(
        DateTime,
        onupdate=func.now(),
        nullable=True,
        comment="Record last update time"
    )


class UUIDPrimaryKeyMixin:
    id = Column(
        UUID(as_uuid=True),
        primary_key=True,
        default=uuid.uuid4,
        comment="Unique record identifier"
    )
By adding these mixins, we can include a UUID id and timestamps in any model where needed:
class User(UUIDPrimaryKeyMixin, TimestampMixin, BaseModel):
    __tablename__ = 'user'

    # Other columns...
Conclusion
Handling migrations can be challenging, but following these simple practices helps keep projects well-organized and manageable. Naming conventions, date sorting, comments, and testing have saved me from chaos and helped prevent mistakes. I hope this article proves helpful — feel free to share your own migration tips in the comments!