Working more easily with legacy databases
Israel Fermín M.
Posted on January 19, 2021
This will be a short article. Recently at work I had to build a few reports on top of two legacy databases. To be honest, the whole architecture and the way the data is stored is, in my opinion, not optimal; a lot of things need to change but, as usual, features and business take precedence over technical debt.
The whole stack is based on Node.js with TypeScript, which makes the fact that I'm now working in JavaScript a bit more enjoyable. All the databases were generated using Knex.js, so they were already there and I only had to connect to them.
I didn't want to spend too much time on the connection and mapping part, so I wanted to see if an ORM would be of use here.
The overall idea
Since we will be integrating with other internal tools to generate these reports, the implementation I came up with is temporary, so I wanted to make as few changes as possible to the current codebase, for several reasons:
Once you add something temporary, it's difficult to clean it up once it's no longer needed; the chances of someone building on top of it are high.
I don't like working with TypeScript, or JavaScript in general, so if I could get away with writing in some other language it would be more fun.
The current application is a monolith and the logic is complex; adding more to it would take longer than implementing something from scratch.
So, I decided to implement the report-generating logic in a Lambda function that will be executed periodically. This meant I had to set up the connections to the databases from scratch and map them to my domain classes and types in order to keep a clean architecture.
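Just to give an idea of the shape, here is a minimal sketch of that periodic Lambda entry point; generate_reports is a hypothetical helper, not the actual implementation, and the periodic scheduling itself lives outside the code (a scheduled trigger):

def generate_reports():
    # Hypothetical helper: query the legacy databases and build the reports
    ...

def lambda_handler(event, context):
    # Entry point invoked on each scheduled run
    generate_reports()
    return {"status": "ok"}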
Enter SQLAlchemy
SQLAlchemy is the database Swiss army knife in Python. It makes it very easy to work with databases by mapping tables to domain classes, but it also works the other way around: it can introspect an existing database's schema metadata and generate classes for its tables so you can work with them from Python.
Automap
SQLAlchemy has an Automap extension, which lets you connect to an existing database, generate a domain model from its schema, and operate on it from Python: you can run queries and use SQLAlchemy as usual, as if you had written the schema yourself.
It's very easy and doesn't even take 30 lines of code:
import os

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

# Connection string, e.g. postgresql://user:password@host:5432/dbname
DATABASE_URL = os.getenv('DB_URL')

Base = automap_base()
engine = create_engine(DATABASE_URL)
session = Session(bind=engine)

# Reflect the existing schema and generate mapped classes from it
Base.prepare(engine, reflect=True)
Then you can access the generated classes with the table name as it appears in the database. Let's say you have a user_profile table; you can access it as
Base.classes.user_profile
and run queries like this:
UserProfile = Base.classes.user_profile

profile = session.query(
    UserProfile
).filter(
    UserProfile.user_id == 123123
).first()
And you're good to go.
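The returned object behaves like any other mapped instance: columns are exposed as plain attributes. user_id comes from the query above; any other attribute names would of course depend on your actual schema.

if profile is not None:
    print(profile.user_id)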
Of course, naming conventions in SQL and Python are different. If you want your code to look more Pythonic and have the Base.classes.* names in PascalCase as well, you can write a function that overrides the class naming when the automap is performed. It adds a few lines, but the whole thing still stays under 20 lines.
It looks something like this:
def snake_to_pascal(base, tablename, table):
    return ''.join([w.capitalize() for w in tablename.split('_')])
Of course, that assumes your tables are named like this_is_a_table, which will be converted to ThisIsATable.
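As a quick illustration (not something the real code needs), calling the function by hand on the user_profile table name from earlier gives:

snake_to_pascal(None, 'user_profile', None)  # 'UserProfile'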
You can pass that function when you call Base.prepare, as follows:
Base.prepare(
    engine,
    reflect=True,
    classname_for_table=snake_to_pascal
)
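With the override in place, the user_profile table from the earlier example is exposed under its PascalCase name instead:

UserProfile = Base.classes.UserProfile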
And that's it. :-)
If you want to know more about SQLAlchemy, you can always refer to the official docs: https://www.sqlalchemy.org/library.html