Python and SQL
Tue
Posted on October 29, 2021
After working on some Javascript issues, I wanted to try a new problem on Python and luckily, in one of the issues of the project I was working on was on Python and SQL.
Taking a look at the issue:
The issue can be found here.
To solve the issue, I need to:
- learn more about Python in general
- learn about
sqlalchemy
- learn more about SQL injection
- learn how to use
sqlalchemy
to prevent SQL injection
Approaching the issue:
Before trying to solve the problem, I took a quick look at the file leads.py
to figure out its usage. It acts as an api route "/leads" that serves data or creates new data.
Moreover, what is sqlalchemy? Simply put, is a Python Object Oriented Mapper that allows developers to work with SQL, it hides and encapsulates SQL queries into objects.
SQL Injection can happen when you take user's input, put it into raw string SQL statements. For example:
query = text(
"""
SELECT
{columns}
FROM lead_tag lt
JOIN leads l on l.id = lt.lead_id
JOIN tags t on t.id = lt.tag_id
WHERE to_tsvector(l.company_name) @@ to_tsquery('{search}')
AND lt.tag_id = :tag_id
ORDER BY ts_rank(to_tsvector(company_name), '{search}')
LIMIT :limit
OFFSET :offset
""".format(
columns=",".join("l." + f for f in DEFAULT_LEAD_FIELDS),
search=search,
)
)
The piece of code uses string interpolation to add user's input into the SQL statement which is vulnerable to SQL injection, attackers can easily insert a SQL statement that can change your tables. Therefore, we need to convert this piece of code into sqlalchemy
which can turn this query into object and protect our server from injection.
Bonus: a very helpful article on SQL injection
Coming up with a solution
There are many ways to convert raw SQL statement into sqlalchemy
expressions, I tried to change as little as possible by working around with the existing code res = connection.execute(query, **query_args)
with db.get_connection() as connection:
res = connection.execute(query, **query_args)
response_body = []
count = 0
for row in res:
# TODO: handle potential errors if the user chooses a field not in the row
lead = {field: getattr(row, field) for field in include}
if drop_null:
lead = {k: v for (k, v) in lead.items() if v is not None}
response_body.append(lead)
count += 1
return {
"count": count,
"query": {
"page": page,
"perpage": perpage,
},
"leads": response_body,
}, 200
All I needed to do was to change the search
interpolation into bound parameters similar to LIMIT
and OFFSET
.
query = text(
"""
SELECT
{columns}
FROM lead_tag lt
JOIN leads l on l.id = lt.lead_id
JOIN tags t on t.id = lt.tag_id
WHERE to_tsvector(l.company_name) @@ to_tsquery(:search)
AND lt.tag_id = :tag_id
ORDER BY ts_rank(to_tsvector(company_name), :search)
LIMIT :limit
OFFSET :offset
""".format(
columns=",".join("l." + f for f in DEFAULT_LEAD_FIELDS),
)
)
query_args = {
"limit": limit,
"offset": offset,
"tag_id": tag_id,
"search": search,
}
The SQL statement will now be executed using execute
method with its arguments query_args
, if injection occurs, it will auto-escape.
Conclusion
I love this issue, it allowed me to learn new things (Python, SQL injection, ORM), in the end I felt very satisfied. Though, the issue seems easy, it took me about 2 days to let all the new information sink in before attempting the PR.
Posted on October 29, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.