Understanding SQL Dialects
Arctype Team
Posted on November 17, 2021
There is no SQL standard. Well, there is but no database fully implements it. Become familiar with the wide variety of SQL dialects to get the most out of your queries on each system.
What's a SQL dialect?
SQL has a long history, filled with anecdotes. Before we learn about dialects we need to understand how SQL itself evolved. If you're not too familiar with this history, you might want to check out the detailed article below that explains how SQL came to be and its significance in data processing:
SQL is 50 Years Old. What Can We Learn From Its Success?
Derek Xiao for Arctype ・ Jan 18 ・ 11 min read
The word "dialect" means a variant of a language. In the real world, there are around 160 dialects of the English language. Just as human languages evolved over time, SQL has also changed over the year, spawning its own dialects.
Dialects exist for their own special purposes. For example, the PostgreSQL dialect PL/pgSQL
was created to incorporate PostgreSQL-specific syntax, which might not make sense in other database systems. Regardless of these dialects, many databases have common syntax due to the ANSI SQL
specification. Similar to JDBC, REST, and many others, ANSI SQL
was created so that vendors (both open source and commercial) can co-operate in a meaningful way.
Why are SQL dialects important to understand?
Speaking from experience, it is rare to write SQL that will perfectly work across many database platforms. There will be variations that exist among different databases. This makes it all the more important to learn dialects of one or more database systems, just like how learning more than one programming language helps us become better developers.
Structure of an SQL query
The word "structured" is pretty evident in the name Structured Query Language
itself. The structured nature of SQL is the reason it became so popular and widely adopted. SQL can be roughly split into sections as shown below:
Data Query Language.
- SQL queries that fetch data from tables.
Data Definition Language.
- SQL queries that alter the structure of tables.
Data Manipulation Language.
- SQL queries that alter data present in tables.
Data Control Language.
- SQL to Grant/Revoke user permissions.
Transaction Control Language.
- SQL statements to control transactions.
A typical SQL DQL query consists of SELECT
, FROM
and a WHERE
clause. There are optional ORDER BY
, JOIN
, LIMIT
and GROUP BY
clauses as well.
Most queries, even in NoSQL database systems, more or less follow the above structure. In addition to this, some database systems have something called "procedural SQL," which is a lot more powerful. When we are talking about dialects it means we are talking about the SQL features as a whole, which will include SQL + Custom Functions + Procedural SQL.
Aside: From a language perspective, SQL has a fairly simple grammar when compared to programming languages. Nevertheless, you might want to do some more analysis on a query from time to time. If you are interested in parsing and understanding SQL queries, check out the ANTLR tool.
Different dialects in SQL and examples
Terms we generally know already, like SELECT
, WHERE
, and ORDER BY
, are just the basics that most software developers use in day-to-day development. Let's take a look at how some of the most famous dialects handle these basics and some custom examples around them.
PL/Pg SQL
PL/Pg SQL is the SQL dialect that the PostgreSQL database system uses. Unlike normal SQL, PL/PgSQL is Turing complete, i.e it has the power of a full programming language. A basic structure of a PL/Pg SQL module will look like the example below:
Most procedural code will be declared as functions and can be called from regular SQL queries. Here is an example that fetches a user id from the username, taken from the documentation itself:
This function can then be called as follow:
This is just one simple example. There is a world of possibilities that exists with these powerful functions. The full documentation/guide can be found here.
TSQL
TSQL
or Transact SQL is an SQL dialect for the Microsoft SQL Server database. Instead of looking at a function example as we did for PostgreSQL, we will take an example for a normal DQL query.
Let's assume that we have a table called sales
and we want to get the top 10 sales by amount
. A normal SQL query for it would look like -
This same query can be simplified a lot by using TSQL
syntax.
We might think this is a small improvement but we can add more complexity to this by selecting the top 2% of the rows.
You can explore more about the TOP
query and other proprietary syntax offered by SQL Server in the documentation. TSQL
is also Turing Complete similar to PL/Pg SQL
. There is even an example of a language interpreter being completely written in TSQL
, although the language itself is not as complicated as a full-blown programming language.
PL/SQL
PL/SQL
is Oracle's proprietary database procedural language. It works very similar to other procedural languages. A base structure for this is as follows,
Let's take a look at an example where this procedure calculates users that have salaries higher than 10000
.
We are declaring the variables and then selecting the values of them from the users
table. This can then be wrapped into a function and then called in a manner similar to our PL/Pg SQL
example. PL/SQL
also qualifies as a Turing complete language and is popularly used in financial institutions, where entire business applications are written on top of it.
These are real examples, but you can syntactical differences in this great answer by Prolay Chaudhury on Quora. Now that you have a flavor of the different dialects, let's compare and contrast them.
Which SQL dialect should I learn? Consider the following...
The major differences and similarities between these dialects are listed below.
Licensing
PostgreSQL is open source, while two are proprietary. This can be a deciding factor when deciding on database systems. The majority of open-source software has extensive documentation and a broad user base, while proprietary software has better features and support. With that said, even open-source databases like PostgreSQL have enterprise support and features.
Developer ecosystem
PL/Pg SQL tends to be polyglot in nature, i.e its users live in various programming language communities. On the other hand, TSQL is heavily used by the Microsoft developer community and, similarly, PL/SQL developers typically inhabit the Oracle developer community. These considerations tend to affect which dialect developers choose.
Feature set
Many of these dialects have been in the industry for a long time. While syntax and user experience are subjective, the feature sets between these dialects/databases tend to overlap a lot. If you are deciding on this factor alone, then it is very hard to distinguish between the different dialects, since most support the vast majority of use cases.
Performance
There are no explicit performance differences between the dialects, but they can vary depending upon how optimized our code is and the amount of data. Most of these functions get compiled to native C
code and hence do not incur any cost as far as compilation is concerned.
Lists of All SQL Dialects
The examples given above are just some of the most famous ones. There are plenty of others that are used in database systems of their own. Below we give more resources compiled by database ORMs like Hibernate, SQL Alchemy, and Microsoft Entity Framework.
Since ORMs interact with the database systems and provide wrappers on top of SQL, they need to understand these dialects from the ground up.
- Here is a list of dialects that Hibernate supports.
- A list supported by SQL Alchemy.
- Finally a list of SQL dialects supported by Microsoft Entity Framework.
The references above are not exhaustive. If you are working on a database platform that is specialized, then it might have a dialect of its own. Be sure to look at the documentation of the database you are using.
A note: SQL in the big data world
When the NoSQL revolution began, a lot of developers were beating the drums on Death to SQL
. But SQL came back with even more force in the big data world. Here are some examples of the SQL dialects that exist with many big data systems.
- Apache Hive takes in a specific SQL dialect and converts it to map-reduce.
- Spark SQL, for use in Spark Big Data processing.
- CQL, for working with Cassandra.
- KSQL, for SQL on stream processing.
And a lot more. I would highly encourage readers to go through this video on SQL in the big data world by Michael Stonebraker:
SQL and its dialects are not just for relational databases. Their use extends far beyond RDBMS.
Conclusion
SQL dialects are beasts of their own. Learning an entire dialect could take years to master. But what we should keep in mind is that syntax basics are all the same. On this front, we should re-think how we learn SQL, and how it is even taught:
Reimagining the SQL Curriculum: Change How SQL is Taught
Arctype Team for Arctype ・ Mar 23 ・ 9 min read
Additionally, the Turing completeness of a language should not be taken at face value. Some dialects could even replace traditional programming languages. There are plenty of situations where you might want to write business logic in database systems simply because it is a lot easier to maintain and execute. There are slight performance advantages as well since the data never leaves the database cluster and is executed without any network overhead.
SQL is very powerful. Hopefully, this blog has made you aware of the most famous dialects that exist and why they do. As database systems evolve, so do the complexity of our queries and, in turn, we'll continue to push the boundaries of the SQL language and of syntax itself.
Posted on November 17, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.