Understanding SQL Joins and Grouping using Set Theory

samandar_komilov

Samandar Komilov

Posted on July 13, 2024

Understanding SQL Joins and Grouping using Set Theory

🏁 In the world of databases, SQL joins are fundamental operations that allow us to combine data from multiple tables based on related columns. While SQL joins might seem daunting at first, they are conceptually rooted in Set Theory — a branch of mathematics that deals with collections of objects and operations on them.

Types of Joins in SQL

This post aims to demystify SQL joins by exploring their relationship with set theory concepts:

  1. Understanding Tables as Sets: In SQL, each table can be seen as a set of rows (records). Just as in set theory, these rows are unique unless explicitly defined otherwise.

  2. Types of Joins: SQL offers several types of joins—such as inner joins, left joins, right joins, and outer joins—each serving a specific purpose in combining data from multiple tables.

  3. Set Operations Analogies: We'll delve into how SQL joins correspond to set operations. For instance, an inner join can be likened to the intersection of two sets, where only the common elements (rows) are retained.

  4. Visualizing Joins: Using practical examples and diagrams, we'll visualize how data from different tables intersects or combines based on join conditions.

By the end of this post, you'll have a clearer understanding of SQL joins not just as technical database operations, but as logical extensions of set theory principles.

Understanding Tables as Sets

Each table consists of some number of rows and columns, we know. In each row, there are a number of elements equal to the number of columns.

Sample table

For instance, in this table, there are 4 rows with 3 elements each (3 columns). Each row is considered as a tuple in set theory and the table is a set of tuples of elements:
K = {(1, Polat, 45), (2, Memati, 40), (3, Abdulhey, 39), (4, Cahit, 35)}

So, I hope we got the point. Now, let's move onto the JOINS.

Types of Joins

There are various sources that tell distinct information about the exact number of Joins. We'll consider the following Join types:

  1. INNER JOIN
  2. LEFT (OUTER) JOIN
  3. RIGHT (OUTER) JOIN
  4. FULL (OUTER) JOIN
  5. CROSS JOIN (Cartesian Product)
  6. ANTI LEFT JOIN
  7. ANTI RIGHT JOIN
  8. ANTI OUTER JOIN

INNER JOIN

An inner join returns only the rows (tuples) that have matching values in both tables (sets).

Inner Join

In set theory terms, it performs an intersection of two sets. Let's say, we are inner joining A and B tables, then:

Inner Join = A ∩ B
Enter fullscreen mode Exit fullscreen mode

It would be clearer if we consider elementwise, yeah? So, say we have these sets:

A = {(1, 'Pusat'), (2, 'Zaza'), (3, 'Murad')}
B = {(1, 'HR'), (2, 'IT'), (4, 'Sales')}
Enter fullscreen mode Exit fullscreen mode

Then the inner join on the 1st element (employeeID) of these 2 sets will be:

A ∩ B = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT')}
Enter fullscreen mode Exit fullscreen mode

But 'Murad' and 'Sales' have different IDs, so they are not present in inner join result.

LEFT (OUTER) JOIN

A left outer join returns all the rows from the left table (set), along with the matching rows from the right table (set). If there is no match, the result is NULL from the right table.

Left Outer Join

In set theory, it can be thought of as including all elements from the left set and the intersection with the right set:

Left Outer Join = A ∪ (A ∩ B)
Enter fullscreen mode Exit fullscreen mode

In terms of the above example sets, the resulting left outer join set will be like this:

A ∪ (A ∩ B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (3, 'Murad', NULL)}
Enter fullscreen mode Exit fullscreen mode

Because, the result is all the matching rows (tuples) plus the other tuples on the left set. Due to the absence of Department for Murad, NULL will be saved.

RIGHT (OUTER) JOIN

A right outer join returns all the rows from the right table (set), along with the matching rows from the left table (set). If there is no match, the result is NULL from the left table. This is similar to the left join but includes all elements from the right set.

Right Outer Join

In set theory, it can be thought of as including all elements from the right set and the intersection with the left set:

Right Outer Join = B ∪ (A ∩ B)
Enter fullscreen mode Exit fullscreen mode

In terms of the above example sets, the resulting right outer join set will be like this:

B ∪ (A ∩ B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (4, NULL, 'Sales')}
Enter fullscreen mode Exit fullscreen mode

Because, the result is all the matching rows (tuples) plus the other tuples on the right set. Due to the absence of Name for Sales, NULL will be saved.

FULL (OUTER) JOIN

A full outer join returns all the rows when there is a match in either table (set). If there is no match, the result is NULL on the side that does not have a match.

Full Outer Join

In set theory, it is the union of both sets, including their intersection:

Full Outer Join = (A ∪ B) ∪ (A ∩ B)
Enter fullscreen mode Exit fullscreen mode

In terms of the above example sets, the resulting right outer join set will be like this:

B ∪ (A ∩ B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (3, 'Murad', NULL), (4, NULL, 'Sales')}
Enter fullscreen mode Exit fullscreen mode

This is like the sum of 2 above cases, but the matching rows participate only once in the result set.

CROSS JOIN (Cartesian Product)

A cross join produces the Cartesian product of two tables, meaning it combines each row from the first table with every row from the second table.

Cross Join

In set theory, there is a cartesian product operation: A × B.
In terms of the above example sets, the result becomes:

A × B = {(1, 'Pusat', 1, 'HR'), (1, 'Pusat', 2, 'IT'), ...}
Enter fullscreen mode Exit fullscreen mode

The list is too long (3×3 elements), hence I left only a few of them, but you got the point.

ANTI LEFT JOIN

Returns rows from the left table (A) that do not have matching rows in the right table (B).

Anti Left Join

In set theory, this operation is similar to finding elements in set A that are not in set B, denoted as A ∖ B. The result according to the example sets above:

A ∖ B = {(3, 'Murad')}
Enter fullscreen mode Exit fullscreen mode

ANTI RIGHT JOIN

Returns rows from the right table (B) that do not have matching rows in the left table (A).

Anti Right Join

In set theory, this operation is similar to finding elements in set B that are not in set A, denoted as B ∖ A. The result according to the example sets above:

B \ A = {(4, 'Sales')}
Enter fullscreen mode Exit fullscreen mode

ANTI FULL JOIN

Returns rows that do not have matching rows in either table (A or B).

Anti Full Join

In set theory, this operation corresponds to the symmetric difference of sets A and B, denoted as A Δ B. The result according to the example sets above:

A Δ B = {(3, 'Murad', NULL), (4, NULL, 'Sales')}
Enter fullscreen mode Exit fullscreen mode

Enough with Joins, now let's start considering Group By in terms of set theory and with examples.

GROUP BY

Sooon...


🏁 As everything in databases are constructed using mathematical concepts under the hood, it is so essential to understand the database operations using set theory operations and imagine how it is working. To study Set Theory, which is a branch of Discrete Mathematics, I recommend taking this MIT course:

https://openlearninglibrary.mit.edu/courses/course-v1:OCW+6.042J+2T2019/course/

Today, I tried to help giving the idea as I can, thanks for your attention!

💖 💪 🙅 🚩
samandar_komilov
Samandar Komilov

Posted on July 13, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related