Erik
Posted on August 8, 2022
One of the first and most common things newcomers to SQL struggle with is how each JOIN
is different from the other. In this article, I’ll explain the differences between the inner vs outer JOIN
and left vs right JOIN
in SQL using examples from each.
Introduction to Joining Tables
In SQL, we use JOIN
in our queries to create result sets made up of multiple tables. For example, it’s common to have countries
and states
tables when capturing address information, so an address record may look something like this:
street_address
city
state_or_province
country_id
123 Main St Houston Texas 1
558 Maple AveToronto Ontario 2
Here, the numbers in the country_id
column of the addresses
table refer to the IDs of their respective country in the countries
table, which may look like this:
id
country_name
1 United States
2 Canada
Now, say we want to write a query that displays a full address, something like 123 Main St, Houston, TX United States. Since the street, city, and state names are recorded in the addresses
table, and the country name is recorded in the countries
table, we have to join the two tables to get the result set that we want. The relevant SQL would probably look something like this:
SELECT
a.street_address,
a.city,
a.state_or_province,
c.country_name
FROM
addresses a
JOIN
countries c
ON
a.country_id = c.id
The SQL code above shows us something like the following results:
street_address
city
state_or_province
country_name
123 Main St HoustonTexas United States
558 Maple Ave Toronto Ontario Canada
Notice here that we have values from the street_address
, city
, and state_or_province
columns of the addresses
table, and values from the country_name
column of the countries
table. We were able to join
these two tables together by matching the country_id
value in addresses
with the id
column of countries
.
There are a few kinds of join
s in SQL, each one behaving slightly differently. This article aims to help you understand the difference between the different kinds of joins in SQL.
Download and Build the Sample Database
If you’d like to follow along with this article, go ahead and clone the code to build the example database . To build the database in pgAdmin, first run the command in create_database.sql
then make sure to change connections to the newly created employee_database
database and run the commands in build_and_seed_tables.sql
.
You should now have a database with four tables: addresses
, countries
, departments
, and employees
. Let’s quickly discuss the data model before moving on to the actual point of this article.
The addresses
and countries
tables are just like the ones in the examples above; each address record references a country record. Then, we have an employees
table, each record of which references an address record as well as a department record. The department records are pretty simple; the departments
table simply has an id
and name
column and refers to what department an employee works.
Ok, now on to actual joins!
Right vs. Left Joins
Let’s start by talking about the difference left and right joins. It’s helpful to think about all the tables in a SQL query as a horizontal list of tables. The leftmost table is the table in the FROM
clause, the next table to the right of the leftmost table is whatever table we are joining. So for example, if we write the query
SELECT * FROM employees e JOIN addresses a ON e.address_id = a.id
Then the employees
table is the leftmost table while the addresses
table is the next one to the right, kind of like this:
If we take it a step further and join the countries
table to the addresses
, the next table to the right would be countries
. In other words, the following SQL query:
SELECT *
FROM employees e
JOIN addresses a ON e.address_id = a.id
JOIN countries c ON a.country_id = c.id
can be visualized as the following:
Notice again that whenever a table is joined to another, the already-existing table (for lack of a better word) is the left table and the joining table is the right table. So in our example above, our base table is employees
so it’s on the left; then we joined addresses
to it, so it’s to the right of employees
. Finally, we then joined countries
to addresses
, putting addresses
to the right of countries
and thus the rightmost table.
This visualization is helpful in understanding how left and right joins work. For example, query all of the employees
records and LEFT JOIN
the addresses
table and see what happens. The query:
SELECT e.first_name, a.street_address, a.city
FROM employees e
LEFT JOIN addresses a ON e.address_id = a.id
The results:
"Bob" "123 Main St" "Houston"
"Jordan" "220 C 30" "Izamal"
"Nicole" "3030 Burgos" "Xalapa"
"Erik" "1010 2nd St" "Omaha"
"Lina" "1010 2nd St" "Omaha"
"Sandra", NULL, NULL
Notice here that all records for employees
, the leftmost table, appear as results in this query, including a record with no associated addresses
record. The records in the rightmost table, addresses
, only appear if they can be joined to the leftmost table. There are records in addresses
that cannot be joined to employees
because no employees
record has a corresponding address_id
.
Conversely, if we use RIGHT JOIN
to join the addresses
table, we will see results that include all of the addresses
table–the rightmost table–despite not being able to be joined to the employees
table. For example, the query:
SELECT e.first_name, a.street_address, a.city
FROM employees e
RIGHT JOIN addresses a ON e.address_id = a.id
The results:
"Bob" "123 Main St" "Houston"
"Jordan" "220 C 30" "Izamal"
"Nicole" "3030 Burgos" "Xalapa"
"Erik" "1010 2nd St" "Omaha"
"Lina" "1010 2nd St" "Omaha"
null "558 Maple Ave" "Toronto"
null "99 Jackson Rd" "Flin Flon"
null "821 Carol" "Nuuk"
Notice this time that we have three addresses
records in the result set that have null
values in their respective addresses
column. Also notice that the employee record with no associated address record is not in this result set. This is because by doing a RIGHT JOIN
, we essentially tell the database to give us all results of the rightmost table, and join only the leftmost table’s records if they are associated with the rightmost.
Conversely, we can switch which table is right and left by referring to the addresses
table in the FROM
clause and then joining the employees
table. For example, let’s SELECT
from addresses
and LEFT JOIN
the employees
table. The query:
SELECT e.first_name, a.street_address, a.city
FROM addresses a
LEFT JOIN employees e ON e.address_id = a.id
The result:
"Bob" "123 Main St" "Houston"
"Jordan" "220 C 30" "Izamal"
"Nicole" "3030 Burgos" "Xalapa"
"Erik" "1010 2nd St" "Omaha"
"Lina" "1010 2nd St" "Omaha"
null "558 Maple Ave" "Toronto"
null "99 Jackson Rd" "Flin Flon"
null "821 Carol" "Nuuk"
The results when we LEFT JOIN
the employees
table to the addresses
table are the same as when we RIGHT JOIN
the addresses
table to the employees
table. That’s because in this case, the leftmost table is addresses
so by left joining the employees
table to it, we are telling the database to give us all the addresses
records and then join the employees
table records if they exist.
Now, let’s take it a step further and see what happens when we chain JOIN
commands together. First, let’s SELECT FROM
the addresses
table and RIGHT JOIN
the countries
table so we can see how many of the countries
records are not associated to addresses
. The query:
SELECT
a.street_address,
a.city,
a.state_or_province,
c.country_name
FROM addresses a
RIGHT JOIN countries c
ON a.country_id = c.id
The results:
"1010 2nd St" "Omaha" "NE" "United States"
"123 Main St" "Houston" "TX" "United States"
"3030 Burgos" "Xalapa" "Ver" "Mexico"
"220 C 30" "Izamal" "Yuc" "Mexico"
"99 Jackson Rd" "Flin Flon" "MB" "Canada"
"558 Maple Ave" "Toronto" "ON" "Canada"
"821 Carol" "Nuuk" "SQ" "Greenland"
NULL NULL NULL "Iceland"
As we can see from the result set of our previous query, the only country in our database that isn’t related to an address record is Iceland.
Knowing this, let’s see what happens when we SELECT FROM
the employees
table, LEFT JOIN
the addresses
table to it, and then RIGHT JOIN
the countries
table to that. Do you think we’ll see all the countries? Let’s find out! The query:
SELECT
e.first_name,
e.last_name,
a.street_address,
a.city,
a.state_or_province,
c.country_name
FROM employees e
LEFT JOIN addresses a
ON e.address_id = a.id
RIGHT JOIN countries c
ON a.country_id = c.id
The results:
"Lina" "Mazin" "1010 2nd St" "Omaha" "NE" "United States"
"Erik" "Whiting" "1010 2nd St" "Omaha" "NE" "United States"
"Bob" "Robertson" "123 Main St" "Houston" "TX" "United States"
"Nicole" "Nicholson" "3030 Burgos" "Xalapa" "Ver" "Mexico"
"Jordan" "Mays" "220 C 30" "Izamal" "Yuc" "Mexico"
NULL NULL NULL NULL NULL "Canada"
NULL NULL NULL NULL NULL "Greenland"
NULL NULL NULL NULL NULL "Iceland"
These are interesting results that show us something about the way RIGHT
and LEFT
joins behave. Notice that we only get 5 rows of results that include records from the addresses
table even though there are 7 records in that table. This is because there are only 5 addresses
that are associated to employees
records, and when we LEFT JOIN
ed the addresses
table, we told the database that we only want records from the addresses
table if they’re associated with an employee record. However, when we RIGHT JOIN
the countries
table, we tell the database we want all records from the countries
table, even if there’s no associated addresses
record in the result set.
Now that we’ve talked all about the difference between RIGHT
and LEFT
, let’s talk about INNER
and OUTER
joins.
Inner vs. Outer Joins
The difference between INNER
and OUTER
joins are very similar to the differences between LEFT
and RIGHT
joins. However, in this case, it’s more helpful to think of tables as Venn diagrams. You know, the ones where two circles meet in the middle and the overlapping section is some shared attribute between the circles?
We still want to think of tables being joined from left to right, but with INNER
joins, we have to think about how records might overlap.
NOTE: The default JOIN operation in Postgres (and every other RDBMS I've ever worked with) is the INNER JOIN. That means if you write something like SELECT * FROM A JOIN B ON A.b_id = B.id, it is implied that you mean to do an inner join and the SQL SELECT * FROM A INNER JOIN B ON A.b_id = B.id is exactly the same thing.
Let’s see how INNER JOIN
works when we join addresses
to employees
. The query:
SELECT e.first_name, a.street_address
FROM employees e
INNER JOIN addresses a
ON e.address_id = a.id
The results:
"Bob" "123 Main St"
"Jordan" "220 C 30"
"Nicole" "3030 Burgos"
"Erik" "1010 2nd St"
"Lina" "1010 2nd St"
Notice this time that we have five results, despite there being six employees
records in the database. This is unlike the LEFT JOIN
from the previous section where the employee record with no associated address record was still in the result set. In this way, INNER JOIN
means we only want results from the leftmost table if the rightmost table can be joined to them. That’s why you often see Venn diagrams used to explain joins. In this case, INNER JOIN
is this Venn diagram:
So, the main difference between LEFT
and INNER
joining that we’ve seen so far is that if you want all records of the leftmost table–whether they have associated records in the rightmost table or not–you want to use a LEFT
join. If you only want records from the leftmost table if a record from the right most table can be joined to it, you want to use INNER JOIN
.
So what about OUTER
? The OUTER JOIN
is a special case because even though it seems antimonious with INNER JOIN
, OUTER
joins need to to be specified with either RIGHT
, LEFT
, or FULL
. For example, the following SQL will result in a syntax error:
-- Doesn't work!
SELECT e.first_name
FROM employees e
OUTER JOIN addresses a
ON e.address_id = a.id
We have to specify if we want the leftmost or right most table to be outer-joined. Let’s see how LEFT OUTER JOIN
behaves. The SQL:
SELECT
e.first_name,
a.street_address
FROM employees e
LEFT OUTER JOIN addresses a
ON e.address_id = a.id
The results:
"Bob" "123 Main St"
"Jordan" "220 C 30"
"Nicole" "3030 Burgos"
"Erik" "1010 2nd St"
"Lina" "1010 2nd St"
"Sandra" NULL
Here we specified that we want the leftmost table to be outer-joined and therefore got all results from employees
regardless of whether a record from the addresses
table can be joined to it or not.
The RIGHT OUTER JOIN
on the other hand will include all records from the rightmost table regardless of whether a record from the leftmost table can be joined to it. The SQL:
SELECT
e.first_name,
a.street_address
FROM employees e
RIGHT OUTER JOIN addresses a
ON e.address_id = a.id
The results:
"Bob" "123 Main St"
"Jordan" "220 C 30"
"Nicole" "3030 Burgos"
"Erik" "1010 2nd St"
"Lina" "1010 2nd St"
NULL "558 Maple Ave"
NULL "99 Jackson Rd"
NULL "821 Carol"
See here that all the addresses
records were included in the result set, even if there was no employees
record that could be joined to it. Notice also that the value from employees
that has no associated addresses
record is not included in the result set.
Did you notice that the results for LEFT OUTER JOIN
and RIGHT OUTER JOIN
are the exact same results we get from LEFT JOIN
and RIGHT JOIN
respectively? If so, good on you! LEFT JOIN
and LEFT OUTER JOIN
are exactly the same thing; same with RIGHT JOIN
and RIGHT OUTER JOIN
. Whenever you use RIGHT
or LEFT
in your JOIN
, the OUTER
is implied and you actually don’t have to write it (though some people do because they say it adds clarity).
There’s one more kind of OUTER
join: FULL
. The FULL OUTER JOIN
will include all results of both the left and rightmost tables, regardless of whether they can be joined to each other. Check it out, the SQL:
SELECT
e.first_name,
a.street_address
FROM employees e
FULL OUTER JOIN addresses a
ON e.address_id = a.id
The results:
"Bob" "123 Main St"
"Jordan" "220 C 30"
"Nicole" "3030 Burgos"
"Erik" "1010 2nd St"
"Lina" "1010 2nd St"
"Sandra" NULL
NULL "558 Maple Ave"
NULL "99 Jackson Rd"
NULL "821 Carol"
See how this time we have all records from each table. The leftmost table, employees
, includes even the record with no associated addresses
record. Likewise, the result set includes three addresses
records with no associated employees
record. That’s because the FULL
join is both a LEFT
and RIGHT
join put together.
Conclusion
Nearly everyone struggles with the different kinds of joins when they’re first learning SQL, so let’s review what we learned:
-
LEFT
,RIGHT
, andFULL
joins are the same asLEFT OUTER
,RIGHT OUTER
, andFULL OUTER
joins-
LEFT
will include all records from the leftmost table, even if there are no records in the rightmost table that can join to them -
RIGHT
will include all records from the rightmost table, even if there are no records in the leftmost table that can join to them -
FULL
includes records from both the rightmost and leftmost tables, even if the records from one table have no joining records in the other
-
-
INNER JOIN
is the default join-
INNER
will only include records where both the leftmost and rightmost tables have associated records - Use this when you want only full results; records from one table with no associated records in the other table will not show up in the result set
-
Now that you know how to use joins, try out some of them in the example database using the departments
column. As always, feel free to tweet me at @erikwhiting4 or send me an email at erik@erikwhiting.com if you have any questions. Good luck!
Posted on August 8, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 23, 2024
September 25, 2024