Katie
Posted on November 8, 2018
All right ... maybe not every join. But this cheat sheet helped me score 104% on my Database 101 midterm exam, so it must be worth something.
NOTE: There's a lot of markup I can't do on dev.to. So although I've let this post cross-post in its entirety, a lot of things I say in the text don't make sense anymore without my "pay attention to this!" background highlights, cross-links within the text, table of contents, etc.
I recommend reading it here, on my original blog, instead of on dev.to, and then coming back over to dev.to to discuss!
It took a bit of work to find my old school notes, because I hadn't even put "join" in the title. Instead, I called my notes "Set Operations In SQL." And that's really what this is all about, because it doesn't matter if you use SQL, Excel VLOOKUP, Python, or a pen and paper to compare one list of records to another. Combining one "set" of things with another, as a business problem, is math. Better yet, it's fun math, not hard math! It's Venn diagrams -- the kind of math that gets you doodling like you're in elementary school.
SQL code is useful to know when the two sets of records (each "record" representing some real-world entity, like a person, an invoice, or a country) that your business problem requires you to combine both exist within a single relational database management system (RDBMS), usually simply referred to as a "database." All such systems come with the ability to extract your data using the SQL programming language.
If you're a Python data analyst, be sure to learn some SQL so you can leverage its power against the databases where your data already resides.
If you're a Salesforce analyst or sysadmin, you've probably run into report and SOQL limitations. Perhaps you solved them by exporting two tables to Excel, then performing a
VLOOKUP
operation to connect them. To level up from there, you'll probably want to explore Python, not SQL -- unless your IT department is already backing up your Salesforce org's data to a traditional database. Then you'll want to know all about SQL, because it is going to save you a lot of time!
I do a lot of work in Oracle databases, so I find Oracle-flavored SQL the fastest to type and will use it as an example. You can run Oracle SQL code online if you sign up for a free account and try their Live SQL tool.
To translate this into another "flavor" of SQL, you may need to Google the keywords I've used, together with the name of your database. For example, when my full outer join
command wouldn't work in a MySQL database, I Googled "'full outer join' mysql" and discovered that MySQL-flavored SQL doesn't include a full outer join
command. Instead, you have to get clever combining smaller set operations (it's like carefully coloring in your Venn diagram one entire circle at a time, rather than running your crayon over the whole diagram at once).
Pro tip: Bookmark this article and treat it as a reference guide.
Whether you know a little SQL or a lot, use the Table of Contents at the top of this article to read one section at a time.
- Though simple, this guide is long, so if you're learning, read about one "Join Type" per day.(Unless you have a long bus ride!)
- If you know the ropes and just need to compare commands to each other, jump straight to the command that interests you.
Our Sample Data
Table "A
" Contents (CONVENTION
)
CONVENTION
is a list of people who attended a convention and where they work.
(Yes, it's a very nosy event, asking its attendees for social security number.)
ssn | name_lf | ph | em | company |
---|---|---|---|---|
A1A1A1 | Amjit, Anush | 1111111 | 111@example.com | Apple |
B2B2B2 | Borges, Benita | 2222222 | 222@example.com | Boiron |
C3C3C3 | Combs, Cathy | 3333333 | 333@example.com | CVS |
D4D4D4 | Daher, Darweesh | 4444444 | 444@example.com | Dell |
E5E5E5 | Ellis, Ezra | 5555555 | 555@example.com | EDF |
F6F6F6 | Fulvia, Frances | 6666666 | 666@example.com | Firestone |
Table "B
" Contents (RESTAURANT
)
RESTAURANT
is a list of regular customers at a restaurant, their favorite food, and their age.
(Yes, it's a very nosy restaurant, asking its regular customers for social security number.)
- Note the "ccc," rather than "333," email address for Cathy Combs.
- Note the typo in the social security number for Frances Fulvia.
social | name_fl | phone | fav_food | age | |
---|---|---|---|---|---|
C3C3C3 | Cathy Combs | 3333333 | ccc@example.com | Carrots | 33 |
D4D4D4 | Darweesh Daher | 4444444 | 444@example.com | Doritos | 44 |
E5E5E5 | Ezra Ellis | 5555555 | 555@example.com | Endives | 55 |
FFF666 | Frances Fulvia | 6666666 | 666@example.com | Fries | 66 |
G7G7G7 | Grace Gao | 7777777 | 777@example.com | Garlic | 77 |
H8H8H8 | Helen Hopper | 8888888 | 888@example.com | Hummus | 88 |
Code To Create This Data ("DDL" & "DML")
Here's code to create CONVENTION
and RESTAURANT
tables in an Oracle database and populate them with our sample data, if you want to try it yourself.
drop table CONVENTION;
create table CONVENTION (
ssn varchar2(6) not null
, name_lf varchar2(30) not null
, ph varchar2(7) not null
, em varchar(15) not null
, company varchar(15) not null
);
insert into CONVENTION values ('A1A1A1', 'Amjit, Anush', '1111111', '111@example.com', 'Apple');
insert into CONVENTION values ('B2B2B2', 'Borges, Benita', '2222222', '222@example.com', 'Boiron');
insert into CONVENTION values ('C3C3C3', 'Combs, Cathy', '3333333', '333@example.com', 'CVS');
insert into CONVENTION values ('D4D4D4', 'Daher, Darweesh', '4444444', '444@example.com', 'Dell');
insert into CONVENTION values ('E5E5E5', 'Ellis, Ezra', '5555555', '555@example.com', 'EDF');
insert into CONVENTION values ('F6F6F6', 'Fulvia, Frances', '6666666', '666@example.com', 'Firestone');
drop table RESTAURANT;
create table RESTAURANT (
social varchar2(6) not null
, name_fl varchar2(30) not null
, phone varchar2(7) not null
, email varchar2(15) not null
, fav_food varchar2(10) not null
, age int not null
);
insert into RESTAURANT values ('C3C3C3', 'Cathy Combs', '3333333', 'ccc@example.com', 'Carrots', 33);
insert into RESTAURANT values ('D4D4D4', 'Darweesh Daher', '4444444', '444@example.com', 'Doritos', 44);
insert into RESTAURANT values ('E5E5E5', 'Ezra Ellis', '5555555', '555@example.com', 'Endives', 55);
insert into RESTAURANT values ('FFF666', 'Frances Fulvia', '6666666', '666@example.com', 'Fries', 66);
insert into RESTAURANT values ('G7G7G7', 'Grace Gao', '7777777', '777@example.com', 'Garlic', 77);
insert into RESTAURANT values ('H8H8H8', 'Helen Hopper', '8888888', '888@example.com', 'Hummus', 88);
Who overlaps and how
Let's say we're a large corporation. We're so large that we run the convention and we own the restaurant. We want to look for "overlaps" between our two customer lists an analyze them accordingly.
Actually, that's all the context I'll give about "why" I chose the queries and output I chose. They make no business sense at all. I chose them to demonstrate what the different styles of SQL query can and can't do!
But back to the business of combining our sets:
As always when comparing "people" records from two different data sets, decisions must be made about what "proves" two records represent the same "real-world person." Do you match on name? Email? Social security number (remember, they are re-used after people die)? Some combination thereof?
For our examples, we'll look at two different ways of combining our data, just to flex our code muscles.
Before reading the code, be sure to look at CONVENTION
and RESTAURANT
and take note that:
- When we join on
A.ssn=B.social
, Cathy-Ezra (3-5) match- Anush, Benita, and "Frances w/
ssn
F6F6F6" remain unique toCONVENTION
- "Frances w/
ssn
FFF666," Grace, & Helen remain unique toRESTAURANT
- Anush, Benita, and "Frances w/
- When we join on
A.ph=B.phone and A.em=B.email
, Darweesh-Frances (4-6) match
You're fully debriefed! Let's take a look at our first of 5 "join types."
Join Type: "In A
"
For this join type, we'd expect 6 rows of output (Anush-Frances -- all people from CONVENTION
).
To the extent that any of these people "also exist" in RESTAURANT
(as defined by the "matching criteria" we choose), we'd expect to see details about them from RESTAURANT
.
LEFT OUTER JOIN
SQL Query, left outer join
example 1/4
First, we'll match on social security number, so Cathy has details from "table B" but Frances doesn't.
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
left outer join RESTAURANT B on A.ssn = B.social;
Query Output, left outer join
example 1/4
name_lf | company | name_fl | fav_food |
---|---|---|---|
Amjit, Anush | Apple | ||
Borges, Benita | Boiron | ||
Combs, Cathy | CVS | Cathy Combs | Carrots |
Daher, Darweesh | Dell | Darweesh Daher | Doritos |
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone |
SQL Query, left outer join
example 2/4
Next, we'll match on phone and email, so Frances has details from "table B" but Cathy doesn't.)
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
left outer join RESTAURANT B on A.ph = B.phone and A.em = B.email;
Query Output, left outer join
example 2/4
name_lf | company | name_fl | fav_food |
---|---|---|---|
Amjit, Anush | Apple | ||
Borges, Benita | Boiron | ||
Combs, Cathy | CVS | ||
Daher, Darweesh | Dell | Darweesh Daher | Doritos |
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone | Frances Fulvia | Fries |
SQL Query, left outer join
example 3/4
Next, we'll try (incorrectly) to suppress name_fl
and fav_food
details for anyone younger than 50.
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
left outer join RESTAURANT B on A.ssn = B.social
where B.age >= 50;
Query Output, left outer join
example 3/4
name_lf | company | name_fl | fav_food |
---|---|---|---|
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Note
Even though "rows from B
" were optional in our join, in adding a filter on B
, we've effectively turned this "left join" into an "inner join" where our data must appear in both tables.
What we actually did was filter out the entire record of anyone who didn't have an age (because they aren't in table B
) or was younger than 50.
This is a common mistake. We'll fix it in the next example.
In math, when you subtract numbers from each other, the order in which you subtract them matters.
Similarly, it's important to think about the order in which you "join" and "filter" data as you solve your business problems with SQL.
SQL Query, left outer join
example 4/4
Now we'll actually suppress name_fl
and fav_food
details for anyone 50 or younger.
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
left outer join (select * from RESTAURANT where age >= 50) B on A.ssn = B.social;
Query Output, left outer join
example 4/4
name_lf | company | name_fl | fav_food |
---|---|---|---|
Amjit, Anush | Apple | ||
Borges, Benita | Boiron | ||
Combs, Cathy | CVS | ||
Daher, Darweesh | Dell | ||
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone |
Note
In our corrected query, we pre-filtered RESTAURANT
by age before treating it as B
, which fixed our problem from the previous example.
That was the first "join type." Let's take a look at the second.
Join Type: "In A
But Not In B
"
For this join type, we'd expect 3 rows of output (Anush+Benita+Cathy or Anush+Benita+Frances, depending on the "matching criteria" we choose).
Table B
is, by definition, not part of the final output from your business question, so remember that all of the approaches below will only include "details" from table A
.
NOT IN
SQL Query, not in
example 1/3
not in
is great when you're only matching on one column.
select A.name_lf, A.company
from CONVENTION A
where A.ssn not in ( select B.social from RESTAURANT B );
Query Output, not in
example 1/3
name_lf | company |
---|---|
Amjit, Anush | Apple |
Borges, Benita | Boiron |
Fulvia, Frances | Firestone |
SQL Query, not in
example 2/3
not in
gets dicey when you need to match on multiple columns -- you have to concatenate multiple columns into a single column and responsibly handle blank/null values.
It is NOT PREFERRED for matching on multiple columns. Use NOT EXISTS
instead.
This query is just for show!
select A.name_lf, A.company
from CONVENTION A
where nvl(A.ph,'')||'~'||nvl(A.em,'') not in ( select nvl(B.phone,'')||'~'||nvl(B.email,'') from RESTAURANT B );
Query Output, not in
example 2/3
name_lf | company |
---|---|
Amjit, Anush | Apple |
Borges, Benita | Boiron |
Combs, Cathy | CVS |
SQL Query, not in
example 3/3
Our third not in
example isn't even a real join between tables: note that RESTAURANT
isn't even in the query.
I just wanted to point out that this is typed the same way as the not in
that you can use when adding a hand-typed value filter to a query.
select A.name_lf, A.company
from CONVENTION A
where A.ssn not in ( 'B2B2B2','XYZZY','E5E5E5' );
Query Output, not in
example 3/3
name_lf | company |
---|---|
Amjit, Anush | Apple |
Combs, Cathy | CVS |
Daher, Darweesh | Dell |
Fulvia, Frances | Firestone |
<> ALL
This is exactly like not in
, only you type <> all
instead. Every example from the section on not in
could have been written with <> all
.
SQL Query
select A.name_lf, A.company
from CONVENTION A
where A.ssn <> all ( select B.social from RESTAURANT B );
Query Output
name_lf | company |
---|---|
Amjit, Anush | Apple |
Borges, Benita | Boiron |
Fulvia, Frances | Firestone |
MINUS
With minus
, we can't output columns that aren't in our "match criteria."
That said, such limited output might suffice if our business problem is a simple "dummy check" list of "all values that appear only in A
."
SQL Query, minus
example 1/2
select A.ssn
from CONVENTION A
minus
select B.social
from RESTAURANT B;
Query Output, minus
example 1/2
ssn |
---|
A1A1A1 |
B2B2B2 |
F6F6F6 |
SQL Query, minus
example 2/2
select A.ph, A.em
from CONVENTION A
minus
select B.phone, B.email
from RESTAURANT B;
Query Output, minus
example 2/2
ph | em |
---|---|
1111111 | 111@example.com |
2222222 | 111@example.com |
3333333 | 111@example.com |
NOT EXISTS
USE THIS if you need to do "in A
, but not B
" with multi-column "matching criteria" in a complicated query!
where not exists (...)
is the most useful syntax for writing a complex query with "matching criteria" that involve a multi-column match.
This approach supports "further filters" on A
-- just include them as additional and
and or
clauses beyond the where
clause responsible for the "matching" operation.
If the rest of your filters have or
in them, be sure to use parentheses to ensure that you don't accidentally make your where not exists (...)
optional.
SQL Query
SELECT A.name_lf, A.company
FROM CONVENTION A
WHERE NOT EXISTS (
SELECT NULL
FROM RESTAURANT B
WHERE A.ph = B.phone and A.em = B.email
);
Query Output
name_lf | company |
---|---|
Amjit, Anush | Apple |
Borges, Benita | Boiron |
Combs, Cathy | CVS |
LEFT OUTER JOIN ... B.... IS NULL
I can't think of a single reason to use the "LEFT OUTER JOIN
plus B.joinColumn IS NULL
" syntax in practice. It's completely overwrought.
But ... it's conceptually related to a common approach to "In A
Or B
, But Not Both" so it's an interesting mental exercise to make sure you understand.
It's also a good "cheat" pattern to be familiar with when you run into a report-writing tool that doesn't have a NOT EXISTS
syntax, like Cognos Report Studio or Python+Pandas.
With this approach, it is vital to use the IS NULL
filter for all columns of B
that were involved in the LEFT OUTER JOIN
operation.
Otherwise, if your data set has a lot of NULL
values in it, you could get unexpected results.
This approach supports "further filters" on A
-- just include them as additional and
and or
clauses beyond the where
clause responsible for the "matching" operation.
If the rest of your filters have or
in them, be sure to use parentheses to ensure that you don't accidentally make your where ... is null
filter(s) optional.
SQL Query
select A.name_lf, A.company
from CONVENTION A
left outer join RESTAURANT B on A.ph = B.phone and A.em = B.email
where B.phone is null
and B.email is null;
Query Output
name_lf | company |
---|---|
Amjit, Anush | Apple |
Borges, Benita | Boiron |
Combs, Cathy | CVS |
Congratulations getting through the second "join type." On to the third!
Join Type: "In A
, B
, Or Both"
For this join type, we'd expect 9 rows of output (Anush-Helen, with Cathy or Frances doubled up depending on which "matching criteria" we choose).
UNION
SQL Query, union
example 1/2
select A.ssn
from CONVENTION A
union
select B.social
from RESTAURANT B;
Query Output, union
example 1/2
ssn |
---|
A1A1A1 |
B2B2B2 |
C3C3C3 |
D4D4D4 |
E5E5E5 |
F6F6F6 |
FFF666 |
G7G7G7 |
H8H8H8 |
- Do you see the doubled-up "Frances" data?
Note
As with minus
, we can't output columns that weren't in our "match criteria."
A single column of output might suffice if our business problem is a simple "dummy check" list of "all possible values from either table."
SQL Query, union
example 2/2
select A.ph, A.em
from CONVENTION A
union
select B.phone, B.email
from RESTAURANT B;
Query Output, union
example 2/2
ph | em |
---|---|
1111111 | 111@example.com |
2222222 | 222@example.com |
3333333 | 333@example.com |
3333333 | ccc@example.com |
4444444 | 444@example.com |
5555555 | 555@example.com |
6666666 | 666@example.com |
7777777 | 777@example.com |
8888888 | 888@example.com |
- Do you see the doubled-up "Cathy" data?
FULL OUTER JOIN
SQL Query, full outer join
example 1/2
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
full outer join RESTAURANT B on A.ssn = B.social;
Query Output, full outer join
example 1/2
name_lf | company | name_fl | fav_food |
---|---|---|---|
Amjit, Anush | Apple | ||
Borges, Benita | Boiron | ||
Combs, Cathy | CVS | Cathy Combs | Carrots |
Daher, Darweesh | Dell | Darweesh Daher | Doritos |
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone | ||
Frances Fulvia | Fries | ||
Grace Gao | Garlic | ||
Helen Hopper | Hummus |
- Do you see the doubled-up "Frances" data?
SQL Query, full outer join
example 2/2
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
full outer join RESTAURANT B on A.ph = B.phone and A.em = B.email;
Query Output, full outer join
example 2/2
name_lf | company | name_fl | fav_food |
---|---|---|---|
Amjit, Anush | Apple | ||
Borges, Benita | Boiron | ||
Combs, Cathy | CVS | ||
Daher, Darweesh | Dell | Darweesh Daher | Doritos |
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone | Frances Fulvia | Fries |
Cathy Combs | Carrots | ||
Grace Gao | Garlic | ||
Helen Hopper | Hummus |
- Do you see the doubled-up "Cathy" data?
MySQL fake FULL OUTER JOIN
This is the clever trickery I mentioned earlier to compensate for the fact that MySQL-flavored SQL doesn't have a full outer join
command.
SQL Query
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
left outer join RESTAURANT B on A.ssn = B.social
union
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
right outer join RESTAURANT B on A.ssn = B.social;
Query Output
name_lf | company | name_fl | fav_food |
---|---|---|---|
Amjit, Anush | Apple | ||
Borges, Benita | Boiron | ||
Combs, Cathy | CVS | Cathy Combs | Carrots |
Daher, Darweesh | Dell | Darweesh Daher | Doritos |
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone | ||
Frances Fulvia | Fries | ||
Grace Gao | Garlic | ||
Helen Hopper | Hummus |
- Do you see the doubled-up "Frances" data?
Over halfway there! Keep reading to learn about the fourth "join type."
This one has a lot of options, so it'll be a bit of a longer read.
Join Type: "In Both A
And B
"
For this join type, we'd expect 3 rows of output (Cathy-Ezra or Darweesh-Frances, depending on the "matching criteria" we choose).
IN
SQL Query, in
example 1/3
select A.name_lf, A.company
from CONVENTION A
where A.ssn in ( select B.social from RESTAURANT B );
Query Output, in
example 1/3
name_lf | company |
---|---|
Combs, Cathy | CVS |
Daher, Darweesh | Dell |
Ellis, Ezra | EDF |
Note
-
in
is great when:- you're only matching on one column, and
- you don't need to see, or filter further on, any details from
B
SQL Query, in
example 2/3
select A.name_lf, A.company
from CONVENTION A
where nvl(A.ph,'')||'~'||nvl(A.em,'') in ( select nvl(B.phone,'')||'~'||nvl(B.email,'') from RESTAURANT B );
Query Output, in
example 2/3
name_lf | company |
---|---|
Daher, Darweesh | Dell |
Ellis, Ezra | EDF |
Fulvia, Frances | Firestone |
Note
-
in
gets dicey when you need to match on multiple columns -- you have to concatenate multiple columns into a single column and responsibly handle blank/null values. It is NOT PREFERRED for matching on multiple columns. Useexists
instead. This is just for show!
SQL Query, in
example 3/3
select A.name_lf, A.company
from CONVENTION A
where A.ssn in ( 'B2B2B2','XYZZY','E5E5E5' );
Query Output, in
example 3/3
name_lf | company |
---|---|
Borges, Benita | Boiron |
Ellis, Ezra | EDF |
Note
- This isn't actually a real join between tables: note that
RESTAURANT
isn't even in the query. I just wanted to point out that this is the samein
that you can use when adding a manual value filter to a query.
= ANY
SQL Query
select A.name_lf, A.company
from CONVENTION A
where A.ssn = any ( select B.social from RESTAURANT B );
Query Output
name_lf | company |
---|---|
Combs, Cathy | CVS |
Daher, Darweesh | Dell |
Ellis, Ezra | EDF |
Note
This is exactly like in
, only you type = any
instead. All 3 examples from in
apply.
INTERSECT
SQL Query, intersect
example 1/2
select A.ssn
from CONVENTION A
intersect
select B.social
from RESTAURANT B;
Query Output, intersect
example 1/2
ssn |
---|
C3C3C3 |
D4D4D4 |
E5E5E5 |
Note
As with minus
and union
, we can't output columns that weren't in our "match criteria."
A single column of output might suffice if our business problem is a simple "dummy check" list of "all values that appear in both tables."
SQL Query, intersect
example 2/2
select A.ph, A.em
from CONVENTION A
intersect
select B.phone, B.email
from RESTAURANT B;
Query Output, intersect
example 2/2
ph | em |
---|---|
4444444 | 444@example.com |
5555555 | 555@example.com |
6666666 | 666@example.com |
EXISTS
SQL Query
SELECT A.name_lf, A.company
FROM CONVENTION A
WHERE EXISTS (
SELECT NULL
FROM RESTAURANT B
WHERE A.ph = B.phone and A.em = B.email
);
Query Output
name_lf | company |
---|---|
Daher, Darweesh | Dell |
Ellis, Ezra | EDF |
Fulvia, Frances | Firestone |
Note
-
exists
is better thanin
when your "matching criteria" involve multiple columns. - Like
in
,exists
doesn't let you see, or further filter upon, details fromB
.
INNER JOIN
SQL Query
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
inner join RESTAURANT B on A.ph = B.phone and A.em = B.email;
Query Output
name_lf | company | name_fl | fav_food |
---|---|---|---|
Daher, Darweesh | Dell | Darweesh Daher | Doritos |
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone | Frances Fulvia | Fries |
=
{#equals}
The simple =
approach to joining tables is extraordinarily widespread and often taught first in tutorials to joining tables with SQL.
It supports "further filters" -- just include them as additional and
and or
clauses beyond the where
clause responsible for the "matching" operation.
If the rest of your filters have or
in them, be sure to use parentheses to ensure that you don't accidentally make your "join filter(s)" optional.
Avoiding such concerns is one reason I slightly prefer inner join
to the "=
" approach.
I almost always use explicit inner join
commands when I need to experiment to determine what query I want to write. It helps me avoid careless backspacing errors while I play.
SQL Query
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A, RESTAURANT B
where A.ph = B.phone
and A.em = B.email;
Query Output
name_lf | company | name_fl | fav_food |
---|---|---|---|
Daher, Darweesh | Dell | Darweesh Daher | Doritos |
Ellis, Ezra | EDF | Ezra Ellis | Endives |
Fulvia, Frances | Firestone | Frances Fulvia | Fries |
So close you can taste it. One more to go -- below is the fifth and final "join type."
Join Type: "In A
Or B
, But Not Both"
For this join type, we'd expect 6 rows of output (Anush+Benita+Cathy1+Cathy2+Grace+Helen or Anush+Benita+Frances1+Frances2+Grace+Helen, depending on the "matching criteria" we choose).
There aren't any SQL commands dedicated to this join type, but take a close look at the Venn diagram representing it. Can you see the "bite out of the middle" of what's otherwise an "In A
, B
, Or Both" join? Your task is to write code that takes that "bite out of the middle."
(Side note: it is very difficult to make this Venn diagram as a sugar cookie that won't break when bringing treats for your Database 101 class's final exam. Making it in SQL is much easier!)
Choose your own adventure
"Cut out the middle" of your overlapping circles by designing a query according to this architecture:
- Build an "In
A
,B
, Or Both" query using your favorite approach and think of it as your newA
. - Build an "In Both
A
AndB
" query using your favorite approach and think of it as your newB
. - Stitch together your new
A
and your newB
using your favorite "InA
But Not InB
" approach.
FULL OUTER JOIN ... A.... IS NULL OR B.... IS NULL
With this approach, it is vital to use the IS NULL
filter for all columns of A
and for all columns of B
that were involved in the FULL OUTER JOIN
operation.
Otherwise, if your data set has a lot of NULL
values in it, you could get unexpected results.
When you do this, "AND
" together filters on columns from the same table, but "OR
" together columns from different tables, as in the sample code below.
This approach supports "further filters" -- just include them as additional and
and or
clauses beyond the where
clause(s) responsible for the "matching" operation.
Be sure to surround the entire ... or ...
filters responsible that "take a bite out of the middle" in their own set of parentheses, to avoid unexpected behavior.
If the rest of your filters have or
in them, use parentheses around them, too, to ensure that you don't accidentally make your "bite out of the middle" filter optional.
SQL Query
select A.name_lf, A.company, B.name_fl, B.fav_food
from CONVENTION A
full outer join RESTAURANT B on A.ph = B.phone and A.em = B.email
where
(
A.ph is null
and A.em is null
)
or
(
B.phone is null
and B.email is null
);
Query Output
name_lf | company | name_fl | fav_food |
---|---|---|---|
Amjit, Anush | Apple | ||
Borges, Benita | Boiron | ||
Combs, Cathy | CVS | ||
Cathy Combs | Carrots | ||
Grace Gao | Garlic | ||
Helen Hopper | Hummus |
You made it! Can you feel how much smarter you are yet?
More Resources
If you liked this post, other links that may interest you are:
- SQL Tutorials by Mode Analytics
- All about SQL JOINs by Helen Anderson
- How to Make the Leap from Excel to SQL by Benn Stancil
- 4 Ways to Join Only The First Row in SQL by Periscope Data
- More dangerous subleties of JOINs in SQL by Alex Petralia
- JOINs in SQL, Python, and R by Sadavath Sharma
- Python and SQL Overlap by David Wallace
Posted on November 8, 2018
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.