The Intricacies of Null Values in MySQL

sachigoto

Sachi Goto

Posted on May 12, 2024

The Intricacies of Null Values in MySQL

Introduction:

For those who are new to MySQL, including myself, you might encounter unexpected results when you query from a table that has some null values. In MySQL, null is neither true nor false but unknown, and you can't compare null with null unless you use the 'IS NULL' or 'IS NOT NULL' operator. I found that the way MySQL treats null is not only interesting but also important to understand. Here, I'm demonstrating how null behaves using various MySQL functions!

Joining Tables with Columns That Contain Null Values

NULL with INNER JOIN vs. NULL with LEFT JOIN
Let's say there are two tables, as described below:

Table name : teachers

id dept_id name
101 1 Shrivell
102 1 Throd
103 1 Splint
104 NULL Spiregrain
105 2 Cutflower
106 NULL Deadyawn

Table name : departments

id name
1 Computing
2 Design
3 Engineering
NULL NULL

Notice that there are null values in the teachers table. What would happen if I use an INNER JOIN to combine these two tables?

SELECT teachers.name, departments.name AS department
FROM teachers INNER JOIN departments
ON (teachers.dept_id = departments.id)

Enter fullscreen mode Exit fullscreen mode

Here is the result

+-----------+------------+
| Name      | Department |
+-----------+------------+
| Shrivell  | Computing  |
| Throd     | Computing  |
| Splint    | Computing  |
| Cutflower | Design     |
+-----------+------------+
Enter fullscreen mode Exit fullscreen mode

Notice that in the results, you don't see the names of two teachers whose department_ids are null.

But what about using a LEFT JOIN? Let's take a look.
Here is the query where I simply swap 'INNER' for 'LEFT'.

SELECT teachers.name, departments.name AS department
FROM teachers LEFT JOIN departments
ON (teachers.dept_id = departments.id)

Enter fullscreen mode Exit fullscreen mode

Here is the result

+------------+------------+
| Name       | Department |
+------------+------------+
| Shrivell   | Computing  |
| Throd      | Computing  |
| Splint     | Computing  |
| Spiregrain | NULL       |
| Cutflower  | Design     |
| Deadyawn   | NULL       |
+------------+------------+
Enter fullscreen mode Exit fullscreen mode

You'll notice that all the teachers are listed. INNER JOIN does not return records with null values, but LEFT JOIN does.

WHY?

Keep in mind that when using the equality operator (=) with NULL to NULL, the result is neither true nor false—it's unknown. Therefore, records with NULL values on both sides of the comparison are not included in the result set when using INNER JOIN. However, by the nature of LEFT JOIN, it selects ALL records from the left side table (which is the teachers table in this case), even if the dept_id is null. Unlike INNER JOIN, which would remove such records from the list, LEFT JOIN retains them.

EXISTS & NULL and IN & NULL

Both EXISTS and IN do not return records with null values.
Let's consider the example where I have these tables.

Table name: customers

customer_id first_name last_name phone_number
1 John Doe 123-456-7890
2 Jane Smith 987-654-3210
3 Alice Johnson NULL
4 Bob Brown 555-555-5555

Table name: customers_2

customer_id first_name last_name phone_number
1 John Doe 123-456-7890
2 Jane Smith NULL
3 Alice Johnson 111-222-3333
4 Bob Brown NULL

Here is a query:
Using EXISTS to check if a name in b_table exists in a_table.

SELECT * FROM customers AS c1
WHERE EXISTS(
SELECT * FROM customers_2 AS c2
WHERE c1.first_name = c2.first_name 
AND c1.last_name = c2.last_name 
AND c1.phone_number = c2.phone_number); 
Enter fullscreen mode Exit fullscreen mode

Using IN to see if name in b_table is in a_table

SELECT *
FROM customers AS c1
WHERE (c1.first_name, c1.last_name, c1.phone_number) IN (
    SELECT c2.first_name, c2.last_name, c2.phone_number
    FROM customers_2 AS c2
);

Enter fullscreen mode Exit fullscreen mode

Both returns the same result

+-------------+------------+-----------+--------------+
| customer_id | first_name | last_name | phone_number |
+-------------+------------+-----------+--------------+
| 1           | John       | Doe       | 123-456-7890 |
+-------------+------------+-----------+--------------+
Enter fullscreen mode Exit fullscreen mode

Notice that records with null values are not returned.

WHY?
Because with EXISTS, you can't link columns that are null. For IN, it ignores null values as if they don't exist. Thus, both treat nulls in the same way: neither returns nulls.

** HOWEVER, it's not the same story with NOT EXISTS and NOT IN when dealing with NULL!**

Let's take a look.
I simply swapped EXISTS for NOT EXISTS.

SELECT * FROM customers AS c1
WHERE NOT EXISTS(
SELECT * FROM customers_2 AS c2
WHERE c1.first_name = c2.first_name 
AND c1.last_name = c2.last_name 
AND c1.phone_number = c2.phone_number);

Enter fullscreen mode Exit fullscreen mode

Here is the result

| customer_id | first_name | last_name | phone_number  |
|-------------|------------|-----------|---------------|
| 2           | Jane       | Smith     | 987-654-3210  |
| 3           | Alice      | Johnson   | [NULL]        |
| 4           | Bob        | Brown     | 555-555-5555  |
Enter fullscreen mode Exit fullscreen mode

Notice that it is returning records with null values. Both Jane and Bob's phone numbers in the customers_2 table are null, but since both phone numbers have values in the customers table, the returned values are not null.

How about using NOT IN?

SELECT *
FROM customers AS c1
WHERE (c1.first_name, c1.last_name, c1.phone_number) NOT IN (
    SELECT c2.first_name, c2.last_name, c2.phone_number
    FROM customers_2 AS c2
);

Enter fullscreen mode Exit fullscreen mode

Here is the result

+-------------+------------+-----------+--------------+
| customer_id | first_name | last_name | phone_number |
+-------------+------------+-----------+--------------+
Enter fullscreen mode Exit fullscreen mode

Nothing is returned!
Why does NOT EXISTS return null values but NOT IN does not? It's because they both treat null differently. NOT EXISTS is simply the opposite of EXISTS. It returns EVERYTHING that is not returned by EXISTS, including nulls.

However, NOT IN returns records that evaluate to TRUE, but in MySQL, null is neither TRUE nor FALSE—it is unknown.

Let's take a look at the record for Jane Smith with Id2.
When NOT IN is used, it compares:
Jane != Jane OR Smith != Smith OR 987-654-3210 != null.

Jane != Jane -> False
Smith != Smith -> False
987-654-3210 != null -> NULL
False or False or Null evaluates to NULL, so the record doesn't get returned!

Here's a little twist:
What if the first name in the customers table does not match the first name in the customers_2 table?

Let’s compare:
Jane Smith 987-654-3210 (from the customers table) and Jack Smith null (from the customers_2 table).

Here is the result

+-------------+------------+-----------+--------------+
| customer_id | first_name | last_name | phone_number |
+-------------+------------+-----------+--------------+
| 2           | Jack       | Smith     | 987-654-3210 |
Enter fullscreen mode Exit fullscreen mode

What? You see Jack! Doesn't NOT IN fail to evaluate null as either false or true, so the record shouldn't be returned?

Let's analyze it:

Jane != Jack -> True
Smith != Smith -> False
987-654-3210 != null -> NULL

True or False or Null actually returns:
True || False -> True
True || NULL -> True
Therefore, the record gets returned!

If you want to understand how true, false, and null are evaluated, here is the link:

https://dev.mysql.com/doc/refman/8.0/en/logical-operators.html#operator_and

As you can see, it can get quite confusing when it comes to comparing null values.

At the end of the day, I think it's best to explicitly compare null values using IS NULL or IS NOT NULL, which returns true or false as shown below.

SELECT * FROM customers AS c1
WHERE EXISTS(
SELECT * FROM customers_2 AS c2
WHERE c1.first_name = c2.first_name 
AND c1.last_name = c2.last_name 
AND c1.phone_number = c2.phone_number OR  (c1.phone_number IS NULL AND c2.phone_number IS NULL));

Enter fullscreen mode Exit fullscreen mode

Conclusion

Handling null values in MySQL can be quite challenging, requiring careful attention when they are present in your database. It's crucial to conduct explicit null checking to clearly define how null values should be treated. By implementing the strategies I mentioned earlier, such as using IS NULL or IS NOT NULL, you can ensure more predictable and accurate query results.

💖 💪 🙅 🚩
sachigoto
Sachi Goto

Posted on May 12, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related