Be careful when using NULL in PostgreSQL

quachthanhhmd

Thanh Quach

Posted on July 13, 2024

Be careful when using NULL in PostgreSQL

When creating the data model, we often use the NULL value to signify a missing value in certain fields for specific rows. However, be cautious with NULL values, as they can lead to data mismatches when performing the SQL queries. The following article will highlight common issues encountered by new developers when using SQL queries and provide strategies to resolve these problems.

What is the null value in SQL?

A NULL value is used to represent a data value that does not exist. In other words, It's a placeholder for the missing or the value that users do not know at the current time.
To illustrate, please consider the question: how many Apple stores are in the world? If the answer is 0, it means that we have no Apple stores in the world. But if the answer is "NULL", it stands for we don't know about it, and maybe we need to check on the internet and respond to it later.
Damn.... we're getting into quite a bit of theory. Please get back to the main topic to find the magic of NULL value.

Why Null value is special in SQL?

Imagine that Graham was a newbie developer of a ABC startup company. In this company, he has the role of managing the HR system. Throughout this article, we will concentrate on the employees table likes:

CREATE TABLE employees (
    employee_id serial primary key,
    employee_name varchar(50),
    age numeric,
    base_salary numeric,
    additional_salary numeric,
    CONSTRAINT ck_age CHECK ( age > 0 ),
    CONSTRAINT ck_base_salary CHECK ( base_salary >= 0 ),
    CONSTRAINT ck_additional_salary CHECK ( additional_salary >= 0 )
);

INSERT INTO employees (employee_name, age, base_salary, additional_salary)
VALUES
('Graham', 24, 2000, 1000),
('Lynx', 25, 1300, null),
('Tian', 23, 1300, null),
('Kent', 32, 1040, null),
(null, 28, 3000, null),
(null, 50, 1000, null);
Enter fullscreen mode Exit fullscreen mode

NULL value cannot be used with arithmetic operators.

On a beautiful day, his manager - Max, told him: "Please help me calculate the salary of our employees, you can calculate by sum base_salary with additional_salary".
"Yes sir, it's so easy, give me 5 seconds" - Graham said.

Since he was a junior backend developer who wasn't familiar with SQL queries, he wrote naive SQL like:

SELECT 
    employee_id, 
    employee_name, 
    age, 
    (base_salary + additional_salary) as monthly_salary
FROM employees;
Enter fullscreen mode Exit fullscreen mode

Salary calculation
"This is my result, it's so easy,": Graham said.
However, upon reviewing Graham's report, Max, who came from a technical background, angrily remarked, "You forgot to convert NULL values to zero, didn't you?".
Surely you already know what happened. This problem caused by NULL fundamentally changes how it interacts with arithmetic operations. In other words, we cannot use NULL with any arithmetic operation such as: +, -, *, /,...
To resolve this issue, we might use two approaches:

  • Convert NULL to zero: in PostgreSQL, the COALESCE function is used to convert NULL to the specific default value. For example:
SELECT 
    employee_id, 
    employee_name, 
    age, 
    COALESCE(base_salary, 0) + COALESCE(additional_salary, 0) as monthly_salary
FROM employees;
Enter fullscreen mode Exit fullscreen mode

coalese salary

  • Setting a default value: when designing the employee table, we can set default values for base_salary and additional_salary to 0. Please keep in mind that we can not use the default value for all circumstances. To illustrate, if you want to calculate BMI, we have to calculate by the formula: weight (kg) / [height (m)]2. If you convert, for example, the weight to zero, the result will be:
SELECT COALESCE(NULL, 0) / POWER(1.8, 2)
Enter fullscreen mode Exit fullscreen mode

The result will be zero (this result should be NULL to show the unknown BMI caused by missing information). It definitely causes some confusion for users looking at it.

📝Summary: be careful when using NULL with arithmetic operators since any arithmetic operation involving NULL will result in NULL. Consider using the default value with the nullable column when performing the calculation.

NULL value cannot be used in the comparison operation.

Ignoring Graham's previous mistake since he's just a new member, Max assigned him a new task: "Please find the employees named Graham, Lynx, Tian, and employees who did not input their names in our system".
Graham thinks: "Hmm, it's also so easy, I will make up for my previous mistake". He wrote the SQL query:

SELECT employee_id, employee_name, age
FROM public.employees
WHERE employee_name IN ('Graham', 'Lynx', 'Tian', NULL)
Enter fullscreen mode Exit fullscreen mode

The result of this SQL was:

in
"Yeah, this is the information of Graham, Lynx, and Tian. Moreover, wonderful, all of our employees have updated their names, my boss".
After looking at this result, Tom has some suspicions about the veracity of the reports. Originally a technical person, he has double-checked in the database.

SELECT  employee_id, employee_name, age FROM employees
Enter fullscreen mode Exit fullscreen mode

Surprisingly, the result was:

all-data
Do you know what is happening in this example? the answer is "the NULL value cannot be equal or unequal to any value, so that cannot perform any comparison on NULL values" It means that we cannot use IN, NOT IN, =, <>, >, <, >=, <=,... to compare with NULL.
To correct this report, we have to write a SQL query like:

    SELECT employee_id, employee_name, age
    FROM public.employees
    WHERE employee_name in ('Graham', 'Lynx', 'Tian') OR employee_name IS null
Enter fullscreen mode Exit fullscreen mode

Moreover, please be careful when using the NULL value in conjunction with NOT IN. For instance, Graham wishes to retrieve all employees except those who haven't provided their name or his name. If he employs NOT IN in this scenario:

SELECT employee_id, employee_name, age
FROM employees
WHERE employee_name NOT IN ('Graham', null)
Enter fullscreen mode Exit fullscreen mode

Bummm, it seems like you have imagined the previous SQL results. We don't have any records in this query.

NO Datea

Why is it that when we use the IN operator, we retrieve multiple records, but when we use NOT IN, we end up with no results at all? Let's move to the next section to deep comprehend this issue.

📝Summary*: NULL only operates with IS and IS NOT NULL.

NULL value with conjunctive operator.

To clarify, In SQL, conjunctive operations are used to combine conditions in a query. The main conjunctive operators are:
1. **AND: This operator combines two or more conditions, and all conditions must be true for the overall condition to be true.
2. **OR: This operator combines two or more conditions, and at least one condition must be true for the overall condition to be true.
Ok, enough theory :), let's get started on this section.
As the previous section, NULL can not perform with any comparison operators, have you considered what occurs when employing conjunctive operators with NULL values?

Using NULL with AND

Let's back to Graham's workplace, he wants to find no-name employees and those more than 30 years old. If he performs:

SELECT employee_id, employee_name, age
FROM EMPLOYEES
WHERE employee_name = NULL AND age > 30
Enter fullscreen mode Exit fullscreen mode

Hmmm, the result is that Graham cannot get any data:

No-data
It's so complicated, Graham tries to investigate what happens when the where condition USER_NAME = NULL AND AGE > 30. He can use:

SELECT employee_id, employee_name, age, employee_name = NULL AND age > 30
FROM EMPLOYEES;
Enter fullscreen mode Exit fullscreen mode

After executing this query, he gets the answer:

Checking IN condition
Weird, why do we have 2 records with null values? and the others are false? Graham noticed that when he used NULL AND FALSE the result was FALSE. In contrast, NULL AND TRUE, the result was NULL. Since where statement only matches and gets records when the query is true, he cannot retrieve any records.
He tries to double-check again by the SQL:

SELECT NULL AND TRUE AS NULL_TRUE, NULL AND FALSE AS NULL_FALSE;
Enter fullscreen mode Exit fullscreen mode

Wonderful, his statement was true.

Check 3VL

Using NULL with OR

Let's slightly modify Graham's previous request: "finding no-name employees OR those more than 30 years old.". We continue sightly change the Graham's SQL:

SELECT employee_id, employee_name, age
FROM EMPLOYEES
WHERE employee_name = NULL OR age > 30
Enter fullscreen mode Exit fullscreen mode

The results may surprise you a bit. We have 2 records in this case.

OR condition
We also try to investigate a previous topic:

SELECT employee_id, employee_name, age, USER_NAME = NULL OR AGE > 30
FROM EMPLOYEES;
Enter fullscreen mode Exit fullscreen mode

Investigate
In this scenario, NULL OR TRUEgave the TRUE and NULL OR FALSE retrieve the NULL value.

Clarify some points

We moved on to 2 sections about NULL with the conjunctive operator. Please keep in mind that to clarify this section, Graham uses USER_NAME = NULL (actually, he must use USER_NAME IS NULL). In PostgreSQL, dealing with NULL values involves adhering to Three-valued logic (3VL). The following picture will clarify more detail about 3VL:

3VL
If you wish to delve deeper into this theory, please refer to the following resource for more information: Three-valued logic - Wikipedia

Conclusion

Managing NULL values can pose significant challenges when using certain operators in SQL statements. Carelessness in handling NULL values can lead to mismatches and affect user experience. Some individuals opt to avoid NULL values by substituting them with empty strings or default values. However, this approach can create confusion when attempting to represent unknown or undefined data in a table.

References

After Credit

In section 2, we had issues when using IN and NOT IN we didn't have any records, but when using NULL with IN, we found some records except NULL.

Using IN with NULL

In PostgreSQL, performing the SQL with IN like:

SELECT column_name(s) 
FROM table_name 
WHERE column_name IN (value1, value2, NULL, ...);
Enter fullscreen mode Exit fullscreen mode

When executing, PostgreSQL will convert this SQL to column_name = value1 OR column_name = value2 OR column_name = NULL OR ... like

SELECT column_name(s) 
FROM table_name 
WHERE column_name = value1 OR column_name = value2 OR column_name = NULL OR ...
Enter fullscreen mode Exit fullscreen mode

You might already be aware of this, right? due to Three-valued logic, TRUE OR NULL evaluates to TRUE. This explains why all records that meet the WHERE condition are returned, except those where the value is NULL.

Using NOT IN with NULL

Similar to the previous section, we execute the SQL statement:

SELECT column_name(s) 
FROM table_name 
WHERE column_name NOT IN (value1, value2, NULL, ...);
Enter fullscreen mode Exit fullscreen mode

PostgreSQL will convert it to the SQL like:

SELECT column_name(s) 
FROM table_name 
WHERE column_name != value1 AND column_name != value2 AND column_name != NULL AND ..
Enter fullscreen mode Exit fullscreen mode

When using NOT IN in PostgreSQL, each row in table_name is checked to determine if the value in column_name does not exist within the specified list of values from a subquery or an explicit list. According to Three-valued logic (3VL), NULL != NULL evaluates to NULL. Additionally, operations such as NULL AND TRUE result in NULL, and NULL AND FALSE result in FALSE. Therefore, based on these logical rules, the statement within the query cannot be evaluated as true. Consequently, regardless of the values included in the NOT IN list, the query will yield zero results.
To deal with this problem we can use one of the following two ways:

  • Using AND condition
     SELECT column_name(s) 
     FROM table_name 
     WHERE column_name NOT IN (value1, value2,...) AND column is not NULL
Enter fullscreen mode Exit fullscreen mode
  • Using NOT EXISTS If we use NOT EXISTS, we have to convert the value list to a temporary table or SQL select.
SELECT *
FROM EMPLOYEES e
WHERE NOT EXISTS (
    SELECT 
    FROM
    (VALUES ('Graham'), (NULL)) AS t (user_name)
    WHERE t.employee_name = e.employee_name OR e.employee_name IS NULL
);
Enter fullscreen mode Exit fullscreen mode

In practice, it's recommended to use EXISTS and NOT EXISTS for better performance, but we'll cover this topic in upcoming articles.


Thanks for reading this section! I hope that this article can be helpful when performing the SQL query. If you have any questions, or discussions or notice any mistakes, please put comments at the end of this section. See you in the next article :)

Author: Thanh Quach

💖 💪 🙅 🚩
quachthanhhmd
Thanh Quach

Posted on July 13, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related