Pitfalls of the MINUS-comparison
Samuel Nitsche
Posted on June 6, 2019
My work includes a lot of database views because they work as a kind of public API in our case. Since I started talking about self-testing and utPLSQL, the MINUS-comparison has always had its place when assuring that a new, reconstructed view has the exact same content as the old one.
However, I did it wrong until recently when I discovered a major pitfall.
For I like to tell stories let’s start with a Star Wars scenario again: We have a table which contains all the Star Wars characters and another table which holds information in which of the “Episode”-movies they appeared.
We have the following sample data:
ID | Character | Episodes |
---|---|---|
1 | Darth Vader | 3, 4, 5, 6 |
2 | Luke Skywalker | 4, 5, 6, 7, 8 |
3 | Rey | 7, 8 |
Our current view is this:
create or replace view all_movie_characters
as
select
sw_char.id,
sw_char.name
from
star_wars_characters sw_char
inner join appearance_in_episode ep
on sw_char.id = ep.character_fk
group by sw_char.id, sw_char.name;
However we just found a neat little function called appears_in_movie(i_character_id)
– it looks like we could make our view more readable by using it (don’t start an argument about performance, this is just a made-up example!):
create or replace view all_movie_character_2
as
select
sw_char.id,
sw_char.name
from
star_wars_characters sw_char
where appears_in_movie(sw_char.id) = 1;
Because we are very careful we didn’t just replace our initial view but gave it a new (very creative) name so we can compare both views and make sure they have the same content:
select * from all_movie_characters
minus
select * from all_movie_character_2
union all
select * from all_movie_character_2
minus
select * from all_movie_characters;
0 Rows returned. Great! We can ship this neat little improvement to production with confidence!
Never trust yourself. Let’s look at the appears_in_movie
-function:
create or replace function appears_in_movie(
i_character_id integer
) return integer result_cache
as
l_count integer;
begin
select count(*)
into l_count
from appearance_in_episode
where character_fk = i_character_id
and episode_no between 4 and 6;
if l_count > 0 then
return 1;
else
return 2;
end if;
end;
Ooops. Seems like the person who created this function had a different context and did only care about the characters who appear in the original 3 episodes 4 to 6! (I want to claim hereby that I completely endorse the episodes 7+ so far)
Results of selecting the new view:
| 1 | Darth Vader |
| 2 | Luke Skywalker |
But what happened? Why didn’t we notice that Rey is completely missing in the new view?
The problem is that UNION ALL
and MINUS
have the same precedence, which means that they are applied in the order of occurence. Here’s what happened:
Darth Vader
Luke Skywalker
Rey
MINUS
Darth Vader
Luke Skywalker
----------------------
Rey
UNION ALL
Darth Vader
Luke Skywalker
---------------------
Rey
Darth Vader
Luke Skywalker
MINUS
Darth Vader
Luke Skywalker
Rey
--------------------
0 Rows.
To avoid this, we need to wrap the two MINUS
comparisons in parenthesis:
(
select * from all_movie_characters
minus
select * from all_movie_character_2
)
union all
(
select * from all_movie_character_2
minus
select * from all_movie_characters
);
1 Row returned: 3 Rey
You can find the whole example on LiveSQL and the commented version on my github repository.
Is this a #100CodeExample?
Yes, the version on github is a completely commented, runnable example, as the previous ones.
But I want to experiment a bit with making the examples on the blog more enjoyable and approachable by adding some story and only highlighting the crucial parts.
Posted on June 6, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.