Data type mismatch in referential integrity and its implications

bolajiwahab

Bolaji Wahab

Posted on April 10, 2023

Data type mismatch in referential integrity and its implications

I have seen situations where different compatible types were used in referential integrity AKA foreign key constraints. When we say data types are compatible, it means there is a conversion path between the types.

The catalog pg_cast stores data type conversion paths, both built-in and user-defined but it does not represent every type conversion that the system knows how to perform. Yet, we can infer some compatible types from pg_cast.

Let us have a look at compatible types for integer and varchar



postgres=# SELECT castsource::regtype, casttarget::regtype FROM pg_cast WHERE castsource IN ('integer'::regtype, 'varchar'::regtype) ORDER BY castsource;
    castsource     |    casttarget
-------------------+-------------------
 integer           | bigint
 integer           | smallint
 integer           | real
 integer           | double precision
 integer           | numeric
 integer           | money
 integer           | boolean
 integer           | oid
 integer           | regproc
 integer           | regprocedure
 integer           | regoper
 integer           | regoperator
 integer           | regclass
 integer           | regcollation
 integer           | regtype
 integer           | regconfig
 integer           | regdictionary
 integer           | regrole
 integer           | regnamespace
 integer           | "char"
 integer           | bit
 character varying | character
 character varying | "char"
 character varying | name
 character varying | regclass
 character varying | xml
 character varying | character varying
 character varying | text
(28 rows)


Enter fullscreen mode Exit fullscreen mode

Let us assume we have a products table and another table storing orders of those products. We want to ensure that the orders table only contains orders of products that actually exist. So, we define a foreign key constraint in the orders table that references the products table:



postgres=# CREATE TABLE products (
    product_number bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    product_name text,
    price numeric
);
CREATE TABLE


Enter fullscreen mode Exit fullscreen mode


postgres=# CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_number integer REFERENCES products (product_number), 
    quantity integer 
);
CREATE TABLE


Enter fullscreen mode Exit fullscreen mode

Everything seems to work. Let us populate the tables



postgres=# INSERT INTO products (product_name, price) VALUES ('book', 10), ('pen', 5), ('bag', 15);
INSERT 0 3


Enter fullscreen mode Exit fullscreen mode


postgres=# INSERT INTO orders (order_id, product_number, quantity) VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1);
INSERT 0 3


Enter fullscreen mode Exit fullscreen mode

Also, everything works. But what happens when product_number is greater than 2147483648 (max value of integer).



postgres=# SELECT setval('products_product_number_seq', 2147483648);
   setval
------------
 2147483648
(1 row)

postgres=# INSERT INTO products (product_name, price) VALUES ('shoe', 20) RETURNING product_number;
 product_number
----------------
     2147483649
(1 row)

postgres=# INSERT into orders (order_id, product_number, quantity) VALUES (4, 2147483649, 1);
ERROR:  integer out of range


Enter fullscreen mode Exit fullscreen mode

We got integer out of range error. One way to fix this is to change orders.product_number type to bigint but we risk full table rewrite and application downtime for the duration of the change. Let us find out:



postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname = 'orders';
 relname | relfilenode
---------+-------------
 orders  |       42404
(1 row)


Enter fullscreen mode Exit fullscreen mode


postgres=# ALTER TABLE orders ALTER COLUMN product_number TYPE bigint ;
ALTER TABLE
postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname = 'orders';
 relname | relfilenode
---------+-------------
 orders  |       42419


Enter fullscreen mode Exit fullscreen mode

We can see relfilenode has changed, this tells us the table was rewritten. Imagine having to do this in a heavily-loaded large database.
There are other ways of adjusting the type such as using triggers and mirrored columns but that can be tasking.

Most common source of type mismatch in referential integrity happens when using types that have size/length constraint that such as smallint, integer serial, character varying(N).

So, is there a way we can catch this kind of mismatch earlier, maybe during development/review?
We can use the query below which produces nice info/message about type mismatch in referential integrity



WITH pk AS (
    SELECT CASE nc.nspname 
               WHEN 'public' THEN nc.nspname || '.' || c.conrelid::regclass::text
               ELSE c.conrelid::regclass::text
           END AS conrelid,
           c.conindid::regclass,
           string_agg(a.attname, ', ' order by a.attnum) as attnames,
           string_agg(format_type(a.atttypid, a.atttypmod), ', ' order by a.attnum) AS atttypid
      FROM pg_namespace AS nc
      JOIN pg_class AS r ON nc.oid = r.relnamespace
      JOIN pg_attribute AS a ON r.oid = a.attrelid
      JOIN pg_constraint AS c ON r.oid = c.conrelid
     WHERE a.attnum = ANY (c.conkey)
       AND NOT a.attisdropped 
       AND c.contype = 'p' 
       AND r.relkind IN ('r', 'p')
       AND nc.nspname NOT IN ('information_schema', 'pg_catalog')
  GROUP BY nc.nspname, c.conrelid::regclass::text, c.conindid::regclass
), fk AS (
    SELECT CASE nc.nspname 
               WHEN 'public' THEN nc.nspname || '.' || c.conrelid::regclass::text
               ELSE c.conrelid::regclass::text
           END AS conrelid,
           c.conindid::regclass,
           string_agg(a.attname, ', ' order by a.attnum) as attnames,
           string_agg(format_type(a.atttypid, a.atttypmod), ', ' order by a.attnum) AS atttypid
      FROM pg_namespace AS nc
      JOIN pg_class AS r ON nc.oid = r.relnamespace
      JOIN pg_attribute AS a ON r.oid = a.attrelid
      JOIN pg_constraint AS c ON r.oid = c.conrelid
      JOIN pk ON pk.conindid = c.conindid
     WHERE a.attnum = ANY (c.conkey)
       AND NOT a.attisdropped 
       AND c.contype = 'f' 
       AND r.relkind IN ('r', 'p') 
       AND nc.nspname NOT IN ('information_schema', 'pg_catalog')
  GROUP BY nc.nspname, c.conrelid::regclass::text, c.conindid::regclass
)
SELECT 'Parent ' || pk.conrelid || '.{' || pk.attnames || '} datatype {' || pk.atttypid || '} is different from child ' || fk.conrelid || '.{' || fk.attnames || '} datatype {' || fk.atttypid  || '}' AS info
  FROM pk 
  JOIN fk ON pk.conindid = fk.conindid
 WHERE pk.atttypid != fk.atttypid;
                                                                                       info
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Parent public.products.{product_number} datatype {bigint} is different from child public.orders.{product_number} datatype {integer}


Enter fullscreen mode Exit fullscreen mode

The query above can also be incorporated into some continuous integration.

Summary

It is easy to miss out the right data types when creating referential integrity and this can have impactful implications during the lifecycle of an application. Few of the implications are:

  • Application errors due to size/length constraint such as in integer and character varying(N).
  • Application downtime when setting the right data type.
  • If there is a rewrite, the rewrite will temporarily require as much as double the disk space of the table.

Key takeaways

Review the data types in referential integrity as part of your database design review.

💖 💪 🙅 🚩
bolajiwahab
Bolaji Wahab

Posted on April 10, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related