Postgres Arrays

mrpercival

Lawrence Cooke

Posted on June 21, 2024

Postgres Arrays

What are Postgres arrays?

Arrays are columns that can hold multiple values. They are useful when there is additional data that is tightly coupled to a row of data in a table.

Storing tags associated with a row, values from a web form where multiple options can be selected. These are both examples of where you could use an array.

Arrays do not replace lookup tables. Lookup tables can generally be accessed from multiple rows in a table and are not tightly coupled to a specific row.

Example without using arrays

Here is a simplified schema for a migraine tracker that stores both the start and end time, and a list of triggers.

Main table

CREATE TABLE
  public.migraines (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    user_id integer NOT NULL,
    start_dt timestamp without time zone NULL,
    end_dt timestamp without time zone NULL,
  );
Enter fullscreen mode Exit fullscreen mode

Lookup table for trigger type names

CREATE TABLE
  public.trigger_types (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name character varying(30) NOT NULL
  );
Enter fullscreen mode Exit fullscreen mode

Table to store selected triggers

CREATE TABLE
  public.migraine_triggers (
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    user_id integer NOT NULL,
    migraine_id integer NOT NULL,
    trigger_id integer NOT NULL
  );
Enter fullscreen mode Exit fullscreen mode

Inserting Data

Inserting data requires two separate actions

  • Insert data into the migraine table
  • Insert Triggers into the migraine_triggers table

The insert into the migraine_triggers is likely a multi row insert.

INSERT INTO migraines (user_id,start_dt,end_dt) 
VALUES (1,'2024-06-18 09:30:00', '2024-06-18 10:30:00')
Enter fullscreen mode Exit fullscreen mode
INSERT INTO migraine_triggers (user_id,migraine_id,trigger_id) 
VALUES (1,2, 3),(1,2, 4),(1,2, 5)
Enter fullscreen mode Exit fullscreen mode

Updating Data

Updating data is not entirely straight forward, you have to decide what approach you want to take (or the approach that works best with your data).

1) Run a SELECT before UPDATE to find which ones already exist and INSERT items not already in the list. You may need to also delete rows that are no longer in the list.

2) Use a conflict resolution insert (if the table is indexed to allow it)

INSERT INTO migraine_triggers (user_id, migraine_id, trigger_id)
VALUES 
    (1, 2, 3),
    (1, 2, 4),
    (1, 2, 5)
ON CONFLICT (migraine_id, trigger_id) 
DO NOTHING;
Enter fullscreen mode Exit fullscreen mode

You may need to also delete rows that are no longer in the list with this method also.

3) Run a delete query to delete all rows related to the migraine_id and INSERT all the new items.

In all these scenarios multiple queries are required to update the data.

Selecting Data

A simple selection might be to find migraines where the migraine was triggered by trigger 3

SELECT migraines.*, migraine_triggers.trigger_id 
FROM migraines
INNER JOIN migraine_triggers 
ON migraine_triggers.migraine_id = migraines.id
WHERE migraine_triggers.trigger_id = 3
Enter fullscreen mode Exit fullscreen mode

Now a slightly more complex query to bring back the name of the trigger from the trigger_types table

SELECT migraines.*,trigger_types.name 
FROM migraines
INNER JOIN migraine_triggers 
ON migraine_triggers.migraine_id = migraines.id 
INNER JOIN trigger_types 
ON trigger_types.id = migraine_triggers.trigger_id
WHERE migraine_triggers.trigger_id = 3
Enter fullscreen mode Exit fullscreen mode

Example using arrays

Using arrays we can simplify the database design and the queries needed to retrieve the same information in the above examples.

One of the features of arrays that separates it from JSON OR JSONB fields is that the data is strictly typed.

The data that goes into an array must be the right type of data.

This ensures that data integrity is maintained in the array.

In this example the data type would be INTEGER. A CHAR could be used but using an integer and utilizing a lookup table has some advantages over just storing the names in the array.

Adding an array field

Instead of using the migraine_triggers table, we can add a column to the migraine table to hold the trigger_ids selected for the migraine.

This will prevent the need for multiple row inserts, deletes and updates. It can also improve select performance because the queries can be simplified in some cases. It also reduces the size of the database by not needing an additional, potentially large table.

To add an array column, add [] after the columns data type.

CREATE TABLE
  public.migraines (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    user_id integer NOT NULL,
    start_dt timestamp without time zone NULL,
    end_dt timestamp without time zone NULL,
    trigger_types integer[] NULL
  );
Enter fullscreen mode Exit fullscreen mode

Inserting Data

Inserting data will now require just one query, wrap the values for the array in {} to insert the array.

INSERT INTO public.migraines (user_id, start_dt, end_dt, trigger_types)
VALUES (2, '2024-06-18 09:30:00', '2024-06-18 10:00:00', '{1,2}');
Enter fullscreen mode Exit fullscreen mode

Updating Data

Updating data is similar, just one query to update the migraine and the trigger data.

UPDATE public.migraines
SET end_dt='2024-06-18 11:00:00', trigger_types = '{1,3}'
WHERE id = 1;
Enter fullscreen mode Exit fullscreen mode

Selecting Data

A simple selection to find migraines where the migraine was triggered by trigger 3 can now be simplified from what it was before.

SELECT * FROM migraines WHERE 3 = ANY(trigger_types);
Enter fullscreen mode Exit fullscreen mode

In this case, there is no overhead from table joins.

Here is a more complex query where we want to pull in the trigger name from the trigger_types table.

SELECT migraines.*, trigger_types.name
FROM migraines 
INNER JOIN UNNEST(trigger_types) trigger_id ON trigger_id = 3
INNER JOIN trigger_types ON trigger_types.id = trigger_id;
Enter fullscreen mode Exit fullscreen mode

In this case we can use unnest to turn the array into rows and then join those rows with the trigger_types table.

Indexing Arrays

To improve performance, you can add an index to an array field.

Using a GIN (Generalized Inverted Index) is most likely the best index type to choose.

GIN is designed for fields where multiple values are present. Arrays, JSONB are both examples where you might want to use a GIN index.

CREATE INDEX idx_gin_triggers ON migraines USING GIN (trigger_types);
Enter fullscreen mode Exit fullscreen mode

Arrays are not right for every situations, but can provide a efficient way to store row meta data.

They can simplify database design and queries, while maintaining data integrity and ease of access.

Further information on arrays can be found in the Postgres Manual

💖 💪 🙅 🚩
mrpercival
Lawrence Cooke

Posted on June 21, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Postgres Arrays
postgres Postgres Arrays

June 21, 2024