Pagination condition for multiple columns

xuanyu

Xuanyu Wang

Posted on May 21, 2024

Pagination condition for multiple columns

There are many ways to do pagination in production.

In this article, I'll explore an efficient method for pagination in production environments: fetching records up to the page size using conditions based on the last record's values.

This approach ensures that records on subsequent pages are consistently ordered, even if the database is updated between page views.

Why not OFFSET?

It’s not efficient

According to the documentation of Postgres

OFFSET says to skip that many rows before beginning to return rows.

and

The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.

Therefore, when querying a large volume of data, using OFFSET may degrade the performance especially when querying for the last few pages.

Customers could be confused

Unless the data is fixed during the session of browsing pages, using OFFSET can return correct but confusing data because the records could be updated between pages.

For example, after displaying the first page of 10 records and before displaying the second page, new records could be inserted into the database. Accidentally, the records are sorted in the first place. When the second page is displayed, a user will see the first record on the second page as the 10th record displayed on the first page because it’s squeezed out of the first page.

What’s the WHERE clause?

The core is defining the WHERE clause.

It’s easy to write the WHERE clauses when ordered by one column. It becomes tricky when ordered by multiple columns. Let’s start from order by one column, then two columns, and finally expand to any number of columns.

Case study: order by one column

Let’s say there is a table like the following

CREATE TABLE IF NOT EXISTS examples (
    B TIMESTAMP NULL,
    CONSTRAINT unique_b UNIQUE (B)
);
Enter fullscreen mode Exit fullscreen mode

Column B is nullable and contains only unique values.

Let’s say the column B's value of the last record of the last page is b. Then the records sorted after b could be categorized into the following cases:

  1. when b is NULL and column B is sorted with NULLS LAST. In this case, b is already the last value of column B and there is no value after it. We can use condition B IS NULL AND B IS NOT NULL to represent.
  2. when b is NULL and column B is sorted with NULLS FIRST. In this case, values after b should be non-null. The condition is B IS NOT NULL
  3. when b is not NULL and column B is sorted with NULLS LAST. The values after b could be NULL or non-null. The condition is B > b OR B IS NULL (when column B is sorted in ascending order) or B < b OR B IS NULL (when column B is sorted in descending order)
  4. when b is not NULL and column B is sorted with NULLS FIRST. The values after b can only be non-null. The condition is B > b or B < b depending on the sorting direction.

Case study: order by two columns

Let’s say there is a table like the following

CREATE TABLE IF NOT EXISTS examples (
    A INT NULL,
    B TIMESTAMP NULL,
    CONSTRAINT unique_ab UNIQUE (A, B)
);
Enter fullscreen mode Exit fullscreen mode

Both column A and B are nullable and their value combinations are unique.

Assuming in the last record of the last page, column A's value is a and column B's value is b. The query will sort A first and sort B second.

The records after the first page should fall in one of the following cases:

  1. their column A’s value is equal to a but their column B's value is sorted after b
  2. their column A's value is sorted after a, regardless of their column B's value.

Expand to multiple columns

After looking at the base case (sorting by one column) and how two columns are sorted, we can easily induce the following conclusion:

When a query is sorted by multiple columns column 1, column 2, ..., column N where N >= 1 , all records (whose values are a1, a2, ..., aN) after a record whose values are v1, v2, ..., vN should meet one of the following conditions:

  1. a1 is sorted after v1 , regardless what values a2, ... aN are
  2. a1 equals to v1, and a2, ..., aN should be sorted after v2, ..., vN.

Example data

Let’s say the page size is 5. The result is sorted by two columns A in ascending order and B in descending order, which is ORDER BY A ASC NULLS LAST, B DESC NULLS FIRST. There is no duplicate combination of A and B's value (which is UNIQUE (A, B))

There are different kinds of records that could appear on the 2nd page:

  • The values of the 5th row are not null. The 6th row could be one of the following cases

    row A B
    5 20 2020-02-01
    6 20 2020-01-31
    row A B
    5 20 2020-02-01
    6 21 NULL
    row A B
    5 20 2020-02-01
    6 NULL 2020-01-31
    row A B
    5 20 2020-02-01
    6 NULL NULL
  • when only the 5th row’s B value is NULL, then the 6th row could be

    row A B
    5 20 NULL
    6 20 2020-01-31
    row A B
    5 20 NULL
    6 21 NULL
  • when only the A value of the 5th row is NULL, then the 6th row could be

    row A B
    5 NULL 2020-02-01
    6 NULL 2020-01-31
  • when values of the 5th row are all NULL, then the 6th row could be

    row A B
    5 NULL NULL
    6 NULL 2020-01-31

Algorithm code in GO

The full code is at https://github.com/xuanyuwang/go-db-examples/blob/main/pagination/pagination.go

I’ll break the full code later.

package pagination

import (
    "fmt"
)

const (
    Asc   = "ASC"
    Desc  = "DESC"
    First = "FIRST"
    Last  = "LAST"
)

type OrderByColumn struct {
    SortExpresssion string
    Direction       string // ASC or DESC
    NullOption      string // FIRST or LAST
}

type Condition struct {
    SQL    string        // like "A = ? AND B > ?"
    Values []interface{} // like []interface{}{20, "2020-01-01"}
}

func (c *Condition) mergeValues(values []interface{}) {
    if len(values) > 0 {
        c.Values = append(c.Values, values...)
    }
}

func NextPage(
    columns []OrderByColumn, // the definition of ORDER BY columns
    values []interface{}, // the values of the last row of the last page
) Condition {
    column := columns[0]
    // The value of column.SortExpression in the last row of the last page
    prevValue := values[0]

    sign := "<"
    if column.Direction == Asc {
        sign = ">"
    }

    if len(columns) == 1 {
        condition := Condition{SQL: "", Values: make([]interface{}, 0, len(columns))}

        switch {
        case prevValue == nil && column.NullOption == Last:
            // No value after NULL
            condition.SQL = fmt.Sprintf("(%s IS NOT NULL AND %s IS NULL)", column.SortExpresssion, column.SortExpresssion)
        case prevValue == nil && column.NullOption == First:
            condition.SQL = fmt.Sprintf("(%s IS NOT NULL)", column.SortExpresssion)
        case prevValue != nil && column.NullOption == Last:
            // the next row could be NULL
            condition.SQL = fmt.Sprintf("((%s %s ?) OR (%s IS NULL))", column.SortExpresssion, sign, column.SortExpresssion)
            condition.Values = []interface{}{prevValue}
        case prevValue != nil && column.NullOption == First:
            condition.SQL = fmt.Sprintf("(%s %s ?)", column.SortExpresssion, sign)
            condition.Values = []interface{}{prevValue}
        }
        return condition
    } else {
        condition := NextPage(columns[1:], values[1:])
        var newCondition Condition
        switch {
        case prevValue == nil && column.NullOption == Last:
            newCondition.SQL = fmt.Sprintf("((%s IS NULL) AND %s)", column.SortExpresssion, condition.SQL)
            newCondition.mergeValues(condition.Values)
        case prevValue == nil && column.NullOption == First:
            newCondition.SQL = fmt.Sprintf("((%s IS NOT NULL) OR ((%s IS NULL) AND %s))", column.SortExpresssion, column.SortExpresssion, condition.SQL)
            newCondition.mergeValues(condition.Values)
        case prevValue != nil && column.NullOption == Last:
            // the next row could be NULL
            newCondition.SQL = fmt.Sprintf(
                "(((%s %s ?) OR (%s IS NULL)) OR ((%s = ?) AND %s))",
                column.SortExpresssion, sign, column.SortExpresssion, column.SortExpresssion, condition.SQL,
            )
            newCondition.mergeValues([]interface{}{prevValue, prevValue})
            newCondition.mergeValues(condition.Values)
        case prevValue != nil && column.NullOption == First:
            newCondition.SQL = fmt.Sprintf("(%s %s ?) OR ((%s = ?) AND %s)", column.SortExpresssion, sign, column.SortExpresssion, condition.SQL)
            newCondition.mergeValues([]interface{}{prevValue, prevValue})
            newCondition.mergeValues(condition.Values)
        }
        return newCondition
    }
}
Enter fullscreen mode Exit fullscreen mode

Let me break down the code into sections. The following code sections are not exactly the same as the full code for readability consideration.

Define ORDER BY column

At first, we need to understand what defines an ORDER BY column.

According to the doc of Postgresql, an ORDER BY column defined by

  • a sort expression. “The sort expression(s) can be any expression that would be valid in the query's select list”
  • a sort direction, which is either ASC or DESC
  • a null position, which is NULLS FIRST or NULLS LAST

It’s straightforward to create a struct for order by columns like the following

// Note the Direction field and NullOption would better be enum
type OrderByColumn struct {
    SortExpresssion string
    Direction       string // ASC or DESC
    NullOption      string // FIRST or LAST
}

const (
    Asc   = "ASC"
    Desc  = "DESC"
    First = "FIRST"
    Last  = "LAST"
)
Enter fullscreen mode Exit fullscreen mode

Generate the condition for the base case which is sorting on one column

Let’s first define the condition struct

type Condition struct {
    SQL    string        // like "A = ? AND B > ?"
    Values []interface{} // like []interface{}{20, "2020-01-01"}
}

func (c *Condition) mergeValues(values []interface{}) {
    if len(values) > 0 {
        c.Values = append(c.Values, values...)
    }
}
Enter fullscreen mode Exit fullscreen mode

Then the condition for sorting one one column is

    condition := Condition{SQL: "", Values: make([]interface{}, 0, len(columns))}

    // `column` is of type OrderByColumn.
    // `prevValue` is the value of `column` in the last record of the last page
    switch {
    case prevValue == nil && column.NullOption == Last:
        // No value after NULL
        condition.SQL = fmt.Sprintf("(%s IS NOT NULL)", column.SortExpresssion)
    case prevValue == nil && column.NullOption == First:
        condition.SQL = fmt.Sprintf("(%s IS NOT NULL)", column.SortExpresssion)
    case prevValue != nil && column.NullOption == Last:
        // the next row could be NULL
        condition.SQL = fmt.Sprintf("((%s %s ?) OR (%s IS NULL))", column.SortExpresssion, sign, column.SortExpresssion)
        condition.Values = []interface{}{prevValue}
    case prevValue != nil && column.NullOption == First:
        condition.SQL = fmt.Sprintf("(%s %s ?)", column.SortExpresssion, sign)
        condition.Values = []interface{}{prevValue}
    }

    return condition
Enter fullscreen mode Exit fullscreen mode

Once we have the condition for the columns except the first column, then we can define the condition for all columns by

        condition := NextPage(columns[1:], values[1:])
        var newCondition Condition
        switch {
        case prevValue == nil && column.NullOption == Last:
            newCondition.SQL = fmt.Sprintf("((%s IS NULL) AND %s)", column.SortExpresssion, condition.SQL)
            newCondition.mergeValues(condition.Values)
        case prevValue == nil && column.NullOption == First:
            newCondition.SQL = fmt.Sprintf("((%s IS NOT NULL) OR ((%s IS NULL) AND %s))", column.SortExpresssion, column.SortExpresssion, condition.SQL)
            newCondition.mergeValues(condition.Values)
        case prevValue != nil && column.NullOption == Last:
            // the next row could be NULL
            newCondition.SQL = fmt.Sprintf(
                "(((%s %s ?) OR (%s IS NULL)) OR ((%s = ?) AND %s))",
                column.SortExpresssion, sign, column.SortExpresssion, column.SortExpresssion, condition.SQL,
            )
            newCondition.mergeValues([]interface{}{prevValue, prevValue})
            newCondition.mergeValues(condition.Values)
        case prevValue != nil && column.NullOption == First:
            newCondition.SQL = fmt.Sprintf("(%s %s ?) OR ((%s = ?) AND %s)", column.SortExpresssion, sign, column.SortExpresssion, condition.SQL)
            newCondition.mergeValues([]interface{}{prevValue, prevValue})
            newCondition.mergeValues(condition.Values)
        }
        return newCondition
Enter fullscreen mode Exit fullscreen mode

Example usage

In the unit tests at https://github.com/xuanyuwang/go-db-examples/blob/main/pagination/pagination_test.go, you can find how the NextPage is used.

💖 💪 🙅 🚩
xuanyu
Xuanyu Wang

Posted on May 21, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related