Xuanyu Wang
Posted on May 21, 2024
There are many ways to do pagination in production.
In this article, I'll explore an efficient method for pagination in production environments: fetching records up to the page size using conditions based on the last record's values.
This approach ensures that records on subsequent pages are consistently ordered, even if the database is updated between page views.
Why not OFFSET
?
It’s not efficient
According to the documentation of Postgres
OFFSET
says to skip that many rows before beginning to return rows.
and
The rows skipped by an
OFFSET
clause still have to be computed inside the server; therefore a largeOFFSET
might be inefficient.
Therefore, when querying a large volume of data, using OFFSET
may degrade the performance especially when querying for the last few pages.
Customers could be confused
Unless the data is fixed during the session of browsing pages, using OFFSET
can return correct but confusing data because the records could be updated between pages.
For example, after displaying the first page of 10 records and before displaying the second page, new records could be inserted into the database. Accidentally, the records are sorted in the first place. When the second page is displayed, a user will see the first record on the second page as the 10th record displayed on the first page because it’s squeezed out of the first page.
What’s the WHERE
clause?
The core is defining the WHERE
clause.
It’s easy to write the WHERE
clauses when ordered by one column. It becomes tricky when ordered by multiple columns. Let’s start from order by one column, then two columns, and finally expand to any number of columns.
Case study: order by one column
Let’s say there is a table like the following
CREATE TABLE IF NOT EXISTS examples (
B TIMESTAMP NULL,
CONSTRAINT unique_b UNIQUE (B)
);
Column B
is nullable and contains only unique values.
Let’s say the column B
's value of the last record of the last page is b
. Then the records sorted after b
could be categorized into the following cases:
- when
b
isNULL
and columnB
is sorted withNULLS LAST
. In this case,b
is already the last value of columnB
and there is no value after it. We can use conditionB IS NULL AND B IS NOT NULL
to represent. - when
b
isNULL
and columnB
is sorted withNULLS FIRST
. In this case, values afterb
should be non-null. The condition isB IS NOT NULL
- when
b
is notNULL
and columnB
is sorted withNULLS LAST
. The values afterb
could beNULL
or non-null. The condition isB > b OR B IS NULL
(when columnB
is sorted in ascending order) orB < b OR B IS NULL
(when columnB
is sorted in descending order) - when
b
is notNULL
and columnB
is sorted withNULLS FIRST
. The values afterb
can only be non-null. The condition isB > b
orB < b
depending on the sorting direction.
Case study: order by two columns
Let’s say there is a table like the following
CREATE TABLE IF NOT EXISTS examples (
A INT NULL,
B TIMESTAMP NULL,
CONSTRAINT unique_ab UNIQUE (A, B)
);
Both column A
and B
are nullable and their value combinations are unique.
Assuming in the last record of the last page, column A
's value is a
and column B
's value is b
. The query will sort A first and sort B second.
The records after the first page should fall in one of the following cases:
- their column
A
’s value is equal toa
but their columnB
's value is sorted afterb
- their column
A
's value is sorted aftera
, regardless of their columnB
's value.
Expand to multiple columns
After looking at the base case (sorting by one column) and how two columns are sorted, we can easily induce the following conclusion:
When a query is sorted by multiple columns column 1, column 2, ..., column N
where N >= 1
, all records (whose values are a1, a2, ..., aN
) after a record whose values are v1, v2, ..., vN
should meet one of the following conditions:
-
a1
is sorted afterv1
, regardless what valuesa2, ... aN
are -
a1
equals tov1
, anda2, ..., aN
should be sorted afterv2, ..., vN
.
Example data
Let’s say the page size is 5. The result is sorted by two columns A
in ascending order and B
in descending order, which is ORDER BY A ASC NULLS LAST, B DESC NULLS FIRST
. There is no duplicate combination of A
and B
's value (which is UNIQUE (A, B)
)
There are different kinds of records that could appear on the 2nd page:
-
The values of the 5th row are not null. The 6th row could be one of the following cases
row A B 5 20 2020-02-01 6 20 2020-01-31 row A B 5 20 2020-02-01 6 21 NULL row A B 5 20 2020-02-01 6 NULL 2020-01-31 row A B 5 20 2020-02-01 6 NULL NULL -
when only the 5th row’s
B
value isNULL
, then the 6th row could berow A B 5 20 NULL 6 20 2020-01-31 row A B 5 20 NULL 6 21 NULL -
when only the
A
value of the 5th row isNULL
, then the 6th row could berow A B 5 NULL 2020-02-01 6 NULL 2020-01-31 -
when values of the 5th row are all
NULL
, then the 6th row could berow A B 5 NULL NULL 6 NULL 2020-01-31
Algorithm code in GO
The full code is at https://github.com/xuanyuwang/go-db-examples/blob/main/pagination/pagination.go
I’ll break the full code later.
package pagination
import (
"fmt"
)
const (
Asc = "ASC"
Desc = "DESC"
First = "FIRST"
Last = "LAST"
)
type OrderByColumn struct {
SortExpresssion string
Direction string // ASC or DESC
NullOption string // FIRST or LAST
}
type Condition struct {
SQL string // like "A = ? AND B > ?"
Values []interface{} // like []interface{}{20, "2020-01-01"}
}
func (c *Condition) mergeValues(values []interface{}) {
if len(values) > 0 {
c.Values = append(c.Values, values...)
}
}
func NextPage(
columns []OrderByColumn, // the definition of ORDER BY columns
values []interface{}, // the values of the last row of the last page
) Condition {
column := columns[0]
// The value of column.SortExpression in the last row of the last page
prevValue := values[0]
sign := "<"
if column.Direction == Asc {
sign = ">"
}
if len(columns) == 1 {
condition := Condition{SQL: "", Values: make([]interface{}, 0, len(columns))}
switch {
case prevValue == nil && column.NullOption == Last:
// No value after NULL
condition.SQL = fmt.Sprintf("(%s IS NOT NULL AND %s IS NULL)", column.SortExpresssion, column.SortExpresssion)
case prevValue == nil && column.NullOption == First:
condition.SQL = fmt.Sprintf("(%s IS NOT NULL)", column.SortExpresssion)
case prevValue != nil && column.NullOption == Last:
// the next row could be NULL
condition.SQL = fmt.Sprintf("((%s %s ?) OR (%s IS NULL))", column.SortExpresssion, sign, column.SortExpresssion)
condition.Values = []interface{}{prevValue}
case prevValue != nil && column.NullOption == First:
condition.SQL = fmt.Sprintf("(%s %s ?)", column.SortExpresssion, sign)
condition.Values = []interface{}{prevValue}
}
return condition
} else {
condition := NextPage(columns[1:], values[1:])
var newCondition Condition
switch {
case prevValue == nil && column.NullOption == Last:
newCondition.SQL = fmt.Sprintf("((%s IS NULL) AND %s)", column.SortExpresssion, condition.SQL)
newCondition.mergeValues(condition.Values)
case prevValue == nil && column.NullOption == First:
newCondition.SQL = fmt.Sprintf("((%s IS NOT NULL) OR ((%s IS NULL) AND %s))", column.SortExpresssion, column.SortExpresssion, condition.SQL)
newCondition.mergeValues(condition.Values)
case prevValue != nil && column.NullOption == Last:
// the next row could be NULL
newCondition.SQL = fmt.Sprintf(
"(((%s %s ?) OR (%s IS NULL)) OR ((%s = ?) AND %s))",
column.SortExpresssion, sign, column.SortExpresssion, column.SortExpresssion, condition.SQL,
)
newCondition.mergeValues([]interface{}{prevValue, prevValue})
newCondition.mergeValues(condition.Values)
case prevValue != nil && column.NullOption == First:
newCondition.SQL = fmt.Sprintf("(%s %s ?) OR ((%s = ?) AND %s)", column.SortExpresssion, sign, column.SortExpresssion, condition.SQL)
newCondition.mergeValues([]interface{}{prevValue, prevValue})
newCondition.mergeValues(condition.Values)
}
return newCondition
}
}
Let me break down the code into sections. The following code sections are not exactly the same as the full code for readability consideration.
Define ORDER BY
column
At first, we need to understand what defines an ORDER BY
column.
According to the doc of Postgresql, an ORDER BY
column defined by
- a sort expression. “The sort expression(s) can be any expression that would be valid in the query's select list”
- a sort direction, which is either
ASC
orDESC
- a null position, which is
NULLS FIRST
orNULLS LAST
It’s straightforward to create a struct
for order by
columns like the following
// Note the Direction field and NullOption would better be enum
type OrderByColumn struct {
SortExpresssion string
Direction string // ASC or DESC
NullOption string // FIRST or LAST
}
const (
Asc = "ASC"
Desc = "DESC"
First = "FIRST"
Last = "LAST"
)
Generate the condition for the base case which is sorting on one column
Let’s first define the condition struct
type Condition struct {
SQL string // like "A = ? AND B > ?"
Values []interface{} // like []interface{}{20, "2020-01-01"}
}
func (c *Condition) mergeValues(values []interface{}) {
if len(values) > 0 {
c.Values = append(c.Values, values...)
}
}
Then the condition for sorting one one column is
condition := Condition{SQL: "", Values: make([]interface{}, 0, len(columns))}
// `column` is of type OrderByColumn.
// `prevValue` is the value of `column` in the last record of the last page
switch {
case prevValue == nil && column.NullOption == Last:
// No value after NULL
condition.SQL = fmt.Sprintf("(%s IS NOT NULL)", column.SortExpresssion)
case prevValue == nil && column.NullOption == First:
condition.SQL = fmt.Sprintf("(%s IS NOT NULL)", column.SortExpresssion)
case prevValue != nil && column.NullOption == Last:
// the next row could be NULL
condition.SQL = fmt.Sprintf("((%s %s ?) OR (%s IS NULL))", column.SortExpresssion, sign, column.SortExpresssion)
condition.Values = []interface{}{prevValue}
case prevValue != nil && column.NullOption == First:
condition.SQL = fmt.Sprintf("(%s %s ?)", column.SortExpresssion, sign)
condition.Values = []interface{}{prevValue}
}
return condition
Once we have the condition for the columns except the first column, then we can define the condition for all columns by
condition := NextPage(columns[1:], values[1:])
var newCondition Condition
switch {
case prevValue == nil && column.NullOption == Last:
newCondition.SQL = fmt.Sprintf("((%s IS NULL) AND %s)", column.SortExpresssion, condition.SQL)
newCondition.mergeValues(condition.Values)
case prevValue == nil && column.NullOption == First:
newCondition.SQL = fmt.Sprintf("((%s IS NOT NULL) OR ((%s IS NULL) AND %s))", column.SortExpresssion, column.SortExpresssion, condition.SQL)
newCondition.mergeValues(condition.Values)
case prevValue != nil && column.NullOption == Last:
// the next row could be NULL
newCondition.SQL = fmt.Sprintf(
"(((%s %s ?) OR (%s IS NULL)) OR ((%s = ?) AND %s))",
column.SortExpresssion, sign, column.SortExpresssion, column.SortExpresssion, condition.SQL,
)
newCondition.mergeValues([]interface{}{prevValue, prevValue})
newCondition.mergeValues(condition.Values)
case prevValue != nil && column.NullOption == First:
newCondition.SQL = fmt.Sprintf("(%s %s ?) OR ((%s = ?) AND %s)", column.SortExpresssion, sign, column.SortExpresssion, condition.SQL)
newCondition.mergeValues([]interface{}{prevValue, prevValue})
newCondition.mergeValues(condition.Values)
}
return newCondition
Example usage
In the unit tests at https://github.com/xuanyuwang/go-db-examples/blob/main/pagination/pagination_test.go, you can find how the NextPage
is used.
Posted on May 21, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.