Tariq Abughofa
Posted on November 13, 2019
Many ways to scroll through a table. Easy to misuse them.
Server-side pagination is a commonly-used feature in SQL databases. It helps when showing a huge set of results on user interfaces, it's required in RESTful APIs, and it comes to the rescue whenever you need to process large data in bulk and it doesn't fit in memory. The problem is that if it's done wrong it can be as inefficient as loading the full set or more.
There is a number of ways to implement pagination with SQL systems. I will go through the methods in this article with the advantages, disadvantages, and the reasonable scenarios of using each one.
Client-side Pagination:
The requested query is passed as it is without pagination. After the client gets the full set of data, it divides the data and shows it paginated to the user.
Advantages
It reduces the number of requests to the database. The data can even be cached on the client-side to avoid future requests which makes later loads even faster.
Disadvantages
Not suitable for regularly changing data. The dataset should be small to fit in memory and not painfully slow the initial load.
Suitable Scenarios
Small data sets which are not regularly updated and needed frequently such as the categories.
Using Limit/Offset:
The limit
and offset
are standard SQL keywords and the first solution that comes to mind for pagination over datasets. The query syntax would look something like this:
SELECT *
FROM products
ORDER BY sale_date DESC
WHERE sold = TRUE
LIMIT 10 OFFSET 30;
Advantages
Easy to implement which made many famous ORM solution use it. It's as easy as chaining the query function like this:
Product.limit(10).offset(30).findAll()
It also allows you to filter the table while pagination with the WHERE
statement. You can sort on any column and it still works. Which makes the query very customizable. Not the whole data is loaded into memory which means no running out of memory.
Disadvantages
It can be HORRIBLE. The query performance goes downhill as the offset value increases. Let's say you are fetching page n
. How would offset skip the first n-1
? It wouldn't. It has to linearly scan the table for the first n-1
and then load page n
.
An offset means that a certain number of records will be skipped from the start, but what if new data where inserted in page n-1
while page n
is being loaded? rows from page n-1
will be pushed into page n
causing inconsistency, and worst; if the data is being updated by the paginating process, rows might be processed multiple times causing data inconsistency.
Suitable Scenarios
Great for user interfaces, easy to implement and very customizable to different filter/order preferences. As long as the deepness of the search results is limited. Perfect for pages where users paginate down the results with scrolling. Have this functionality on a large dataset with a "last page" button showing up on the interface and embrace yourself for the a database server out of service.
For migrations and large dataset processing, my advice: Offset the offset
statement.
Cursors
SQL cursors makes the server do the pagination for you. All what you need to worry about is the query you want to paginate. It looks something like this:
-- Create a cursor for the query and open it
DECLARE cur CURSOR FOR SELECT * FROM products;
OPEN curEmail;
-- Retrieve ten rows each time
FETCH 10 FROM cur;
FETCH 10 FROM cur;
-- All done
CLOSE cur;
Advantages
Supports arbitrary queries. No performance drop when going further in the pages.
Disadvantages
the implementation details is different from an SQL engine to the other but in general they held resources on the server and create locks. Clients can't share cursors and thus they would have to open each it's own which mean they can't share the same pagination.
Suitable Scenarios
One client which needs to paginate over large-sets of data and cares about pagination consistency. There are many types that can suite different applications: READ_ONLY
which avoids locks on the table, STATIC
which copies the result into a temporary table which is good when updates doesn't matter, KEYSET
which only copies the primary keys to provide you with updates but not with new rows, DYNAMIC
same as KEYSET
but primary keys copy is updated so it can also see newly inserted and deleted rows. ...etc. the availability of these solutions depends on the engine.
Key-based Pagination
This technique is my favourite for data migrations. You need an indexed column to use it but that introduces great optimization while staying stateless on the server side.
The first page is fetched with the following statement:
SELECT *
FROM products
ORDER BY id ASC
LIMIT 1000;
For the second page, the query uses the maximum id
value fetched in the first page. Let's say it's 1000, the query is as follows:
SELECT *
FROM products
WHERE id > 1000
ORDER BY id ASC
LIMIT 1000;
An so on for the next pages.
Advantages
Offers great scalability: It's as fast for the 1,000,000th page as for the first page. Also pagination consistency is preserved. It supports filtering and ordering on any column or multiple columns as long as you have indices on them.
Disadvantages
In general, there is no way to jump to a specific page. However, If you have an auto incremented identifier in the dataset, that can be done with some simple calculations. To retrieve page n
, the lower bound for the query would be like this: n * 1000
where 1000 is the page size or the limit.
Suitable Scenarios
Mostly any one. Very convenient for scalable applications where a lot of server requests are expected.
Posted on November 13, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.