Scaling Real-Time Leaderboards with Dragonfly
Dragonfly
Posted on January 19, 2024
Introduction
In today's digital age, leaderboards have become an integral part of many applications, providing a dynamic way to display user scores and rankings.
For applications with gamification features (e.g., games or educational platforms), leaderboards are a powerful tool to engage and motivate users.
In this blog post, we're going to delve into the process of building a practical and realistic leaderboard system.
Our journey will involve leveraging the capabilities of Dragonfly, a highly efficient drop-in replacement for Redis,
known for its ultra-high throughput and multi-threaded share-nothing architecture.
Specifically, we'll be utilizing two of Dragonfly's data types: Sorted-Set and Hash.
These data structures are perfect for handling real-time data and ranking systems, making them ideal for our leaderboards.
Moreover, to ensure that our leaderboards are not just real-time but also persistent, we will be integrating a SQL database (PostgreSQL) into our system.
This approach allows us to maintain a comprehensive record of user scores over different time frames.
As a result, we'll be capable of showcasing three distinct types of leaderboards:
- An all-time leaderboard that reflects overall user scores.
- A current-week leaderboard that captures the most recent user activities.
- Leaderboards for previous weeks, giving users insights into past trends and performances, potentially also providing rewards and prizes for top performers.
Through this implementation, we aim to demonstrate how Dragonfly, in conjunction with traditional SQL databases,
can be utilized to create robust, scalable, and efficient leaderboard systems. So, let's dive in and start building!
Implementation
1. Database Schema
In the implementation of our leaderboard system, a carefully designed SQL database schema plays a pivotal role.
At the core of this schema is the users table, which is essential for storing basic user information.
This table includes fields like id (a unique identifier for each user, automatically incremented as BIGSERIAL), email (a unique field to prevent duplicate registrations), password, username, and the timestamps created_at and updated_at to track the creation and last update of each user record.
Note that the password field should store only a hashed version of the user's password, never the plaintext.
CREATE TABLE IF NOT EXISTS users
(
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password VARCHAR(255) NOT NULL,
username VARCHAR(255) NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Next, we have the user_score_transactions table, which logs all score transactions for users.
It consists of an id as a unique transaction identifier, user_id linking to the users table, score_added representing the score change, reason for the score change (such as winning a game or completing a task), and a created_at timestamp for the transaction record.
CREATE TABLE IF NOT EXISTS user_score_transactions
(
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users (id),
score_added INT NOT NULL,
reason VARCHAR(255) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Finally, the user_total_scores table is dedicated to maintaining the cumulative scores of each user.
It contains an id for each record, user_id to reference the users table, total_score indicating the user's overall score, and an updated_at timestamp for the last score update.
CREATE TABLE IF NOT EXISTS user_total_scores
(
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users (id),
total_score INT NOT NULL DEFAULT 0,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
This schema is particularly effective due to its emphasis on normalization, which reduces redundancy by segregating user
information, score transactions, and total scores into distinct tables.
It ensures scalability with the use of BIGSERIAL and BIGINT data types, accommodating a large volume of records.
Additionally, the separate user_score_transactions table offers valuable insights into the score history for each user, which is beneficial for analytics and audit trails.
We will also create materialized views to further support leaderboards for previous weeks, as we will see later.
By isolating the total scores in the user_total_scores table, the system can swiftly access and update a user's total score, enhancing performance.
This well-structured schema thus forms the backbone of our leaderboard system, supporting both real-time updates and a comprehensive score history.
2. Dragonfly Keys & Data Types
With the database schema in place, we can now focus on the Dragonfly key-value pairs that will be used to store the leaderboard data.
The Sorted-Set data type is ideal for storing user scores and rankings, while the Hash data type is perfect for storing user information that is needed for display purposes.
Here are the keys and data types that we will be using:
- leaderboard:user_scores:all_time (Sorted-Set): Stores the user IDs and scores for the all-time leaderboard.
- leaderboard:user_scores:week_of_{monday_of_the_week} (Sorted-Set): Stores the user IDs and scores for a specific week.
- leaderboard:users:{user_id} (Hash): Stores the user information for a specific user.
An example of the key space would look like this:
dragonfly$> KEYS leaderboard:*
1) "leaderboard:user_scores:all_time" # Sorted-Set
2) "leaderboard:user_scores:week_of_2024_01_15" # Sorted-Set
3) "leaderboard:users:1" # Hash
4) "leaderboard:users:2" # Hash
5) "leaderboard:users:3" # Hash
6) ...
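Because every part of the application has to agree on these key names, it can help to build them in one place. The following is a minimal sketch of such helpers in Go. The function names (AllTimeKey, WeekKey, UserKey) and the dateUTC helper are hypothetical and only serve this illustration; WeekKey assumes the caller already passes the Monday of the week in question.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Key names and prefixes, following the scheme listed above.
const (
	allTimeKey    = "leaderboard:user_scores:all_time"
	weekKeyPrefix = "leaderboard:user_scores:week_of_"
	userKeyPrefix = "leaderboard:users:"
)

// AllTimeKey returns the Sorted-Set key of the all-time leaderboard.
func AllTimeKey() string { return allTimeKey }

// WeekKey returns the Sorted-Set key for the week starting on the given Monday.
// Assumption: the caller passes a Monday; no weekday check is done here.
func WeekKey(monday time.Time) string {
	return weekKeyPrefix + monday.UTC().Format("2006_01_02")
}

// UserKey returns the Hash key holding the display details of a user.
func UserKey(userID int64) string {
	return userKeyPrefix + strconv.FormatInt(userID, 10)
}

// dateUTC is a small helper for building example dates at midnight UTC.
func dateUTC(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

func main() {
	fmt.Println(AllTimeKey())
	fmt.Println(WeekKey(dateUTC(2024, 1, 15)))
	fmt.Println(UserKey(1))
}
```

Keeping the prefixes as constants means a typo in a key name shows up in exactly one spot instead of being scattered across handlers.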
3. All-Time & Current-Week Leaderboards
In the implementation of the all-time leaderboard and current-week leaderboard, we focus on how scores are updated for a user and how the top 100 users are queried from these leaderboards.
To update scores, we first record the score transaction in the user_score_transactions table and then update the user_total_scores table.
This operation should be wrapped in a database transaction to ensure data integrity.
BEGIN;
-- Record score transaction for user with ID 1.
INSERT INTO user_score_transactions (user_id, score_added, reason)
VALUES (1, 100, 'WINNING_A_GAME');
-- Update total score for user with ID 1.
UPDATE user_total_scores
SET total_score = total_score + 100,
updated_at = NOW()
WHERE user_id = 1;
COMMIT;
Next, we update the all-time leaderboard and current-week leaderboard in Dragonfly.
Note that these operations are best pipelined to reduce the number of round-trips between the application and Dragonfly.
dragonfly$> ZINCRBY leaderboard:user_scores:all_time 100 1
dragonfly$> ZINCRBY leaderboard:user_scores:week_of_2024_01_15 100 1
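To make the idea of pipelining concrete, here is a rough sketch of what it means at the wire level: both ZINCRBY commands are encoded in the Redis serialization protocol (RESP), which Dragonfly speaks, and placed in a single buffer that can be sent with one network write. The function names EncodeCommand and ScoreUpdatePipeline are hypothetical. In a real application you would normally rely on the pipeline support of your client library rather than encoding commands by hand.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// EncodeCommand encodes one command as a RESP array of bulk strings:
// "*<argc>\r\n" followed by "$<len>\r\n<arg>\r\n" for each argument.
func EncodeCommand(args ...string) []byte {
	var buf bytes.Buffer
	buf.WriteString("*" + strconv.Itoa(len(args)) + "\r\n")
	for _, arg := range args {
		buf.WriteString("$" + strconv.Itoa(len(arg)) + "\r\n" + arg + "\r\n")
	}
	return buf.Bytes()
}

// ScoreUpdatePipeline returns both leaderboard updates back to back in one
// buffer. Writing this buffer to the connection once, and then reading two
// replies, is what saves the extra round-trip.
func ScoreUpdatePipeline(weekKey string, userID, delta int64) []byte {
	member := strconv.FormatInt(userID, 10)
	increment := strconv.FormatInt(delta, 10)

	var buf bytes.Buffer
	buf.Write(EncodeCommand("ZINCRBY", "leaderboard:user_scores:all_time", increment, member))
	buf.Write(EncodeCommand("ZINCRBY", weekKey, increment, member))
	return buf.Bytes()
}

func main() {
	payload := ScoreUpdatePipeline("leaderboard:user_scores:week_of_2024_01_15", 1, 100)
	// %q makes the \r\n separators visible.
	fmt.Printf("%q\n", payload)
}
```

Without pipelining, the application would wait for the first reply before sending the second command, so the network latency would be paid twice for every score change.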
Now that we have persisted the score change in the database and updated the values in Dragonfly as well, querying the top 100 users from a leaderboard (all-time or current-week) is straightforward:
we use the ZREVRANGE command to retrieve the top users from the Sorted-Set, and then use HGETALL commands to retrieve user details from the Hash keys.
dragonfly$> ZREVRANGE leaderboard:user_scores:all_time 0 99 WITHSCORES
1) "1" # user_id = 1
2) "1000" # score for user_id = 1
3) "2" # user_id = 2
4) "900" # score for user_id = 2
5) "3"
6) "800"
7) "4"
8) "700"
9) "5"
10) "600"
# ...
dragonfly$> HGETALL leaderboard:users:1
dragonfly$> HGETALL leaderboard:users:2
dragonfly$> HGETALL leaderboard:users:3
dragonfly$> HGETALL leaderboard:users:4
dragonfly$> HGETALL leaderboard:users:5
# ...
Depending on how many users are recorded in the leaderboard:user_scores:all_time key, we need 1 ZREVRANGE command and up to 100 HGETALL commands to retrieve the top users.
This may sound like a lot of commands, but once again, we can pipeline these commands to reduce the number of round-trips between the application and Dragonfly.
In fact, the top user scores with their details can be retrieved in a single round-trip, and the response time should still be within a few milliseconds.
On the other hand, we completely avoid the need to query the database for the top users, which is a much more expensive operation.
This is why we are confident in saying that Dragonfly is providing a real-time experience for leaderboard retrieval.
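On the application side, the reply of ZREVRANGE with WITHSCORES has to be turned into ranked entries before the user details can be looked up. The sketch below assumes the reply arrives as a flat list of alternating members and scores, as in the output shown earlier (some clients or protocol versions may return nested pairs instead). The Entry type and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// Entry is one leaderboard position: a user ID and that user's score.
type Entry struct {
	UserID int64
	Score  float64
}

// ParseWithScores converts a flat [member, score, member, score, ...] reply
// into entries, preserving the order returned by the server.
func ParseWithScores(flat []string) ([]Entry, error) {
	if len(flat)%2 != 0 {
		return nil, fmt.Errorf("expected member/score pairs, got %d elements", len(flat))
	}
	entries := make([]Entry, 0, len(flat)/2)
	for i := 0; i < len(flat); i += 2 {
		userID, err := strconv.ParseInt(flat[i], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid user ID %q: %w", flat[i], err)
		}
		score, err := strconv.ParseFloat(flat[i+1], 64)
		if err != nil {
			return nil, fmt.Errorf("invalid score %q: %w", flat[i+1], err)
		}
		entries = append(entries, Entry{UserID: userID, Score: score})
	}
	return entries, nil
}

// UserDetailKeys returns the Hash keys to fetch with HGETALL, in rank order.
func UserDetailKeys(entries []Entry) []string {
	keys := make([]string, len(entries))
	for i, e := range entries {
		keys[i] = "leaderboard:users:" + strconv.FormatInt(e.UserID, 10)
	}
	return keys
}

func main() {
	reply := []string{"1", "1000", "2", "900", "3", "800"}
	entries, err := ParseWithScores(reply)
	if err != nil {
		panic(err)
	}
	for rank, key := range UserDetailKeys(entries) {
		fmt.Printf("#%d %s score=%g\n", rank+1, key, entries[rank].Score)
	}
}
```

The keys returned by UserDetailKeys are in rank order, so the pipelined HGETALL replies can be matched back to the entries by index.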
4. Leaderboards for Previous Weeks
For the implementation of leaderboards for previous weeks, we adopted a strategy that efficiently balances database querying with caching.
The process involves two main steps: creating materialized views and leveraging Dragonfly's caching capabilities.
We utilize the user_score_transactions table to generate materialized views for each past week's leaderboard.
Materialized views are essentially snapshots of the query results, stored for efficient access.
These views are created by aggregating the scores from the user_score_transactions table for each user over a specific week.
An example SQL statement to create a materialized view for a specific week might look like this:
CREATE MATERIALIZED VIEW leaderboard_week_of_2024_01_15 AS
SELECT u.id, u.username, u.email, sum(ust.score_added) AS weekly_score
FROM user_score_transactions ust
JOIN users u ON ust.user_id = u.id
WHERE ust.created_at >= '2024-01-15 00:00:00+00' AND ust.created_at < '2024-01-22 00:00:00+00'
GROUP BY u.id
ORDER BY weekly_score DESC;
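Since a new view is needed every week, the statement is usually generated rather than typed by hand. Here is a minimal sketch, assuming weeks run from Monday 00:00 UTC (inclusive) to the following Monday 00:00 UTC (exclusive). A half-open range like this is one way to make sure a transaction recorded in the last second of Sunday is neither missed nor counted twice. The function names and the dateUTC helper are hypothetical. The view name and timestamps are interpolated into the statement, which is acceptable here only because they are derived from a date and not from user input.

```go
package main

import (
	"fmt"
	"time"
)

// tsLayout renders a timestamp with an explicit UTC offset,
// e.g. "2024-01-15 00:00:00+00".
const tsLayout = "2006-01-02 15:04:05-07"

// WeekBounds returns the half-open range [start, end) of the week that
// begins on the given Monday. Assumption: the caller passes a Monday.
func WeekBounds(monday time.Time) (start, end time.Time) {
	m := monday.UTC()
	start = time.Date(m.Year(), m.Month(), m.Day(), 0, 0, 0, 0, time.UTC)
	return start, start.AddDate(0, 0, 7)
}

// ViewName returns the materialized view name for the given week.
func ViewName(monday time.Time) string {
	return "leaderboard_week_of_" + monday.UTC().Format("2006_01_02")
}

// CreateWeeklyViewSQL builds the CREATE MATERIALIZED VIEW statement.
func CreateWeeklyViewSQL(monday time.Time) string {
	start, end := WeekBounds(monday)
	return fmt.Sprintf(`CREATE MATERIALIZED VIEW %s AS
SELECT u.id, u.username, u.email, sum(ust.score_added) AS weekly_score
FROM user_score_transactions ust
JOIN users u ON ust.user_id = u.id
WHERE ust.created_at >= '%s' AND ust.created_at < '%s'
GROUP BY u.id
ORDER BY weekly_score DESC;`,
		ViewName(monday), start.Format(tsLayout), end.Format(tsLayout))
}

// dateUTC is a small helper for building example dates at midnight UTC.
func dateUTC(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

func main() {
	fmt.Println(CreateWeeklyViewSQL(dateUTC(2024, 1, 15)))
}
```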
Once the materialized view for a week's leaderboard is created, we can cache its results in Dragonfly to facilitate quick retrieval.
We utilize Dragonfly's String data type to store the serialized form of the leaderboard, which can be in JSON, XML, or any other format.
The reason is that past leaderboards cannot be changed anymore, and the order is preserved in the materialized view, so we can simply cache the results as-is.
SELECT * FROM leaderboard_week_of_2024_01_15 LIMIT 100;
dragonfly$> SET leaderboard:cache_top_100:week_of_2024_01_15 'serialized_leaderboard_data'
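As an illustration of the serialization step, the sketch below turns the rows read from the materialized view into a JSON string that could be used as the value of the SET command. JSON is just one of the possible formats, and the LeaderboardRow type, its JSON field names, and the function names are assumptions made for this example. The struct fields mirror the columns selected by the view.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LeaderboardRow mirrors one row of the weekly materialized view.
type LeaderboardRow struct {
	ID          int64  `json:"id"`
	Username    string `json:"username"`
	Email       string `json:"email"`
	WeeklyScore int64  `json:"weekly_score"`
}

// SerializeLeaderboard encodes the rows as a JSON array, keeping their order.
func SerializeLeaderboard(rows []LeaderboardRow) (string, error) {
	if rows == nil {
		// Encode an empty week as "[]" rather than "null".
		rows = []LeaderboardRow{}
	}
	data, err := json.Marshal(rows)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// CacheKey returns the String key for a past week's cached top 100,
// given the Monday formatted as in the other keys (e.g. "2024_01_15").
func CacheKey(mondayStr string) string {
	return "leaderboard:cache_top_100:week_of_" + mondayStr
}

func main() {
	rows := []LeaderboardRow{
		{ID: 1, Username: "alice", Email: "alice@example.com", WeeklyScore: 300},
		{ID: 2, Username: "bob", Email: "bob@example.com", WeeklyScore: 250},
	}
	value, err := SerializeLeaderboard(rows)
	if err != nil {
		panic(err)
	}
	fmt.Println("SET", CacheKey("2024_01_15"), value)
}
```

Because the rows are already ordered by the view, the JSON array preserves the ranking and the application can return the cached value to clients without any further sorting.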
Other Considerations
1. Calculating the Start of the Week
For the weekly leaderboards, it's essential to have a consistent method to determine the start of each week, commonly set as Monday.
This calculation is vital because it impacts both the naming conventions of keys in Dragonfly and the logic for creating and refreshing materialized views in the database.
Implementing helper methods in the application code that accurately calculate the Monday of any given week is necessary.
This consistency ensures that both the database views and the Dragonfly keys are synchronized in terms of the time periods they represent.
Such an implementation in Go might look like this:
// MondayOfTime returns the Monday of the week of the given time.
func MondayOfTime(ts time.Time) time.Time {
	tt := ts.UTC()
	weekday := tt.Weekday()
	if weekday == time.Monday {
		return tt.Truncate(24 * time.Hour)
	}
	daysToSubtract := (weekday - time.Monday + 7) % 7
	return tt.AddDate(0, 0, -int(daysToSubtract)).Truncate(24 * time.Hour)
}

// MondayOfTimeStr returns the Monday of the week of the given time in string format.
func MondayOfTimeStr(ts time.Time) string {
	return MondayOfTime(ts).Format("2006_01_02")
}
2. Management of Dragonfly Keys
The all-time leaderboard data, represented by a Sorted-Set key in Dragonfly, is a long-term data set that can be kept indefinitely.
This key does not require an expiration as it continuously accumulates user scores over time.
Conversely, the current-week Sorted-Set key in Dragonfly should be managed with an expiration policy.
Setting an expiry time point for this key, preferably at the beginning of the next week, ensures that the data does not become stale and reflects only the current week's scores.
This practice helps in maintaining the relevance and accuracy of the current-week leaderboard.
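One possible way to compute that expiry point is sketched below: it finds the next Monday at 00:00 UTC and prints it as a Unix timestamp, which is the form expected by the Redis-compatible EXPIREAT command. The function name NextMondayUTC and the dateUTC helper are hypothetical, and using UTC week boundaries is an assumption that matches the rest of this post.

```go
package main

import (
	"fmt"
	"time"
)

// NextMondayUTC returns 00:00 UTC of the Monday that follows the given time.
// If the given time already falls on a Monday, the Monday one week later is
// returned, since that is when the current week ends.
func NextMondayUTC(now time.Time) time.Time {
	t := now.UTC()
	midnight := time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, time.UTC)
	// Map Monday..Sunday to 0..6 (time.Weekday has Sunday = 0).
	daysSinceMonday := (int(midnight.Weekday()) + 6) % 7
	return midnight.AddDate(0, 0, 7-daysSinceMonday)
}

// dateUTC is a small helper for building example dates at midnight UTC.
func dateUTC(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

func main() {
	now := dateUTC(2024, 1, 17) // a Wednesday
	expireAt := NextMondayUTC(now)
	fmt.Println("EXPIREAT",
		"leaderboard:user_scores:week_of_2024_01_15",
		expireAt.Unix())
}
```

Setting the expiry right after the key is first written each week keeps the cleanup automatic. If the weekly materialized view is built from the database anyway, nothing is lost when the key disappears.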
And finally, the user-detail Hash keys in Dragonfly, shared across all-time and current-week leaderboards, can also be kept indefinitely.
However, it's crucial to keep the data in these user-detail Hash keys up-to-date with the corresponding records in the database.
Whenever a user's details change in the database, these changes should be promptly reflected in the Hash keys in Dragonfly.
This synchronization ensures that the leaderboards always display the most current and accurate user information.
3. Key Naming Conventions
It's important to adopt a clear and distinct naming convention for different types of data stored in Dragonfly.
Specifically, the key names for the current-week Sorted-Set and the cached materialized view (String data type) should be different to prevent confusion.
A clear naming strategy helps avoid accidental operations on the wrong Dragonfly data type.
Conclusion
In this blog post, we explored how Dragonfly can be used in conjunction with a SQL database to build a robust and efficient leaderboard system for gaming and other applications.
We discussed various data types and techniques that can be easily utilized to create real-time leaderboards with minimal update and retrieval latency.
We have a recorded workshop session, "Scaling Real-Time Leaderboards", that you can watch here.
Code snippets in this blog post can be found in the Dragonfly examples repository.
Finally, we encourage you to try Dragonfly out for yourself,
experience its capabilities firsthand, and build amazing applications with it!