Crash Course Redis - Redis Data Types Explained

kalkwst

Kostas Kalafatis

Posted on January 31, 2023

Crash Course Redis - Redis Data Types Explained

Redis is more than just a key-value store; it is a data structures server that accepts a variety of values. This means that, unlike traditional key-value stores, where string keys are associated with string values, Redis' value is not limited to a simple string but can also hold more complex data structures.

The following is the list of all the data structures supported by Redis, which will be covered separately:

  • Binary-safe strings.
  • Lists: Collections of string elements arranged in the order of insertion. They are essentially linked lists.
  • Sets: Collections of distinct, unsorted string elements.
  • Sorted Sets: Sorted sets are similar to sets, but each string element is associated with a floating number value known as score. Because the elements are always retrieved in order of their score, unlike Sets, it is possible to retrieve a range of elements (for example you may ask: give me the top 10, or the bottom 10).
  • Hashes: Maps made up of fields that correspond to values. The field and the value are both strings. This is very similar to hashes in Ruby or Python.
  • Bit arrays (or simply bitmaps): It is possible to treat String values like an array of bits by using special commands: you can set and clear individual bits, count all the bits set to 1, find the first set or unset bit, and so on.
  • HyperLogLogs: This is a probabilistic data structure used to calculate the cardinality of a set.

Image description

Keys

Redis keys are binary safe, which means they can be anything from a string like "foo" to the content of a JPEG file. The empty string can likewise be used as a valid key.

Here are a few more crucial rules:

  •   It's not a good idea to have very long keys. For example, a key of 1024 bytes is a bad idea not only because it takes up too much memory but also because finding the key in the dataset may require several expensive key comparisons. Even if the task is to check if a large value exists, it is better to hash it (for example, with SHA1), especially from a memory and bandwidth point of view.
  •   Most of the time, very short keys are not a good idea. It doesn't make much sense to write "u1000flw" as a key when you can write "user:1000:followers" instead. The second one is easier to read, and the extra space is small compared to how much space the key object and the value object take up. Even though it's true that short keys use less memory, it's up to you to find the right balance.
  •   Try to stick to a policy. For example, "object-type:id," like "user:1000," is a good idea. For fields with more than one word, like "comment:4321:reply.to" or "comment:4321:reply-to," a dot or a dash is often used.
  •   The maximum allowed key size is 512 MB.

Querying the key space

There are commands that are not defined for specific types but are beneficial for interacting with the keyspace and so can be used with keys of any type. For example, the EXISTS command returns 1 or 0 to indicate whether or not a certain key exists in the database, but the DEL command deletes a key and its associated value, whatever that value is.

> set mykey hello  
OK  
> exists mykey  
(integer) 1  
> del mykey  
(integer) 1  
> exists mykey  
(integer) 0  
Enter fullscreen mode Exit fullscreen mode

The examples also demonstrate how DEL returns 1 or 0 depending on whether the key was removed (it existed) or not (there was no such key with that name). There are several key space instructions, but the two listed above are the most important, together with the TYPE command, which returns the type of value stored at the specified key:

> set mykey x  
OK  
> type mykey  
string  
> del mykey  
(integer) 1  
> type mykey  
none  
Enter fullscreen mode Exit fullscreen mode

Key Expiration

Key expiration lets you establish a timeout for a key, commonly known as a "time to live" or "TTL". When the timer runs out, the key is automatically destroyed.

Consider the following points regarding key expiration:

  • They can be set in either seconds or milliseconds.
  • The expiration time resolution, on the other hand, is always 1 millisecond.
  • Information about expires is copied and stored on disk, so when your Redis server isn't running, time seems to pass (this means that Redis saves the date at which a key will expire).

To set the expiration date of a key, use the EXPIRE command:

> set key some-value  
OK  
> expire key 5  
(integer) 1  
> get key (immediately)  
"some-value"  
> get key (after some time)  
(nil)  
Enter fullscreen mode Exit fullscreen mode

The key vanished between the two GET calls, since the second call was delayed by more than 5 seconds. In the example above, we used EXPIRE in order to set the expire (it can also be used in order to set a different expire to a key already having one, like PERSIST can be used in order to remove the expire and make the key persistent forever). However, we can also create keys with expiration dates using other Redis commands. For example, using SET options:

> set key 100 ex 10  
OK  
> ttl key  
(integer) 9  
Enter fullscreen mode Exit fullscreen mode

The preceding example creates a key with the string value 100 and a timer of ten seconds. Later, the TTL command is used to determine the key's remaining life.

Strings

Redis strings store byte sequences such as text, serialized objects, and binary arrays. As a result, strings are the most fundamental Redis data type. They're commonly used for caching, but they also include extra functionality that allows you to create counters and conduct bitwise operations.

Setting and retrieving a string in Redis

> set mykey somevalue  
OK  
> get mykey  
"somevalue"  
Enter fullscreen mode Exit fullscreen mode

As you can see, the SET and GET commands are used to set and get a string value. It should be noted that SET will overwrite any previous value stored in the key if the key already exists, even if the key is associated with a non-string value. As a result, SET completes an assignment. Values can be any type of string (including binary data); for example, a jpeg image can be stored inside a value. A value cannot exceed 512 MB.

There are some interesting options for the SET command that are given as extra arguments. We could tell SET to fail if the key already exists, or we could tell it to only work if the key already exists:

> set mykey newvalue nx  
(nil)  
> set mykey newvalue xx  
OK  
> set mykey2 newvalue xx  
(nil)  
Enter fullscreen mode Exit fullscreen mode

Even though strings are the most fundamental Redis values, there are several interesting operations you can perform on them. One example is the atomic increment:

> set counter 0  
OK  
> incr counter  
(integer) 1  
> incrby counter 50  
(integer) 51  
Enter fullscreen mode Exit fullscreen mode

The INCR command converts the string value into an integer, adds one to it, and then sets the result as the new value. There are also commands like INCRBY, DECR, and DECRBY that do the same thing. It's always the same command under the hood, but it performs in a slightly different manner.

What does "INCR is atomic" mean? Even if more than one client issues INCR on the same key, there will never be a race condition. For example, it will never happen that Alice reads "10" and Bob reads "10" at the same time, both increment to 11, and both set the new value to 11. The final value will always be 12, and the read-increment-set operation is done when no other client is carrying out a command.

Most string operations are O(1), which means they're highly efficient. However, be careful with the SUBSTR, GETRANGE, and SETRANGE commands, which can be O(n). These random-access string commands may cause performance issues when dealing with large strings.

Lists

To explain the List data type, it's better to start with a little bit of theory, as the term "list" is often used in an improper way by information technology folks. For instance, "Python Lists" are not what the name may suggest (linked lists), but rather "arrays" (the same data type is called an "array" in Ruby).

From a very general point of view, a "list" is just a sequence of ordered elements: 10, 20, 1, 2, 3. But a list made with an array is very different from a list made with a linked list in terms of its properties.

Redis lists are implemented via linked lists. This means that even if you have millions of elements inside a list, the operation of adding a new element to the head or tail of the list is performed "in constant time." The speed of adding a new element with the LPUSH command to the head of a list with ten elements is the same as adding an element to the head of a list with 10 million elements.

What's the downside? Accessing an element "by index" is very fast in lists implemented with an Array (constant-time indexed access) and not so fast in lists implemented with linked lists (where the operation requires an amount of work proportional to the index of the accessed element).

Because it is critical for a database system to be able to add elements to a very long list in a very fast manner, Redis lists are implemented with linked lists. We'll see in a moment that another big benefit is that Redis lists can be accessed at the same length and time every time.

When quick access to the middle of a large collection of elements is required, a different data structure known as a "sorted set" can be used.

Adding elements to a list

The LPUSH command adds a new element to the left (at the head) of a list, whereas the RPUSH command adds a new element to the right (at the tail). Finally, the LRANGE command extracts element ranges from lists:

> rpush mylist A  
(integer) 1  
> rpush mylist B  
(integer) 2  
> lpush mylist first  
(integer) 3  
> lrange mylist 0 -1  
1) "first"  
2) "A"  
3) "B"  
Enter fullscreen mode Exit fullscreen mode

Note that LRANGE needs two indexes, one for the first element of the range to return and one for the last element. Both indexes can be negative, which tells Redis to start counting from the end. This means that -1 is the last item in the list, -2 is the second-to-last item, and so on.

As you can see, RPUSH added the elements on the right of the list, while the last LPUSH added the element on the left. Both of these commands are "variadic," which means that you can add more than one item to a list with a single call:

> rpush mylist 1 2 3 4 5 "foo bar"  
(integer) 9  
> lrange mylist 0 -1  
1) "first"  
2) "A"  
3) "B"  
4) "1"  
5) "2"  
6) "3"  
7) "4"  
8) "5"  
9) "foo bar"  
Enter fullscreen mode Exit fullscreen mode

An important operation defined on Redis lists is the ability to pop elements. Populating elements is the operation of both retrieving the element from the list and eliminating it from the list at the same time. You can pop elements from the left and right, just like you can push elements from both sides of the list:

> rpush mylist a b c  
(integer) 3  
> rpop mylist  
"c"  
> rpop mylist  
"b"  
>rpop mylist  
"a"  
Enter fullscreen mode Exit fullscreen mode

We added three elements and popped three elements, so at the end of this sequence of commands, the list is empty and there are no more elements to pop. If we try to pop yet another element, this is the result we get:

> rpop mylist  
(nil)  
Enter fullscreen mode Exit fullscreen mode

Capped lists

In many use cases, we just want to use lists to store the latest items, whatever they are: social network updates, logs, or anything else.

With the LTRIM command, we can use Redis lists as capped collections that only keep the most recent N items and throw away the rest.

The LTRIM command is similar to LRANGE, but instead of displaying the specified range of elements, it sets this range as the new list value. All the elements outside the given range are removed.

> rpush mylist 1 2 3 4 5  
(integer) 5  
> ltrim mylist 0 2  
OK  
> lrange mylist 0 -1  
1) "1"  
2) "2"  
3) "3"  
Enter fullscreen mode Exit fullscreen mode

The above LTRIM command tells Redis to only keep the list items from index 0 to 2 and to throw away everything else. This makes it possible to create a simple but useful pattern by combining a list push operation and a list trim operation to add a new element and get rid of elements that are too many:

LPUSH mylist <some element>  
LTRIM mylist 0 999  
Enter fullscreen mode Exit fullscreen mode

The preceding combination adds a new element while only including the 1,000 most recent elements in the list. With LRANGE you can access the top items without any need to remember very old data.

Blocking operations

Lists have a special feature that makes them ideal for implementing queues and, more broadly, as a building block for interprocess communication systems: blocking operations.

Assume you want to push items into a list with one process and then use a different process to do something with those items. This is the typical producer-consumer setup, and it is simple to implement:

  • To push items into the list, producers call LPUSH.
  • To extract items from the list, consumers call RPOP.

However, it is possible that the list is empty at times and there is nothing to process, in which case RPOP simply returns NULL. In this case, the consumer is forced to wait some time before attempting to use RPOP again. 

This is known as polling, and it is not a good idea in this context due to several disadvantages:

  1. Redis and clients are forced to process useless commands (all requests received when the list is empty will receive no actual work and will simply return NULL).
  2. Adds a delay to item processing because a worker waits some time after receiving a NULL. To reduce the delay, we could wait less time between RPOP calls, which would amplify problem number one, i.e. more useless Redis calls.

So Redis implements BRPOP and BLPOP commands, which are versions of RPOP and LPOP that can block if the list is empty; they'll return to the caller only when a new element is added to the list or a user-specified timeout is reached.

> brpor tasks 5  
1) "tasks"  
2) "do_something"  
Enter fullscreen mode Exit fullscreen mode

It means: "wait for elements in the list tasks, but return if after 5 seconds no element is available".

You should know that you can set the timeout to 0 to wait for elements forever, and you can specify more than one list instead of just one to wait on more than one list at once and be notified when the first list gets an element.

There are a few things to have in mind when using BRPOP:

  1. Clients are served in a certain order: the first client who blocked while waiting for a list is served first when another client pushes an element, and so on.
  2. Compared to RPOP, the return value is a two-element array that also contains the name of the key. This is because BRPOP and BLPOP can block while waiting for elements from multiple lists.
  3. If the timeout is reached, NULL is returned.

Hashes

Redis hashes are a way to use the data structure of a hash table or hash map. Hash tables map unique keys to values. Each key has its own lookup value, which is made by a hash function, so that it is quick and easy to find.

For example, a Redis hash could stand for a business's customer database. A unique key, like a customer ID, is used to find out who each customer is. The matching value could be an object that stores information like the customer's name, contact information, and a list of their most recent purchases.

Hashes in Redis are also a great way to represent objects. For example, a Redis hash can be used to represent a User object. Each key-value pair in a Redis hash has a different attribute (e.g., username, email address, age, etc.). Redis uses very little space to store hashes with fewer than 100 key-value pairs.

> hset user:1000 username spookums birthyear 1988 verified 1  
(integer) 3  
> hget user:1000 username  
"spookums"  
> hget user:1000 birthyear  
"1988"  
> hgetall user:1000  
1) "username"  
2) "spookums"  
3) "birthyear"  
4) "1988"  
5) "verified"  
6) "1"  
Enter fullscreen mode Exit fullscreen mode

Sets

Redis Sets are unsorted string collections. The SADD command is used to add new elements to a set. It is also possible to perform a variety of other operations on sets, such as determining whether a given element already exists, performing the intersection, union, or difference of multiple sets, and so on.

> sadd myset 1 2 3  
(integer) 3  
> smembers myset  
1. 3  
2. 1  
3. 2  
Enter fullscreen mode Exit fullscreen mode

We've added three elements to my set and told Redis to return them all. As you can see, they are not sorted. Because there is no contract with the user regarding element ordering, Redis is free to return the elements in any order at any time.

Redis has commands to test for membership. For example, checking if an element exists:

> sismember myset 3  
(integer) 1  
> sismember myset 30  
(integer) 0  
Enter fullscreen mode Exit fullscreen mode

"3" is a member of the set, while "30" is not.

Sets are useful for expressing object relationships. For example, we can easily use sets to implement tags. A simple way to model this problem is to create a set for each object that needs to be tagged. The IDs of the tags associated with the object are stored in the set. One example is tagging news articles. If article ID 1000 is tagged with tags 1, 2, 5, and 77, the following tag IDs can be associated with the news item:

> sadd news:1000:tags 1 2 5 77  
(integer) 4  
Enter fullscreen mode Exit fullscreen mode

We might also want to have the inverse relationship: a list of all the news tagged with a specific tag:

> sadd tag:1:news 1000  
(integer) 1  
> sadd tag:2:news 1000  
(integer) 1  
> sadd tag:5:news 1000  
(integer) 1  
> sadd tag:77:news 1000  
(integer) 1  
Enter fullscreen mode Exit fullscreen mode

To get all the tags for a given object is trivial:

> smembers news:1000:tags  
1. 5  
2. 1  
3. 77  
4. 2  
Enter fullscreen mode Exit fullscreen mode

Other non-trivial operations are still simple to implement with the right Redis commands. For example, we might want a list of all the objects that share the tags 1, 2, 10, and 27. We can accomplish this by using the SINTER command, which performs set intersection. We can employ:

> sinter tag:1:news tag:2:news tag:10:news tag:27:news  
... results here ...  
Enter fullscreen mode Exit fullscreen mode

The command to extract an element is known as SPOP, and it is useful for modeling certain problems. To implement a web-based poker game, for example, you might want to represent your deck with a set. Consider using a one-character prefix for (C)lubs, (D)iamonds, (H)earts, and (S)pades:

> sadd deck C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 CJ CQ CK  
  D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 DJ DQ DK H1 H2 H3  
  H4 H5 H6 H7 H8 H9 H10 HJ HQ HK S1 S2 S3 S4 S5 S6  
  S7 S8 S9 S10 SJ SQ SK  
  (integer) 52  
Enter fullscreen mode Exit fullscreen mode

Now we want to give each player five cards. The SPOP command removes a random element and returns it to the client, making it the ideal operation in this situation. However, if we play it directly against our deck, we'll have to repopulate the deck of cards in the next game, which may not be ideal. To begin, we can import a copy of the deck key set into the game: 1:deck key.

SUNIONSTORE is used to accomplish this, which normally performs the union of multiple sets and stores the result in another set. However, because the union of a single set is itself, we can duplicate my deck using:

> sunionstore game:1:deck deck  
(integer) 52  
Enter fullscreen mode Exit fullscreen mode

Now we can provide the first player with five cards:

> spop game:1:deck  
"C6"  
> spop game:1:deck  
"CQ"  
> spop game:1:deck  
"D1"  
> spop game:1:deck  
"CJ"  
> spop game:1:deck  
"SJ"  
Enter fullscreen mode Exit fullscreen mode

Sorted Sets

Sorted sets are a data type that resembles a cross between a Set and a Hash. Sorted sets, like sets, are composed of unique, non-repeating string elements, so they are also sets in some ways. While elements within sets are not ordered, each element in a sorted set is assigned a floating point value known as the score (this is why the type is also similar to a hash, since every element is mapped to a value). Furthermore, elements in a sorted set are ordered (so they are not ordered on request; order is a peculiarity of the data structure used to represent sorted sets). They are arranged in a specific order

Given two elements A and B:

  • If A and B have different scores, then A is greater than B, if the score of A is greater than the score of B.
  • If A and B have the same score, then A is greater than B, if the string A is lexicographically greater than B string. A and B strings can't be equal since sorted sets don't allow duplicate elements.

Below there are some notable people and their birth years as scores:

> zadd people 1940 "Alan Kay"  
(integer) 1  
> zadd people 1957 "Sophie Wilson"  
(integer) 1  
> zadd people 1953 "Richard Stallman"  
(integer) 1  
> zadd people 1949 "Anita Borg"  
(integer) 1  
> zadd people 1965 "Yukihiro Matsumoto"  
(integer) 1  
> zadd people 1914 "Hedy Lamarr"  
(integer) 1  
> zadd people 1916 "Claude Shannon"  
(integer) 1  
> zadd people 1969 "Linus Torvalds"  
(integer) 1  
> zadd people 1912 "Alan Turing"  
(integer) 1  
Enter fullscreen mode Exit fullscreen mode

As you can see, ZADD is similar to SADD, but it adds one more argument (the score) before the element to be added. ZADD is also variadic, so you can specify multiple score-value pairs, though this isn't done in the example above. Returning a list of people sorted by birth year is trivial with sorted sets because they are already sorted.

Note on implementation: Sorted sets are implemented using a dual-ported data structure that contains both a skip list and a hash table, so Redis performs an O(log(N)) operation every time we add an element. That's fine, but when we ask for sorted elements, Redis doesn't have to do anything because everything is already sorted:

> zrange people 0 -1  
1) "Alan Turing"  
2) "Hedy Lamarr"  
3) "Claude Shannon"  
4) "Alan Kay"  
5) "Anita Borg"  
6) "Richard Stallman"  
7) "Sophie Wilson"  
8) "Yukihiro Matsumoto"  
9) "Linus Torvalds"  
Enter fullscreen mode Exit fullscreen mode

It is also possible to return scores by using the WITHSCORES argument:

> zrange people 0 -1 withscores  
1) "Alan Turing"  
2) "1912"  
3) "Hedy Lamarr"  
4) "1914"  
5) "Claude Shannon"  
6) "1916"  
7) "Alan Kay"  
8) "1940"  
9) "Anita Borg"  
10) "1949"  
11) "Richard Stallman"  
12) "1953"  
13) "Sophie Wilson"  
14) "1957"  
15) "Yukihiro Matsumoto"  
16) "1965"  
17) "Linus Torvalds"  
18) "1969"  
Enter fullscreen mode Exit fullscreen mode

Operating on ranges

Sorted sets are more powerful than this. They can work on ranges. Let's start with everyone born between 1900 and 1950. To accomplish this, we use the ZRANGEBYSCORE command:

> zrangebyscore people -inf 1950  
1) "Alan Turing"  
2) "Hedy Lamarr"  
3) "Claude Shannon"  
4) "Alan Kay"  
5) "Anita Borg"  
Enter fullscreen mode Exit fullscreen mode

It is also possible to delete element ranges. Let's remove from the sorted set all the hackers born between 1940 and 1960:

> zremrangebyscore people 1940 1960  
(integer) 4  
Enter fullscreen mode Exit fullscreen mode

Lexicographical scores

With Redis 2.8, a new feature was introduced that allows getting ranges lexicographically, assuming elements in a sorted set are all inserted with the same identical score (elements are compared with the C memcmp function, so there is no collation and every Redis instance will respond with the same output). The main lexicographical range commands are ZRANGEBYLEX, ZREVRANGEBYLEX, ZREMRANGEBYLEX, and ZLEXCOUNT.

For example, let's add again our list of famous hackers, but this time use a score of zero for all the elements:

> zadd hackers 0 "Alan Kay" 0 "Sophie Wilson" 0 "Richard Stallman" 0
  "Anita Borg" 0 "Yukihiro Matsumoto" 0 "Hedy Lamarr" 0 "Claude Shannon"
  0 "Linus Torvalds" 0 "Alan Turing"
Enter fullscreen mode Exit fullscreen mode

Because of the sorted sets ordering rules, they are already sorted lexicographically:

> zrange hackers 0 -1
1) "Alan Kay"
2) "Alan Turing"
3) "Anita Borg"
4) "Claude Shannon"
5) "Hedy Lamarr"
6) "Linus Torvalds"
7) "Richard Stallman"
8) "Sophie Wilson"
9) "Yukihiro Matsumoto"
Enter fullscreen mode Exit fullscreen mode

Using ZRANGEBYLEX we can ask for lexicographical ranges:

> zrangebylex hackers [B [P
1) "Claude Shannon"
2) "Hedy Lamarr"
3) "Linus Torvalds"
Enter fullscreen mode Exit fullscreen mode

This feature is useful because it enables us to use sorted sets as a generic index. If you want to index elements by a 128-bit unsigned integer argument, for example, all you have to do is add elements to a sorted set with the same score (say 0), but with a 16-byte prefix consisting of the 128-bit number in big-endian. Because numbers in big-endian are numerically ordered when ordered lexicographically (in raw bytes order), you can ask for ranges in the 128-bit space and get the element's value without the prefix.

Bitmaps

Bitmaps are not a data type in and of themselves but rather a set of bit-oriented operations defined on the String type. Because strings are binary safe blobs with a maximum length of 512 MB, they can be used to set up to 232 different bits. Bit operations are classified into two types: constant-time single bit operations, such as setting a bit to 1 or 0 or retrieving its value, and operations on groups of bits, such as counting the number of set bits in a given range of bits (e.g., population counting).

One of the most significant advantages of bitmaps is that they frequently provide significant space savings when storing information. In a system where different users are represented by incremental user IDs, for example, 512 MB of memory can be used to remember a single bit of information (for example, whether a user wants to receive a newsletter) for 4 billion users.

Bits are set and retrieved using the SETBIT and GETBIT commands:

> setbit key 10 1
(integer) 0
> getbit key 10
(integer) 1
> getbit key 11
(integer) 0
Enter fullscreen mode Exit fullscreen mode

The SETBIT command takes two arguments: the bit number and the value to set the bit to, which can be either 1 or 0. If the addressed bit is outside the current string length, the command automatically enlarges the string.

GETBIT simply returns the bit value at the specified index. Out-of-range bits (addressing a bit that is longer than the length of the string stored in the target key) are always treated as zero.

There are three commands operating on a group of bits:

  • BITOP lets you do bit-wise operations between two strings. AND, OR, XOR, and NOT are the operations that can be used.
  • BITCOUNT counts the number of bits set to 1 and reports the result.
  • BITPOS looks for the first bit with the specified value of 0 or 1. BITPOS and BITCOUNT can both operate on byte ranges of a string rather than the entire length of the string.
> setbit key 0 1
(integer) 0
> setbit key 100 1
(integer) 0
> bitcount key
(integer) 2
Enter fullscreen mode Exit fullscreen mode

Assume you want to know which of your website's users has the longest streak of daily visits. You begin counting days from zero, the day you made your website public, and set a bit with SETBIT every time a user visits the site. Simply take the current unix time, subtract the initial offset, and divide by the number of seconds in a day (normally, 3600 * 24) to get a bit index. This way, you have a small string containing the visit information for each day for each user. The number of days a given user visited the website can be easily obtained using BITCOUNT, while the longest streak can be easily computed using a few BITPOS calls or simply by fetching and analyzing the bitmap client-side.

Bitmaps are easily split into multiple keys, for example, for the purpose of sharding the data set and because it is preferable to avoid working with large keys in general. Instead of storing all of the bits in one key, a simple strategy is to store M bits per key and obtain the key name with bit-number/M and the Nth bit to address inside the key withbit-number MOD M.

HyperLogLogs

A HyperLogLog is a probabilistic data structure that is used to count the number of unique things (technically this is referred to estimating the cardinality of a set). Counting unique items typically necessitates the use of memory proportional to the number of items to be counted, because you must remember the elements you have already seen in the past in order to avoid counting them multiple times. However, there are a set of algorithms that trade memory for precision: you end up with an estimated measure with a standard error that is less than 1% in the case of the Redis implementation. The magic of this algorithm is that you can use a constant amount of memory instead of a proportional amount of memory proportional to the number of items counted! In the worst case, 12k bytes, or much less if your HyperLogLog (we'll just call them HLL from now on) has seen very few elements.

While HLLs are technically a different data structure in Redis, they are encoded as a Redis string, so you can use GET to serialize one and SET to deserialize it back to the server. The HLL API is conceptually equivalent to using Sets to accomplish the same task. You would SADD every observed element into a set and then use SCARD to count the number of unique elements within the set, as SADD will not re-add an existing element.

While you don't really add items into an HLL because the data structure only contains a state that does not include actual elements, the API is the same:

  • When you see a new element, use PFADD to add it to the count.
  • You can use PFCOUNT to get a rough idea of how many unique elements have been added so far with PFADD.
  > pfadd hll a b c d
  (integer) 1
  > pfcount hll
  (integer) 4
Enter fullscreen mode Exit fullscreen mode

One use case for this data structure is counting the number of unique queries entered into a search form by users each day.

Conclusion

Redis provides a diverse set of data types that enable efficient storage and retrieval of various types of data. Redis is a flexible and versatile data storage solution that supports everything from strings, hashes, and lists to sets, sorted sets, and geospatial indexes. Understanding the various data types and their use cases is critical for effectively using Redis in your applications.

In the next post in this series, we'll delve into the world of Redis architectures and look at how Redis can be deployed and configured to meet the needs of your applications. We will cover everything you need to know about building a robust and scalable Redis infrastructure, from master-slave configurations to sentinel and cluster architectures. So stay tuned for the next post and come along with us as we delve deeper into the world of Redis.

💖 💪 🙅 🚩
kalkwst
Kostas Kalafatis

Posted on January 31, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related