Kristijan Sedlak
Posted on October 20, 2020
This is the last of a series of articles, in which I examined in detail the Huffman algorithm and its connection to HPACK for HTTP/2. In previous ones, I explained what the Huffman algorithm is, and started to explain how the encoding process of messages works, and in the last article HPACK: Huffman translation matrix. I already started the topic of decoding and have described the first part of the decoding process.
The algorithm for decoding the canonical Huffman algorithm for HPACK is being executed based on a matrix, where the Huffman tree is shown in the form of the 2-dimensional table and is made for a specific number of bits being read at the time.
In this last article, we decided that our decoder will decode the Huffman sequence by reading 2 bits at a time. For this purpose, we created a matrix, which will enable us to reverse the coded message back to the original content.
Path | ID | SYM | LFT | 00 | 01 | 10 | 11 |
---|---|---|---|---|---|---|---|
// | 0 | - | - | 1 | 2 | 3 | 6 |
//00 | 1 | A | 0 | - | - | - | - |
//01 | 2 | B | 0 | - | - | - | - |
//10 | 2 | B | 0 | - | - | - | - |
//10 | 3 | - | - | 4 | 4 | 5 | 5 |
//100X | 4 | C | 1 | - | - | - | - |
//101X | 5 | D | 1 | - | - | - | - |
//11 | 6 | E | 0 | - | - | - | - |
As an example we are using a sequence of characters in the order: A, D, and B.
ADE = 0010101
The Huffman sequence will be decoded by reading 2 bits at a time. Every reading begins at the root symbol //
. First, we read the first two bits 00
. In line one of the matrix at ID=0
, we need to check where this code leads, or if it corresponds to any of the characters. Read bits lead to the second line with ID=1
and they represent the letter A.
The process is repeated for the next 2 bits 10
. This code leads us to the line with ID=3
which doesn’t represent a character, so we continue the process for the next 2 bits 10
. This code then leads us to line 5, representing the letter D. Here we can see that the value of the column LFT=1
, meaning that there is a leftover 1. This means that in order to continue reading bits we have to shift to one bit back and continue the process there.
We position ourselves back to the root position while keeping the last bit 0
, and keep reading until we reach the sum of 2 bits. This means that we need to read only 1 bit 1
. Code 01
corresponds with character B and with this we conclude the decoding process.
00XXXXX => A
XX10XXX => continue
XXXX10X => D
XXXXX01 => B
With the use of the translation matrix, which we created to read 2 bits at a time, we successfully decoded the Huffman sequence back into readable characters. This is how HPACK in HTTP/2 decodes header literal values. The process is optimal, while it is best for web servers to read more bits at a time. Considering that the shortest Huffman code for an individual character is 5 bits long, it’s optimal, for the best ratio between speed and used resources, to read 4 bits at a time. More bits at a time mean faster decoding but at the same time a larger translation table and with it a higher memory footprint.
I made a complete decoder implemented in Rust. It is available open-source at the public Github repository.
Posted on October 20, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.