Diary of youtube-dl internals, part 2

zenulabidin

Ali Sherief

Posted on November 9, 2020

Welcome to the second part of my youtube-dl diary. In this part, I will explore a different file in the codebase, /aes.py, which contains basic AES encryption and decryption functions.

I might do a full write-up on AES encryption and decryption in a future post. My apologies if this post feels like a brain dump; it's not in line with my usual writing style.

AES is a block cipher that NIST standardized in 2001. It encrypts and decrypts arbitrary data in fixed-size blocks. It doesn't transform each block once, but over 10, 12, or 14 rounds, depending on the length of the key (128, 192, or 256 bits). Decryption runs the same number of rounds in reverse. If you've ever heard of AES256, that is the AES variant that encrypts data in 14 rounds. It uses a 256-bit key, the largest key size the standard defines, and it is the variant I will focus on here, although the functions in youtube-dl also accept 16- and 24-byte keys.
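To make the relationship between key length and round count concrete, here is a small sketch. The function name aes_rounds is my own and does not appear in youtube-dl; the formula (key length in 32-bit words, plus 6) is the one the AES standard uses.

```python
def aes_rounds(key_size_bytes):
    """Number of AES rounds for a given key size (hypothetical helper)."""
    if key_size_bytes not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24, or 32 bytes long")
    # Nr = Nk + 6, where Nk is the key length counted in 4-byte words.
    return key_size_bytes // 4 + 6


for size in (16, 24, 32):
    print("AES-%d: %d rounds" % (size * 8, aes_rounds(size)))
```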

Without wasting any time, let's cover the functions.

aes.py

This file is short and straightforward, so line-by-line explanations are not needed here. BLOCK_SIZE_BYTES = 16 is the length of each chunk of data that's encrypted at a time. It means that the input is split into 16-byte (128-bit) arrays, which are encrypted one at a time.
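The splitting step can be sketched in a few lines. This is only an illustration: split_blocks is a hypothetical helper of mine, not a function from aes.py, and I am assuming the data is a list of byte values.

```python
BLOCK_SIZE_BYTES = 16


def split_blocks(data):
    """Cut a list of byte values into consecutive 16-byte chunks.

    The last chunk may be shorter if len(data) is not a multiple of 16.
    """
    return [data[i:i + BLOCK_SIZE_BYTES]
            for i in range(0, len(data), BLOCK_SIZE_BYTES)]


blocks = split_blocks(list(range(40)))
print([len(block) for block in blocks])
```

Whether the short final chunk gets padded or is used as-is depends on the mode of operation.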

The following functions are defined here:

  • aes_ctr_decrypt(data, key, counter) [EXPORTED]
  • aes_cbc_decrypt(data, key, iv) [EXPORTED]
  • aes_cbc_encrypt(data, key, iv) [1]
  • key_expansion(data) [EXPORTED]
  • aes_encrypt(data, expanded_key) [EXPORTED]
  • aes_decrypt(data, expanded_key) [2]
  • aes_decrypt_text(data, password, key_size_bytes) [EXPORTED]

[1] For some reason this function is not exported, but aes_cbc_decrypt is exported. Nothing stops you from importing this function directly though. It encrypts data using CBC, using just a cipher key and IV (initialization vector). Don't worry if you don't understand what those mean.

[2] This just decrypts one block of data, so it is not useful by itself.

The rest of the functions define the operations that are done during each round of encryption and during key expansion: mixing columns with a fancy matrix multiplication, substituting bytes through a lookup table, shifting rows, rotating a word left with wraparound, XOR, and the inverses of these operations. Since this post is not a commentary on the AES standard, and none of those functions are exported, I will skip them in this post.

All of the large arrays after line 205 are the lookup tables and coefficients for each of the encryption operations. They won't be covered here because they are so big that they are virtually machine-readable only. Do not attempt to memorize these arrays; there's no benefit in doing so.

This file and these functions will likely not be changed because they are website-agnostic: they have nothing to do with downloading videos, they are just implementations of AES encryption and decryption. No pull request work is expected on these functions. Thus, don't worry if you don't understand the function bodies. As long as you know what the functions do, you can safely skip to next week's post.


aes_ctr_decrypt(data, key, counter)

Decrypts data using a 16-, 24-, or 32-byte key and a counter object. The counter is essentially a large integer (typically starting from a random value) that is handed out as a 16-byte block for each chunk of data. After each use it is incremented by 1, wrapping to zero if it hits the maximum value.

Returns the decrypted data.
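Here is a minimal sketch of how counter mode fits together. Everything in it is an assumption for illustration: the Counter class, ctr_crypt, and especially toy_block_encrypt, which is a SHA-256-based stand-in and NOT AES. The point is only the structure: encrypt the counter block, then XOR the result with the data.

```python
import hashlib

BLOCK_SIZE_BYTES = 16


class Counter(object):
    """Hypothetical counter: an integer handed out as a 16-byte block."""

    def __init__(self, start=0):
        self.value = start

    def next_value(self):
        block = self.value.to_bytes(BLOCK_SIZE_BYTES, "big")
        # Add 1 and wrap to zero once the 128-bit maximum is passed.
        self.value = (self.value + 1) % (1 << (8 * BLOCK_SIZE_BYTES))
        return block


def toy_block_encrypt(block, key):
    # Stand-in for the real AES block function. NOT a real cipher.
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE_BYTES]


def ctr_crypt(data, key, counter):
    """XOR each chunk of data with the encrypted counter block."""
    out = bytearray()
    for i in range(0, len(data), BLOCK_SIZE_BYTES):
        keystream = toy_block_encrypt(counter.next_value(), key)
        chunk = data[i:i + BLOCK_SIZE_BYTES]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


key = b"0123456789abcdef"
secret = ctr_crypt(b"hello counter mode, this is a test", key, Counter(7))
print(ctr_crypt(secret, key, Counter(7)))
```

Because the data is only ever XORed with a keystream, the same function both encrypts and decrypts, as long as the counter starts from the same value. That would explain why the file only needs a CTR "decrypt" function.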

aes_cbc_decrypt(data, key, iv)

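Same as aes_ctr_decrypt, except that instead of a counter object we have a 16-byte initialization vector (IV). Each ciphertext block is decrypted with the key and then XORed with the previous ciphertext block to recover the plaintext. The IV stands in for the "previous block" when decrypting the first block, and each ciphertext block then plays that role for the block after it.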

aes_cbc_encrypt(data, key, iv)

The opposite of aes_cbc_decrypt: it takes plain-text data, a key, and an IV, and returns the encrypted data.
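The chaining in CBC is easier to see in code. The sketch below is an assumption-heavy illustration: the function names are mine, the "block cipher" is a plain XOR with the key (completely insecure, used only because it is easy to invert), and I skip padding by requiring the input to be a multiple of 16 bytes.

```python
BLOCK_SIZE_BYTES = 16


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block, key):
    # Insecure stand-in for AES block encryption.
    return xor_bytes(block, key)


def toy_decrypt_block(block, key):
    # Inverse of toy_encrypt_block (XOR undoes itself).
    return xor_bytes(block, key)


def cbc_encrypt(plaintext, key, iv):
    if len(plaintext) % BLOCK_SIZE_BYTES:
        raise ValueError("this sketch does no padding")
    previous, out = iv, b""
    for i in range(0, len(plaintext), BLOCK_SIZE_BYTES):
        # Mix in the previous ciphertext block before encrypting.
        mixed = xor_bytes(plaintext[i:i + BLOCK_SIZE_BYTES], previous)
        previous = toy_encrypt_block(mixed, key)
        out += previous
    return out


def cbc_decrypt(ciphertext, key, iv):
    previous, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK_SIZE_BYTES):
        block = ciphertext[i:i + BLOCK_SIZE_BYTES]
        # Decrypt, then undo the mixing with the previous ciphertext block.
        out += xor_bytes(toy_decrypt_block(block, key), previous)
        previous = block
    return out


key, iv = bytes(range(16)), bytes(16)
ciphertext = cbc_encrypt(b"A" * 32, key, iv)
print(cbc_decrypt(ciphertext, key, iv))
```

Note that the two identical plaintext blocks produce different ciphertext blocks, which is the whole reason for chaining.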

key_expansion(data)

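This takes the key argument from the functions above (here named data) and expands it into a larger key schedule with one 16-byte round key per round, plus one extra. That comes to 240 bytes if the key was originally 32 bytes, 208 bytes for a 24-byte key, and 176 bytes for a 16-byte key.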

The expanded key is used to perform the AES encryption operations on each block.
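The size of the expanded key follows from the round count, which a short calculation can confirm. expanded_key_size is a hypothetical helper of mine, not part of aes.py; it does not perform the expansion itself, only the arithmetic.

```python
BLOCK_SIZE_BYTES = 16


def expanded_key_size(key_size_bytes):
    """Size in bytes of the AES key schedule for a given key size."""
    if key_size_bytes not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24, or 32 bytes long")
    rounds = key_size_bytes // 4 + 6
    # One 16-byte round key per round, plus one for the initial step.
    return (rounds + 1) * BLOCK_SIZE_BYTES


for size in (16, 24, 32):
    print("%d-byte key -> %d-byte expanded key" % (size, expanded_key_size(size)))
```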

aes_encrypt(data, expanded_key)

This is the function that encrypts a single block. data is a block of 16 bytes that goes through the SubBytes (substitute bytes), ShiftRows, MixColumns, and AddRoundKey (XOR with a slice of the expanded key) operations for a specific number of rounds, essentially encrypting that block.

aes_decrypt(data, expanded_key)

Does the opposite of aes_encrypt, that is, it decrypts a single block by applying the inverses of the operations I listed above in reverse order.
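To give a feel for what "an operation and its inverse" looks like, here is a sketch of the row-shifting step on a 16-byte block. This is my own illustration, not the code from aes.py: the function names are made up, and I am assuming the usual AES layout where the block is read as a 4x4 grid, column by column.

```python
def shift_rows_sketch(block):
    """Rotate row r of the 4x4 state left by r positions.

    The byte at index 4 * col + row sits in the given row and column.
    """
    return [block[4 * ((col + row) % 4) + row]
            for col in range(4) for row in range(4)]


def shift_rows_inv_sketch(block):
    """Undo shift_rows_sketch by rotating row r right by r positions."""
    return [block[4 * ((col - row) % 4) + row]
            for col in range(4) for row in range(4)]


state = list(range(16))
shifted = shift_rows_sketch(state)
print(shifted)
print(shift_rows_inv_sketch(shifted) == state)
```

Decryption works because every step has an inverse like this one, and the inverses are applied in the opposite order.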

aes_decrypt_text(data, password, key_size_bytes)

This is how you'd decrypt data using a password. First the cipher key is created by encrypting the password, so that the encrypted password is the cipher key; key_size_bytes sets the length of that key. Then a counter is created internally and the data is decrypted using the cipher key and aes_ctr_decrypt.

Addendum

Phew! Most of the above probably didn't make much sense to you, and even I'm not sure I'm 100% correct in some places. But the most important lesson I want you to take home is that youtube-dl has AES functions for encryption and decryption. Why does it need to encrypt and decrypt data? After all, it's just a video downloader. Well, hopefully we will find out eventually.

I was surprised by how complex the explanation of the AES algorithm became. This probably explains why only a few developers know the algorithm: there is a lot of math applied here (important math, because it's used in security), and even I had some trouble following it in places. So if you see any errors in this post, don't be afraid to call them out. That's what I usually write anyway, but it's especially important in this post because of the nature of the topic, encryption.

It's important that more people understand how encryption works.
