This post is a quick overview of how I turned an initially poor solution to a problem into a better one, and the thought process behind it.

The problem

I'm working on pulling information out of the Certificate Transparency Logs. The logs are made up of a JSON list of log entries. A single entry is a JSON object with two fields, leaf_input and extra_data. The leaf_input field contains base64 encoded binary data. I want to pull information out of this binary data.

To outline the structure of the binary data I'll use the following notation, indicating byte numbers:

[n] - indicates byte n, so [0] is the first byte.
[n..=m] - indicates a range of bytes, inclusive, so [2..=4] means bytes 2, 3 and 4.
[n..] - indicates the remaining bytes in the binary data after and including index n.

[0] [1] [2..=9] [10..=11] [12..=14] [15..]
 |   |     |        |         |      |
 |   |     |        |         |      |- (rest) the rest of the bytes
 |   |     |        |         |      
 |   |     |        |         |- (length) the length of the certificate      
 |   |     |        |               
 |   |     |        | - the log entry type, not relevant in this post              
 |   |     |                       
 |   |     | - timestamp, not relevant in this post                       
 |   |                            
 |   | - signature type, not relevant in this post                       
 |                               
 | - version, not relevant in this post

The relevant parts here are the length (bytes [12..=14]) and the rest (indicated by [15..] above). rest contains an X.509 certificate we want, and the three bytes in the length define an integer which tells us how many bytes that certificate makes up in rest. Once we read length bytes from rest, the remainder of rest is data that we do not need.

A solution

So we need to turn the three bytes which make up length into an integer. An integer whose size is 3 bytes would be a 24-bit int. Rust does not support 24-bit ints. The closest is a u32, a 32-bit unsigned int. You can't just turn 3 bytes into a u32 however. The first solution I came up with was something like the following code.

// "bytes" is the full binary data inside the "leaf_input" field
fn get_cert_length(bytes: &[u8]) -> u32 {
    let length_bytes_slice = &bytes[12..=14];
    let mut length_bytes = b.iter().cloned().collect::<Vec<u8>>();
    length_bytes.reverse();
    length_bytes.push(0);
    return u32::from_le_bytes(length_bytes.as_slice().try_into().unwrap());
}

This is not great! We allocate a new Vec and we have to pad it with a zero to make it 4 bytes in length. Because we cannot push a value onto the start of a Vec we first reverse it. Then we convert it into an array of type [u8; 4] - this is what length_bytes.as_slice().try_into().unwrap() does. Finally, we read the bytes in little endian format as a u32. Way too much happening here for what on the surface seems like a simple conversion!

I thought about it some more and came up with a better solution.

A better solution

First, here is an outline of my thought process. Let's say bytes 12 to 14 look as follows, in hex notation:

[00, 05, 4c]

Then the value for the length is 00054c in hex, which is 1356 in decimal. 00054c is 000000 + 000500 + 00004c. If we take the bytes 00, 05 and 4c individually, how do we get 00054c?

Well, each time we move a number in the hex string one space to the left, we multiply the number by a power of 16. For example, moving 000005, which is 5 in decimal, to 000050, which is 80 in decimal, we had to multiply 5 by 16. If we move it two places, we would need to multiply it by 256, which is 16 squared, so 000005 to 000500 (1280 in decimal) is 5 * 256. Concretely, to convert 000500 hex to decimal:

\begin{aligned} 000500 &= (0 * 16 ^5) + (0 * 16 ^4) + (0 * 16 ^3) + (5 * 16 ^2) + (0 * 16 ^1) + (0 * 16 ^0) \\ &= 5 * 16^2 \\ &= 1280 \end{aligned}

With this knowledge we can update get_cert_length as follows:

fn get_cert_length(bytes: &[u8]) -> u32 {
    (bytes[12] as u32 * 65536) + (bytes[13] as u32 * 256) + bytes[14] as u32
}

Much cleaner than the original version! The values we multiply each byte with are always a power of 2 so we can also use bit-shifts instead of the multiplications:

fn get_cert_length(bytes: &[u8]) -> u32 {
    ((bytes[12] as u32) << 16) + ((bytes[13] as u32) << 8) + bytes[14] as u32
}

I ended up using the bit-shift version in the end.

Blog

Three bytes to an integer

wayofthepie

The problem

A solution

A better solution

Join Our Newsletter. No Spam, Only the good stuff.

Related