Decoding Binary - 3 Different Ways

Hey everyone! I recently saw this tweet:

anildash

@anildash

01110010 01100101 01101101 01100101 01101101 01100010 01100101 01110010 00100000 01101100 01100101 01100001 01110010 01101110 01101001 01101110 01100111 00100000 01100010 01101001 01101110 01100001 01110010 01111001 00111111 twitter.com/kelseyhightowe…

19:50 PM - 09 Feb 2022

Kelsey Hightower @kelseyhightower
Remember when we all learned Perl. https://t.co/rFLs2jqIMu

Obviously as a programmer, this is what everyone thinks I do - talk about stuff in 1s and 0s. Sadly, I didn't know what this said, so it's time to fix that.

First, I tried figuring out how to convert binary to text in Rust. After a short stint of searching on the combined intelligence of the entire world, I discovered the following useful functions: u8::from_str_radix and String::from_utf8.

Now using these functions isn't super intuitive - in the docs for u8::from_str_radix, this is true:

assert_eq!(u8::from_str_radix("A", 16), Ok(10));

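A quick trip through man ascii in my terminal revealed "A" to be 41 in base 16, 65 in base 10, and 101 in base 8. But no 10! That's because from_str_radix doesn't look up character codes at all: it reads the string as a number written in the given base, and A is the hexadecimal digit for ten. So to get a character, you parse the binary string into a u8 and then pass that byte to String::from_utf8 to get out a human-readable character.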
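To see the difference between a digit's value and a character's code side by side, here is a small sketch using only the standard library. The helper name digit_vs_code is made up for this example.

```rust
/// Returns (value of the hex digit "A", ASCII code of the character 'A').
/// `digit_vs_code` is a hypothetical helper name used only for this sketch.
fn digit_vs_code() -> (u8, u8) {
    // Parsing: "A" is read as a base-16 numeral, so the result is ten.
    let digit_value = u8::from_str_radix("A", 16).unwrap();
    // Encoding: the character 'A' is stored as byte 65 (0x41) in ASCII.
    let ascii_code = b'A';
    (digit_value, ascii_code)
}

fn main() {
    let (digit_value, ascii_code) = digit_vs_code();
    println!("digit value: {}, ASCII code: {}", digit_value, ascii_code);
    // The numeral "41" in base 16 is the ASCII code of 'A' ...
    assert_eq!(u8::from_str_radix("41", 16), Ok(65));
    // ... and so is the numeral "01000001" in base 2.
    assert_eq!(u8::from_str_radix("01000001", 2), Ok(65));
}
```

So the 41 / 65 / 101 from man ascii are three spellings of the same character code, while the 10 in the docs is just what the single digit A is worth.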

So, after some more interrogation of Stack Overflow and the friendly and intelligent Rust community, I finally got this code:

    assert_eq!(
        String::from_utf8(vec![u8::from_str_radix(&"01000001", 2).unwrap()]),
        Ok(String::from("A"))
    )

So what's important here? We first make sure the radix (base) of our input is 2, and then because String::from_utf8 only accepts a vector of bytes (Vec<u8>), we use the vec! macro to wrap our single byte in a vector, and then finally feed it into String::from_utf8 to get out something readable.

And because from_str_radix returns a Result, and we're sure that our input is going to be valid, we can safely unwrap it to use the value inside as our byte.

Great! The hard part is done - all I need to do now is to loop through the tweet's content, feed the words into my script here, and then collect the resulting strings and join them together. I won't give a full explanation, but in short map applies a function to every element of an iterator, and collect gathers the results into a new collection (here, a Vec of Strings).
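If you're not sure the input is valid, you can hand the Result back to the caller instead of unwrapping it. A minimal sketch, assuming a hypothetical helper named parse_byte (not part of the code in this post):

```rust
use std::num::ParseIntError;

/// Parses one group of binary digits into a byte.
/// `parse_byte` is a hypothetical name, not part of the original code.
fn parse_byte(bits: &str) -> Result<u8, ParseIntError> {
    u8::from_str_radix(bits, 2)
}

fn main() {
    // A valid 8-bit group parses to its numeric value.
    println!("{:?}", parse_byte("01000001"));
    // '2' is not a binary digit, so this is an Err.
    println!("{:?}", parse_byte("01000002"));
    // Nine significant bits do not fit in a u8, so this is an Err too.
    println!("{:?}", parse_byte("100000000"));
}
```

Calling unwrap on either of the last two would panic, which is fine for a throwaway script but worth knowing about.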

fn main() {
    let a = "01110010 01100101 01101101 01100101 01101101 01100010 01100101 01110010 00100000 01101100 01100101 01100001 01110010 01101110 01101001 01101110 01100111 00100000 01100010 01101001 01101110 01100001 01110010 01111001 00111111";
    let output = a
        .split_whitespace()
        .map(|x| binary_to_ascii(&x))
        .collect::<Vec<_>>();
    println!("{:?}", output.concat());
}

pub fn binary_to_ascii(input: &str) -> String {
    String::from_utf8(vec![u8::from_str_radix(input, 2).unwrap()]).unwrap()
}

Output:

Standard Error
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.24s
     Running `target/debug/playground`
Standard Output
"remember learning binary?"

Pretty cool, huh? I never learned binary so...
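The same loop can also be written so that a bad group of digits comes back as an error instead of a panic, and so that the characters are collected straight into one String. This is only a sketch: decode_binary is a made-up name, and mapping each byte directly to a char is an assumption that holds for ASCII input like the tweet.

```rust
use std::num::ParseIntError;

/// Decodes space-separated binary groups into text.
/// `decode_binary` is a hypothetical name; each byte is mapped one to one
/// onto a char, which is an assumption that suits ASCII input.
fn decode_binary(input: &str) -> Result<String, ParseIntError> {
    input
        .split_whitespace()
        // Parse each group as a base-2 number, then turn the byte into a char.
        .map(|bits| u8::from_str_radix(bits, 2).map(char::from))
        // Collecting into Result<String, _> stops at the first parse error.
        .collect()
}

fn main() {
    match decode_binary("01101000 01101001") {
        Ok(text) => println!("{}", text),
        Err(err) => eprintln!("not valid binary: {}", err),
    }
}
```

Collecting into a Result means there's no intermediate Vec and no concat step at the end.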

In any case, now it's time to switch gears and try doing it in the terminal! Befitting a true hacker aesthetic, I decided I'll convert binary into text using only native shell commands - no Python or anything like that.

Since we don't have nice things like from_str_radix and so on, we'll have to convert our base 2 numbers into text in two steps:
Binary -> Hexadecimal
Hexadecimal -> Text

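So, how do we change bases in the terminal? We can use the command-line calculator bc (basic calculator) and its special variables obase (output base) and ibase (input base). Note that obase is set first - once ibase=2 takes effect, every number typed after it (including a 16 meant for obase) would be read as binary. It looks like this: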

me@my-UbuntuBook:~$ bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=16;ibase=2;01110010
72 # HERE!

Now that we have 72, which maps to a corresponding character's hex code, we can convert it into a character using a reverse hexdump! While tools like od and hexdump can convert characters into hexadecimal codes, only xxd provides a way to reverse it via the -r flag. For example, if we have a file only with 72 inside, and then reverse xxd it:

me@my-UbuntuBook:~$ cat has_seventy_two_inside
72
me@my-UbuntuBook:~$ xxd -r -p has_seventy_two_inside
r

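The -p flag means "plain": it tells xxd to work with a bare run of hex digits, without line numbers and all that. If I leave the flag off, the output is blank. As far as I can tell, that's because xxd -r without -p expects the full xxd dump layout (an offset, a colon, then the hex bytes), so a file containing only 72 doesn't contain anything it recognizes as data.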

Cool huh? But - we can't get arbitrary input into a running bc, and it's going to be a huge pain to have to type everything in, and then make files to xxd -r on. So let me introduce you to piping!

Piping using the pipe character | lets us move output from one command into another, or have a command take input from a previous one. For example, we could do this:

me@my-UbuntuBook:~$ echo "1+2" | bc
3

Cool! So we can chain all our aforementioned commands together like this:

echo "obase=16; ibase=2; $BYTES_HERE" | bc | xxd -r -p

Elegant, no? And because bash splits an unquoted variable into separate words at whitespace when it expands it, I can skip splitting the string and just go straight to looping:

a="01110010 01100101 01101101 01100101 01101101 01100010 01100101 01110010 00100000 01101100 01100101 01100001 01110010 01101110 01101001 01101110 01100111 00100000 01100010 01101001 01101110 01100001 01110010 01111001 00111111"

for i in $a; 
 do echo "obase=16; ibase=2; $i" | bc | xxd -r -p;
done

(sorry for the bad variable names)

Yay! That took quite a while to solve, but gives a nice satisfactory result.
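For comparison with the first solution, here's the same binary -> hexadecimal -> text detour sketched in Rust. It only mirrors the shape of the pipeline and says nothing about how bc or xxd work internally; both function names are made up for this example. One small difference: the sketch always pads to two hex digits, which bc does not do on its own.

```rust
/// Step 1, in the spirit of `bc` with obase=16 and ibase=2:
/// binary digits in, hex digits out.
/// Both function names are hypothetical and exist only for this sketch.
fn binary_to_hex(bits: &str) -> Option<String> {
    let value = u8::from_str_radix(bits, 2).ok()?;
    // Always emit two hex digits so every byte has the same width.
    Some(format!("{:02X}", value))
}

/// Step 2, in the spirit of `xxd -r -p`:
/// two hex digits in, the character they encode out.
fn hex_to_char(hex: &str) -> Option<char> {
    u8::from_str_radix(hex, 16).ok().map(char::from)
}

fn main() {
    let text: Option<String> = "01101000 01101001"
        .split_whitespace()
        // Chain the two steps for every group, like `bc | xxd -r -p`.
        .map(|bits| binary_to_hex(bits).and_then(|hex| hex_to_char(&hex)))
        .collect();
    println!("{:?}", text);
}
```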

And finally, everyone's favorite language - JavaScript. Not to brag or anything, but I golfed (one-lined) this problem in 2 minutes:

a="01110010 01100101 01101101 01100101 01101101 01100010 01100101 01110010 00100000 01101100 01100101 01100001 01110010 01101110 01101001 01101110 01100111 00100000 01100010 01101001 01101110 01100001 01110010 01111001 00111111"

a.split(" ").map(x => String.fromCharCode(parseInt(x, 2))).join("")

Easy peezy lemon squeezy.

So how does this work? The .split() method on a string divides the string into an array by chopping it up at each occurrence of the separator passed into split. In this case, I passed a single space so the string of bytes got split up into an array of bytes. Next, just like in the Rust solution, I mapped a function that takes a binary string, converts it into a character code, and then converts the character code into a human-readable letter.

More specifically, parseInt accepts two arguments, a string and then a radix (in that order), and parses the string as a number written in that base. String.fromCharCode goes the other way from a hexdump: it accepts numeric character codes (UTF-16 code units) and returns their corresponding characters. And finally, because we end up with an array of letters, to put all the letters back together into a sentence, we use .join on the array with an empty string as the separator so everything just gets mashed together. And with that, we get the same result.

Hopefully this helped you get a role as a master hacker who can read and decode binary in 2022's Most Awesome Upcoming Hacker Action Movie, or at least impress your non-programmer parents. If you learned something, click all the reactions on the side - and even if you didn't, do it anyway!

Thanks for reading, and see you next time!

CarlyRaeJepsenStan
