Over to Rust - Gotchas for Beginners

taikedz

Tai Kedzierski

Posted on September 10, 2023


Cover image (C) Tai Kedzierski

I had an idea for a simple utility the other day - a way to manage a *nix $PATH. At its core, it has one file-reading operation, which also does some line filtering, and one file-append operation. That's all.

I wrote it up quickly in python and threw it into a GitHub gist.

I then realised, hey, this program is pretty simple. I wonder how easy it would be to write it in rust? And so the journey began...

Rust-lust

What is Rust? Articles abound on the Net and on this site. Suffice it to say it is a relatively new language that has attracted the envy of many: the Stack Overflow blog posted some thoughts in 2020, and in the 2023 developer survey, over 80% of respondents who had used it in the past year wanted to keep on using it.

I've been wanting to properly sit down and learn rust for a while now - at least a couple of years. I mainly work with python, which is a superb language for getting things done fast - but that low barrier to entry encourages one undesirable thing: messy, buggy code. Its surge in popularity has often been attributed to its being easy to learn: dynamic typing and a very forgiving runtime let you see results in minutes, sometimes even seconds, but at the cost of letting you get away with code-murder.

I really want to learn rust not only for its speed and memory safety (I am perennially eyeing up the goal of writing a multiplayer game server), but also to benefit from thinking about code fundamentally differently, and open up new practices to apply further afield.

The learning curve is proving to be not so gentle, of course, but its advertised advantages feel worth it. But as with all things in learning, you need some practical exercise to properly internalise the lesson.

Some resource I had been following suggested first reading through The Book without typing any code, and then revisiting it after ingesting the basics, with a more holistic idea. I had already done that first part (though I couldn't help myself, and was experimenting as I went along), and was now itching for an actual project. The Book provides some useful mini-projects to try out by way of a practical tutorial, but what I really wanted was something I would find useful myself. pathctl was just what I needed: there were a few very simple requirements, and I had proven its simplicity by writing it in a language I know.

After an initial fortnight of dipping into The Book, the project itself has taken me one week (the python version on Monday, basic completion of the rust version on Sunday).

Here are some notes from my mini-journey so far: things that The Book didn't cover quite so obviously, or where I was just thrown as a python/dynamic/other-language programmer.

Ownership and the Borrow Checker

I was exposed to some C programming many years ago at university, and I remember distinctly disliking having to juggle heap pointers and manage memory de/allocations. They're not a difficult concept to grasp at all, but managing the wee blighters is a chore - testament to which are the numerous dereference bugs, and the very existence of rust.

Rust's Ownership concept and the compile-time Borrow Checker feel jarring to work with at first, especially coming from a dynamic language, but its novelty fades away very quickly into a level of semi-comfort.

Many operations on Strings, such as trim(), return &str slices: references into the original data, which stays owned by whatever held it. So if you have a vector (list for the pythonistas out there) of items and you want to return a vector of modified strings, you need to explicitly convert these into new owned objects. This might seem memory-inefficient at first, but that's in fact what pretty much any language will do under the hood. It's just explicit in rust:

fn load_valid_lines(path_str: &str) -> Vec<String> {
    // The actual file opening/reading etc. is done elsewhere
    read_lines(path_str)
        .iter() // Explicitly iterate over references to each line

        // trim() returns a &str referencing the iterated line, which is owned
        // by the temporary vector inside this function and so cannot be returned;
        // we produce a new String using a one-line closure
        .map(|line| String::from(line.trim()))

        // `is_valid_line()` checks the item without taking ownership;
        // `line` is already a reference (&String) here, so it is passed straight through
        .filter(|line| is_valid_line(line))

        // and finally collect the filtered iterator into a vector.
        // Note the turbofish "::<...>" naming the target type, with Vec<String> nested inside it
        .collect::<Vec<String>>()
}
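To make the owned-versus-borrowed trade-off concrete, here is a small self-contained sketch. It takes the lines as an in-memory slice instead of reading a file, and is_valid_line is a hypothetical stand-in (keep non-empty, non-comment lines), since the post's real helpers live elsewhere. The second function shows the alternative: if the caller keeps the original data alive, you can return borrowed &str slices with no copying, at the cost of a lifetime annotation.

```rust
// Hypothetical validity rule standing in for the post's is_valid_line():
// keep lines that are non-empty and not comments.
fn is_valid_line(line: &str) -> bool {
    !line.is_empty() && !line.starts_with('#')
}

// Owned version: each trimmed &str is copied into a new String,
// so the result does not depend on `raw` staying alive.
fn valid_lines_owned(raw: &[String]) -> Vec<String> {
    raw.iter()
        .map(|line| line.trim().to_string())
        .filter(|line| is_valid_line(line)) // &String coerces to &str
        .collect()
}

// Borrowed version: returns slices into `raw` itself. No copies are made,
// but the lifetime 'a ties the result to the caller's data.
fn valid_lines_borrowed<'a>(raw: &'a [String]) -> Vec<&'a str> {
    raw.iter()
        .map(|line| line.trim())
        .filter(|line| is_valid_line(line)) // &&str coerces to &str
        .collect()
}

fn main() {
    let raw = vec![
        String::from("  /usr/local/bin  "),
        String::from("# a comment"),
        String::from("   "),
        String::from("/opt/tools/bin"),
    ];
    println!("{:?}", valid_lines_owned(&raw));
    println!("{:?}", valid_lines_borrowed(&raw));
}
```

Which one fits depends on who should own the result: a function that reads the file itself can only use the owned version, because the file contents are dropped when it returns.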

Enums

In the two languages where I had previously used enums, python and Java, they were both fixed collections of values.

Rust does it a bit differently: in addition to being a list of fixed values, an enum's variants can also carry data (here as tuples), so an enum can encode a simple type family.

enum IPAddress {
    IPv4(u8, u8, u8, u8),
    IPv6(String)
}

fn main() {
    let local = IPAddress::IPv4(127,0,0,1);
}

This also means that Enums can have shared implementation functions (methods in an impl block, usable on every variant), can implement traits, and perhaps some more.
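A small sketch of what shared implementation functions look like, reusing the IPAddress enum from above: one impl block serves every variant, and match unpacks whichever data the variant carries. The method names are made up for illustration, and the IPv6 check is deliberately simplistic (it only recognises the compressed "::1" spelling). For real code, the standard library already ships this idea as std::net::IpAddr.

```rust
enum IPAddress {
    IPv4(u8, u8, u8, u8),
    IPv6(String),
}

impl IPAddress {
    // One method shared by all variants; `match` unpacks each variant's data.
    fn is_loopback(&self) -> bool {
        match self {
            // The whole 127.0.0.0/8 block is loopback in IPv4.
            IPAddress::IPv4(first, _, _, _) => *first == 127,
            // Simplification: only the compressed "::1" form is recognised.
            IPAddress::IPv6(text) => text.as_str() == "::1",
        }
    }

    fn describe(&self) -> String {
        match self {
            IPAddress::IPv4(a, b, c, d) => format!("IPv4 {}.{}.{}.{}", a, b, c, d),
            IPAddress::IPv6(text) => format!("IPv6 {}", text),
        }
    }
}

fn main() {
    let addresses = [
        IPAddress::IPv4(127, 0, 0, 1),
        IPAddress::IPv6(String::from("2001:db8::1")),
    ];
    for addr in &addresses {
        println!("{} loopback={}", addr.describe(), addr.is_loopback());
    }
}
```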

I gave up on using Enums for my argument action logic, because I got myself into a bit of a tizzy in trying to use them.

One use-case of Enums, however, ties in with the match mechanism native to the language, which allows thinking of returned values as particular distinct eventualities:

  • The Result enum expresses the idea that an operation could have succeeded (Ok) or failed (Err), and forces the caller to handle both
  • The Option enum expresses the idea that an operation could have produced a result, Some(T), or None (an eventuality if you will, but not a value)

There is no null, only nullity.

The special enum Option handles the cases of None or Some(T): there is no null value that can be passed around in place of any type. Instead, the Option enumeration expresses that eventuality, and you handle it through the match mechanism.

Instead of being able to say "I've got nothing", with ambiguity over whether Nothing™ might actually be a new brand of egregiously air-filled popped snacks, rust forces you to conceive that "I didn't get any valid value from the operation."

It's a very subtle difference, but it does introduce a deliberate idiomatic way of thinking about the concept.
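A minimal sketch of consuming an Option with match, assuming a hypothetical first_under helper that looks for the first PATH entry under a given prefix. The caller cannot use the result as a &str until it has dealt with the None case.

```rust
// Hypothetical helper: the first entry starting with `prefix`, if any.
fn first_under<'a>(entries: &[&'a str], prefix: &str) -> Option<&'a str> {
    entries.iter().copied().find(|e| e.starts_with(prefix))
}

fn describe(result: Option<&str>) -> String {
    // Both eventualities must be handled before the value can be used.
    match result {
        Some(dir) => format!("found {}", dir),
        None => String::from("no matching entry"),
    }
}

fn main() {
    let entries = ["/usr/bin", "/usr/local/bin", "/opt/tool/bin"];
    println!("{}", describe(first_under(&entries, "/opt")));
    println!("{}", describe(first_under(&entries, "/snap")));
}
```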

Error handling

Error handling is also frequently done via match on operations that return enums. A common one is Result, which can be either Ok or Err.

You usually need to match on the Result to get at the value you want (here, a file handle):

use std::fs::File;

match File::open(file_name) {
    Err(e) => { eprintln!("Failed: {}", e); }

    Ok(file_handler) => {
        println!("I'm in!");
    }
}

Code snippets often use unwrap() for the sake of brevity, but that takes control away from you, the programmer: on failure it simply panics the process, exposing programming guts at runtime:

// Panics altogether if something goes wrong
let file_handler = std::fs::File::open(file_name).unwrap();
println!("I'm in!");

So whenever there's an unwrap() in the chain, it's better to split it out into a match and deal with it.

-    read_to_string(path_str) 
-        .unwrap()           // panic on possible file-reading errors
-        .lines()            // split the string into an iterator of string slices
-        .map(String::from)  // make each slice into a string
-        .collect()          // gather them together into a vector
+    match read_to_string(path_str) {
+        Err(e) => {
+            eprintln!("Error reading file '{}': {}", path_str, e);
+            std::process::exit(1);
+        }
+        Ok(data) => {
+            data.lines()            // split the string into an iterator of string slices
+                .map(String::from)  // make each slice into an owned string
+                .collect()          // gather them together into a vector
+        }
+    }

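The diff above still decides, inside the helper, that a read failure ends the program. When the helper shouldn't make that call, the error can be passed up instead with the ? operator, leaving the caller to match on it. A sketch, assuming the diff is the body of a read_lines(path_str: &str) function; the path in main is only a placeholder.

```rust
use std::fs::read_to_string;
use std::io;

// Same job as the diff's body, but a failure is returned to the caller
// instead of being reported and exited on inside the helper.
fn read_lines(path_str: &str) -> io::Result<Vec<String>> {
    let data = read_to_string(path_str)?; // on Err, return it immediately
    Ok(data.lines().map(String::from).collect())
}

fn main() {
    // Placeholder path; the caller decides what a failure means.
    match read_lines("paths.txt") {
        Ok(lines) => println!("read {} line(s)", lines.len()),
        Err(e) => eprintln!("Error reading file 'paths.txt': {}", e),
    }
}
```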

"Catching" Errors

On that note - there are two kinds of errors to consider:

  • the kind you predict can happen and have coded around
  • the kind that you really shouldn't have let happen (panic)

(there are of course the kind you couldn't conceive of ahead of time, but we'll let that one slide)

I've seen it posited that any error that you can identify in Result type or by simply returning a given value can and probably should be handled. If you don't handle them (naughty use of unwrap()?) and allow the program to panic as a result, it's on you. Handle them there and then, and if throwing them up a level, do so using idiomatic structures like the Result and Option Enums.

Conversely, if a program panics, don't seek to catch that. Relying on the program panicking as a way of doing flow control is essentially viewed as sloppy: it marks something you didn't care to cleanly pass values for, showing that perhaps you didn't quite think of that.

(Of course, if a library deep-down chooses to panic rather than return a Result::Err, it's up for debate. I'm not yet conversant enough in the ecosystem to know what should be thought of with such an eventuality)

The use of try/catch as a control-flow mechanism in many languages is seen as encouraging the programmer to treat an exceptional circumstance as a normal possibility. It might be semantics, or it might simply be idiomatic.

But rust very pointedly encourages the programmer to handle cases properly, and to see a panic as a symptom of improper design.

EDIT - I've been around a few more doc pages and modules, and found that my above understanding needs to be more nuanced... certainly, using the .unwrap_* alternatives (such as unwrap_or() and unwrap_or_else()) can offer some level of acceptable blissful ignorance.
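A few of those alternatives side by side, as a sketch: each swaps the panic for a fallback value. The function names and the 8080 default are hypothetical; unwrap_or, unwrap_or_else and unwrap_or_default are standard methods available on both Option and Result.

```rust
use std::fs;

// Fixed fallback when parsing fails: no panic, no match block.
fn parse_port(text: &str) -> u16 {
    text.trim().parse::<u16>().unwrap_or(8080)
}

// Fallback computed lazily (with a warning) only when the read fails.
fn read_or_empty(path: &str) -> String {
    fs::read_to_string(path).unwrap_or_else(|e| {
        eprintln!("warning: could not read '{}': {}", path, e);
        String::new()
    })
}

// Fallback to the type's Default (0 for integers) when there is no value.
fn first_number(items: &[&str]) -> i64 {
    items
        .first()
        .and_then(|s| s.parse::<i64>().ok())
        .unwrap_or_default()
}

fn main() {
    println!("{}", parse_port(" 9000 "));
    println!("{}", parse_port("not a port"));
    println!("{:?}", read_or_empty("/definitely/not/a/real/path"));
    println!("{}", first_number(&["42", "x"]));
}
```

The trade-off is that the failure is swallowed (or merely logged), so this suits cases where a default really is an acceptable answer.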

In The Book itself, there is a section dedicated to understanding when panicking should be considered suitable.

It can be seen that ultimately, if there is really no sense in keeping on, it might be sensible enough to just panic.

My own take, however, is that if you choose to panic, the only result is the program (or at least the current thread) stopping, and that can often lead to unintended consequences. Consider:

fn collate_files(in_files: &[&str], connection: ConnectionType) {
    // Illustrative stand-ins: `tempfiles`, `ConnectionType` and `process_and_write`
    // Create a temp file with each 'X' replaced by an arbitrary letter
    let output_file = tempfiles::new(".temp-XXXX");
    for input_file in in_files.iter() {
        process_and_write(input_file, &output_file); // panics?
    }
    connection.send_file(&output_file); // borrow, so output_file is still usable below
    output_file.remove();
}

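If you write process_and_write such that it panics if the input file cannot be found (for example), not only do you skip sending the result over the network, but you also skip the explicit cleanup call that follows. (With Rust's default unwinding behaviour, destructors still run during a panic, so only cleanup tied to Drop would survive.)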

Especially, if you write a process_and_write function as a library utility for others to consume, you prevent others from performing their own cleanups as needed. So, as far as possible, never panic when you are writing a library. By the same token, don't depend on library functions that panic when providing your own library.
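One way to keep the cleanup reachable is to have the fallible steps return a Result and always run the cleanup before passing the outcome up. The sketch below is a std-only reworking of collate_files under several assumptions: process_and_write here just appends each input file to the output, a temp file in std::env::temp_dir() replaces the hypothetical tempfiles crate, and a caller-supplied send closure replaces ConnectionType.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

// Assumed behaviour: append one input file to the output.
// Returns Err instead of panicking when the input cannot be read.
fn process_and_write(input: &str, output: &Path) -> io::Result<()> {
    let data = fs::read_to_string(input)?;
    let mut out = OpenOptions::new().append(true).open(output)?;
    out.write_all(data.as_bytes())?;
    Ok(())
}

// Collates inputs into a temp file, hands it to `send`,
// and removes the temp file whether or not anything failed.
fn collate_files(in_files: &[&str], send: impl Fn(&Path) -> io::Result<()>) -> io::Result<()> {
    let output: PathBuf =
        std::env::temp_dir().join(format!(".temp-collate-{}", std::process::id()));
    File::create(&output)?;

    // Fallible steps run in a closure, so `?` exits the closure, not this function.
    let result = (|| -> io::Result<()> {
        for input in in_files {
            process_and_write(input, &output)?;
        }
        send(output.as_path())
    })();

    // Cleanup always runs; the outcome is then passed up to the caller.
    let _ = fs::remove_file(&output);
    result
}

fn main() {
    let missing = ["/definitely/not/here.txt"];
    let outcome = collate_files(&missing, |p| {
        println!("would send {}", p.display());
        Ok(())
    });
    match outcome {
        Ok(()) => println!("sent"),
        Err(e) => eprintln!("collation failed, temp file cleaned up: {}", e),
    }
}
```

The caller (or the library's consumer) now gets an Err it can react to, and the temp file is gone either way.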

Keep calm and return Result<T, E>

Modules and submodules

Rust distinguishes (though I found the documentation a little light here) modules and submodules, and the programmer needs to declare them properly.

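Say I have a module util.rs, sidecar to my main.rs. Say I want my util module to have two submodules, slice.rs and dice.rs, which (in this layout) live in a util/ directory next to it: src/util/slice.rs and src/util/dice.rs.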

In my main file, I declare

mod util;

And that tells the compiler we expect a module util.rs to be present (there's a legacy variation using util/mod.rs, but this is a simpler way for me to brain this). Inside that file, I must declare the submodules:

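pub mod slice;
pub mod dice;

These declarations are not (only) for exposing those submodules to my main file; the pub keyword is what makes them reachable from main.rs as util::slice and util::dice. They actually tell the compiler that those submodules exist at all: a file that is never declared with mod is simply not compiled.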

Even if my main file only uses slice, any code that wants functions from dice still needs dice to be declared in the supermodule (util.rs).
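The same structure can be tried out in a single file, since an inline mod { ... } block behaves like the separate files described above. A sketch with hypothetical function names; it also shows a sibling submodule reaching slice through super::.

```rust
// Inline equivalent of the layout described above:
//   src/main.rs       -> `mod util;`
//   src/util.rs       -> `pub mod slice;` and `pub mod dice;`
//   src/util/slice.rs, src/util/dice.rs
mod util {
    // Without `pub`, main could not reach util::slice or util::dice.
    pub mod slice {
        pub fn halve(n: u32) -> u32 {
            n / 2
        }
    }

    pub mod dice {
        // A sibling submodule reaches slice via `super::` (the parent, util).
        pub fn quarter(n: u32) -> u32 {
            super::slice::halve(super::slice::halve(n))
        }
    }
}

fn main() {
    println!("{}", util::slice::halve(10));
    println!("{}", util::dice::quarter(20));
}
```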

Binary sizes

A rust program compiled normally with rustc is bundled with a load of debug symbols by default, and hello_world typically clocks in at 12 MB. This is huge, given that this is supposed to be a systems language, one of whose applications is embedded systems.

Running rustc -C strip=symbols mycode.rs removes these, paring the binary down to a much more reasonable 331 KB.

In the Cargo.toml file, this can be cemented as

[profile.dev]
strip = "symbols"

[profile.release]
strip = "symbols"

The systematic presence of debug symbols seems to surprise most people - I presume that in other languages, the default behaviour is to leave them out, and to include them only when the compiler is specifically asked to.

I suspect this novel default behaviour is what allows new rust programmers to have debug information available out of the box. Removing the debug symbols amounts to explicitly "removing the training wheels", whereas in the opposite case, the training wheels need to be found. I dunno, just a hunch 🤷

Is it easy?

I often hear about how "Spanish is easy to learn, Polish (etc.) is difficult," and of course, that really depends on where you're starting from. If you know English, Spanish is similar enough that it's easy to pick up, relative to the case declensions and non-Latinate roots of Polish. But to a native Czech speaker, the complete opposite is likely to be true.

So how is rust's learning curve?

I can spot Python code written by a C programmer a mile off. Learning a new language in its idiomatic form will always be a challenge. Don't just try to "write like (my language X) in Rust." You won't get the benefits. With that in mind:

For someone used to JavaScript or Python as their main language, rust stands to be quite involved. If you started your career purely mucking in with dynamically-typed, object-oriented, exception-catching languages without doing any lower-level programming introductions, it'll be quite an alien world.

Strict typing, explicit heap references, strict ownership and a fussy compiler all combine to make it feel like progress in learning is slow.

In my case, I did come to rust with some exposure, an era and an age ago, to the concepts of pointers and stack/heap memory, and other odds and ends, so I've managed to dust off the old knowledge to just about follow, but I'm not completely in my element either.

For a C programmer, there are idiomatic challenges around the match/Enum techniques and perhaps others, and the type system is much stricter. You can't just declare everything as a pointer and call it a day.

I expect in the end the ones who will have it easiest are those who work both with C and with a dynamic language on a frequent basis, going back and forth between them, and who are comfortable with the idiosyncrasies and idiomatics of each.

Thankfully, it's not Haskell-levels of maddening.
