30 Days of Rust - Day 20
johnnylarner
Posted on May 24, 2023
Yesterday's questions answered
No questions to answer
Today's open questions
No open questions
Working with vectors
Let's quickly recap what vectors are:
Growable arrays of a single type occupying contiguous memory on the heap.
Three aspects of this definition are of interest to us:
- In being growable, vectors are best suited to data whose size is not known at compile time. This also tells us that arrays and tuples should be used for data whose size is known at compile time.
- With the single type restriction, how can I represent data that is composed of different types but still needs to be stored in order in a vector? We'll look at an example from the Rust book below!
- As they occupy a contiguous block of memory, making changes to a vector while a reference to one of its elements is still in use will prevent your code from compiling. This makes sense: growing a vector may cause it to be moved to a new location on the heap, meaning that the reference would be pointing to a deallocated place in memory.
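To make the last point concrete, here is a minimal sketch of one way to stay on the right side of the borrow checker. The helper name first_then_push is my own invention for illustration and does not come from the Rust book.

```rust
// Hypothetical helper: reads the first element, then grows the vector.
fn first_then_push(v: &mut Vec<i32>, new_value: i32) -> Option<i32> {
    // Copy the value out, so no reference into the vector lives past this line.
    let first = v.first().copied();

    // Had we kept `let first = &v[0];` and used it after the push below,
    // the compiler would reject the code: the vector cannot be borrowed
    // mutably while an immutable borrow of one of its elements is still in use.
    v.push(new_value);

    first
}

fn main() {
    let mut v = vec![1, 2, 3];
    let first = first_then_push(&mut v, 4);
    println!("first = {:?}, v = {:?}", first, v);
}
```

Copying the element only works for cheap Copy types like i32. For larger data you would more likely restructure the code so the reference is no longer needed by the time the vector changes.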
Composing data rich vectors with enums
As we discussed in the previous article, enums in Rust are very flexible and are the solution to the vector's single type restriction. The variants of an enum may hold data of different underlying types, but from the compiler's perspective every variant belongs to the same enum type. The Rust book offers a tangible example of where data of different underlying types may need to be stored in an ordered vector format. Enter the row of an Excel spreadsheet:
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}
let row = vec![
    SpreadsheetCell::Int(3),
    SpreadsheetCell::Text(String::from("blue")),
    SpreadsheetCell::Float(10.12),
];
You might ask, why not simply use a struct with fields that contain either integer, float or text data:
struct SpreadsheetCell {
    integer_data: Option<i32>,
    float_data: Option<f64>,
    text_data: Option<String>,
}
Code designed in this way can lead to issues during struct instantiation: we have to be sure that only one of the three available fields is populated with non-None data. Beyond that, any further interaction with that data will be clunky compared to working with an enum:
// Struct: each arm checks whether one particular field is populated
for cell in &row {
    match cell {
        SpreadsheetCell { integer_data: Some(_), .. } => println!("Integer cell found"),
        SpreadsheetCell { float_data: Some(_), .. } => println!("Float cell found"),
        SpreadsheetCell { text_data: Some(_), .. } => println!("Text found"),
        _ => println!("Invalid cell found"),
    }
}
// Enum: each arm matches exactly one variant, no catch-all needed
for cell in &row {
    match cell {
        SpreadsheetCell::Int(_) => println!("Integer cell found"),
        SpreadsheetCell::Float(_) => println!("Float cell found"),
        SpreadsheetCell::Text(_) => println!("Text found"),
    }
}
Not only is the first implementation implicit (the type of a cell is implied by whichever field holds non-None data), it is also error prone:
- We may intend to have a float cell, but if it has both integer data and float data populated it will still be classified as an int cell, because the integer arm is checked first.
- We have to catch cases that shouldn't need to exist. With enums, there cannot be a non-int, -float or -text type.
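For anyone who wants to run the enum version end to end, here is a small self-contained sketch. The SpreadsheetCell enum and the row come from the Rust book example; the describe function is a hypothetical helper I added so the match has something to return.

```rust
#[derive(Debug)]
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

// Hypothetical helper: turns a cell into a human-readable description.
// The match is exhaustive, so no catch-all arm is needed.
fn describe(cell: &SpreadsheetCell) -> String {
    match cell {
        SpreadsheetCell::Int(i) => format!("Integer cell: {}", i),
        SpreadsheetCell::Float(f) => format!("Float cell: {}", f),
        SpreadsheetCell::Text(s) => format!("Text cell: {}", s),
    }
}

fn main() {
    // One vector, one element type, three different kinds of payload.
    let row = vec![
        SpreadsheetCell::Int(3),
        SpreadsheetCell::Text(String::from("blue")),
        SpreadsheetCell::Float(10.12),
    ];

    for cell in &row {
        println!("{}", describe(cell));
    }
}
```

If a new variant is ever added to the enum, the compiler will flag describe as non-exhaustive, which is exactly the kind of safety net the struct version lacks.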
Strings as vectors
Vectors share many characteristics with the String type. Both are growable, mutable and owned. Moreover, Rust strings are UTF-8 encoded, and under the hood a String is a wrapper around a vector of bytes (Vec<u8>).
An advantage of using UTF-8 encoding is that it supports the vast majority of languages and symbols. This wide ranging support comes with a disadvantage, however: slicing strings becomes unreliable.
Non-English language letters and symbols are often encoded using more than one byte. Rust therefore refuses to compile an attempt to access a single character by index (for example s[0]), as one byte may not represent a full character. One quick way around this is to use a range slice such as &s[0..4], but be careful: the range is measured in bytes, and the program will panic at runtime if it does not fall on a character boundary. Generally speaking, Rust recommends iterating over the characters of a string by calling the .chars() method.
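Here is a short sketch of the byte versus character distinction, using the Cyrillic greeting that the Rust book also uses. The two helper functions are hypothetical names of my own. I use the non-panicking get method instead of a raw range slice, so a bad range gives None rather than crashing the program.

```rust
// Hypothetical helper: returns (number of bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Hypothetical helper: takes the first `end` bytes, but only if that
// range ends on a character boundary. Otherwise it returns None.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(0..end)
}

fn main() {
    let hello = "Здравствуйте";

    // Every letter here takes two bytes in UTF-8.
    let (bytes, chars) = byte_and_char_len(hello);
    println!("{} bytes, {} chars", bytes, chars);

    // Four bytes cover exactly two letters, so this works.
    println!("{:?}", safe_prefix(hello, 4));
    // One byte would cut the first letter in half, so we get None.
    println!("{:?}", safe_prefix(hello, 1));

    // Iterating over chars avoids byte arithmetic altogether.
    for c in hello.chars().take(3) {
        println!("{}", c);
    }
}
```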
Careful who owns your string
Developers very often find themselves wanting to concatenate strings. Most programming languages offer various APIs to achieve this. Rust is no exception. Typically, you'd want to use either the + operator or the format! macro to concatenate two strings. In Python you'd often prioritise readability when selecting a method of concatenating strings. That applies to Rust as well. But Rust also requires you to consider ownership in your design decision:
- The + operator takes ownership of the left-hand operand and takes a reference to the right-hand one.
- The format! macro takes references to all of the strings passed to it.
In theory, + could help optimise your program's memory usage, as the first string is not duplicated. But unless the strings you're concatenating are obnoxiously large, this consideration should probably come second to readability.
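The ownership difference between the two approaches can be sketched like this. Both function names are hypothetical wrappers I wrote for the example; they just make the signatures of the two approaches visible.

```rust
// Hypothetical wrapper: + consumes the left operand and borrows the right one.
fn concat_with_plus(left: String, right: &str) -> String {
    left + right
}

// Hypothetical wrapper: format! only borrows its arguments.
fn concat_with_format(left: &str, right: &str) -> String {
    format!("{}{}", left, right)
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");

    // format! borrows both strings, so s1 and s2 stay usable afterwards.
    let borrowed = concat_with_format(&s1, &s2);
    println!("{} (built from {:?} and {:?})", borrowed, s1, s2);

    // + moves s1 into the result; only s2 is still usable afterwards.
    let moved = concat_with_plus(s1, &s2);
    // println!("{}", s1); // would not compile: s1 was moved
    println!("{} (s2 is still {:?})", moved, s2);
}
```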
One final point: Rust has a feature called deref coercion which lets the compiler automatically convert a &String into a &str wherever a &str is expected. This means combining string literals and String types is possible.
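A minimal sketch of deref coercion in action, assuming a hypothetical exclaim function that only accepts a &str:

```rust
// Hypothetical helper that only accepts a string slice.
fn exclaim(text: &str) -> String {
    format!("{}!", text)
}

fn main() {
    let owned = String::from("hello");

    // A &String is coerced to &str at the call site.
    println!("{}", exclaim(&owned));
    // A string literal is already a &str.
    println!("{}", exclaim("hello"));

    // The same coercion lets + accept a &String on the right-hand side,
    // and a literal works there directly.
    let suffix = String::from(" world");
    let combined = owned + &suffix + " again";
    println!("{}", combined);
}
```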
Hash maps own
I already covered most of the characteristics and operations for hash maps in a previous post. But I'd quickly like to share some learnings on ownership of hash maps:
- Any types that implement the Copy trait are copied into a hash map.
- Any other types are moved into the hash map.
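As a quick illustration of both rules, here is a sketch with a String key (moved) and an i32 value (copied). The build_scores function and the variable names are my own, chosen for the example.

```rust
use std::collections::HashMap;

// Hypothetical helper: takes ownership of the key and stores it in a new map.
fn build_scores(team: String, points: i32) -> HashMap<String, i32> {
    let mut scores = HashMap::new();
    scores.insert(team, points);
    scores
}

fn main() {
    let team = String::from("Blue");
    let points = 10;

    let mut scores: HashMap<String, i32> = HashMap::new();
    // String does not implement Copy, so team is moved into the map.
    // i32 implements Copy, so points is copied.
    scores.insert(team, points);

    // println!("{}", team); // would not compile: team was moved
    println!("points is still usable: {}", points);
    println!("Blue -> {:?}", scores.get("Blue"));

    let other = build_scores(String::from("Yellow"), 50);
    println!("{:?}", other);
}
```

If you need to keep using the key after inserting it, you can insert a clone instead, at the cost of duplicating the string.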