Learning Rust 🦀: 17 - Rust Collections: Strings - How complex can it be?!

fadygrab

Fady GA 😎

Posted on October 13, 2023

The second common Rust collection we will visit is the String! Rust takes a ... "different" approach to the String type compared with other programming languages, but seriously, how complex can it be?! Let's find out!

āš ļø Remember!

You can find all the code snippets for this series in its accompanying repo

If you don't want to install Rust locally, you can play with all the code of this series in the official Rust Playground that can be found on its official page.

āš ļøāš ļø The articles in this series are loosely following the contents of "The Rust Programming Language, 2nd Edition" by Steve Klabnik and Carol Nichols in a way that reflects my understanding from a Python developer's perspective.

ā­ I try to publish a new article every week (maybe more if the Rust gods šŸ™Œ are generous šŸ˜) so stay tuned šŸ˜‰. I'll be posting "new articles updates" on my LinkedIn and Twitter.

Why is the String type considered a collection?

That's easy, because in Rust it is a "collection" of bytes 😁!
Actually, the String type in Rust is a wrapper around a vector of u8 values (Vec<u8>): it offers many of the same operations as a vector, plus other methods that make working with strings easier. It holds the bytes of a UTF-8 encoded string, which might become problematic as we will see later.

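You may recall that a String owns its text in a variable-length buffer on the heap, while a string slice str (usually used as a reference, &str) is a borrowed view into string data stored somewhere else: a String's heap buffer, or the program's binary in the case of string literals.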
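To make the "wrapper around Vec<u8>" idea concrete, here is a minimal sketch. The round_trip helper is a hypothetical name of mine, not something from the book; it only relies on the standard into_bytes and String::from_utf8 methods.

```rust
// Hypothetical helper: round-trips a String through its underlying bytes.
fn round_trip(text: &str) -> String {
    let s = String::from(text);
    // into_bytes consumes the String and hands back its Vec<u8> buffer.
    let bytes: Vec<u8> = s.into_bytes();
    // from_utf8 validates the bytes and rebuilds a String (Err on invalid UTF-8).
    String::from_utf8(bytes).expect("bytes came from a valid String")
}

fn main() {
    let s = String::from("Hi!");
    // len() counts bytes, not characters.
    println!("bytes: {:?}, len: {}", s.as_bytes(), s.len());
    println!("round trip: {}", round_trip("Hi!"));
    // Arbitrary bytes are rejected if they are not valid UTF-8.
    let invalid = String::from_utf8(vec![0xff, 0xfe]);
    println!("invalid bytes accepted? {}", invalid.is_ok());
}
```

The validation step in from_utf8 is the part a plain Vec<u8> doesn't have: a String always holds valid UTF-8.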

Creating a new String:

We can create a new String as follows:

let my_string: String = String::new();

This will create a new, empty and "immutable" String in my_string (immutable because the variable isn't declared with mut).
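As a small sketch of what that immutability means in practice (the build helper is a hypothetical name used only for illustration):

```rust
// Hypothetical helper: starts from an empty String and fills it.
fn build() -> String {
    let mut s = String::new(); // `mut` is what makes later changes legal
    s.push_str("Hello");
    s
}

fn main() {
    let empty = String::new();
    println!("empty? {} len: {}", empty.is_empty(), empty.len());
    // empty.push('x'); // would not compile: `empty` is not declared `mut`
    println!("{}", build());
}
```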

Sometimes, we want to create a String from a seed text instead of an empty one. To do that, we can either use the from method of the String type or use the convenient to_string method, which is available on any type that implements the Display trait (string slices, numbers, and so on).

let my_string: String = String::from("Hello");
println!("from: {my_string}");
let word: &str = "Hello";
let my_string: String = word.to_string();
println!("to_string: {my_string}");

The use of from or to_string is just a matter of style and readability!
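For completeness, here is a rough sketch of a few equivalent spellings you may run into in other people's code. The four_ways helper is hypothetical; to_owned and into are standard methods that also turn a &str into a String.

```rust
// Hypothetical helper: builds the same String in four equivalent ways.
fn four_ways(word: &str) -> [String; 4] {
    [
        String::from(word),
        word.to_string(),
        word.to_owned(),
        word.into(),
    ]
}

fn main() {
    for s in four_ways("Hello") {
        println!("{s}");
    }
    // to_string comes from the Display trait, so non-string values have it too.
    let n: String = 42.to_string();
    let x: String = 3.5_f64.to_string();
    println!("{n} {x}");
}
```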

Updating a String:

There is more than one way to update a "mutable" String. We can use the push_str method to append a string slice to the original String. And, much like a vector, the String type also has a push method that appends a single char to the original String. Let's see both of those in action:

let mut new_str = String::from("Hello, ");
new_str.push_str("There");
println!("Greeting: {new_str}");
new_str.push('!');
println!("New greeting: {new_str}");

This will output:

Greeting: Hello, There
New greeting: Hello, There!

This code does what you expect it to do. We've used push_str to append "There" to the original String ("Hello, "). Next, we've used push to append "!" at the end of the original String now containing "Hello, There".

Notice how push takes a char: the "!" is passed to it in single quotes, which denote the char type.
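One detail worth a quick sketch: push_str takes a &str, so it only borrows what you pass to it. The greet helper below is a hypothetical name, just to show both methods side by side.

```rust
// Hypothetical helper: appends a name and an exclamation mark to a base text.
fn greet(base: &str, name: &str) -> String {
    let mut out = String::from(base);
    out.push_str(name); // borrows `name`, does not take ownership
    out.push('!'); // push takes a single char
    out
}

fn main() {
    let name = String::from("There");
    let greeting = greet("Hello, ", &name);
    println!("{greeting}");
    // `name` is still usable, because it was only borrowed.
    println!("name is still {name}");
}
```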

Another way to update a string is the "+" operator. Let's check it out:

let s1 = String::from("Hello, ");
let s2 = String::from("There!");
let s3 = s1 + &s2;

In short, after executing this code s3 will hold "Hello, There!", but there is a lot going on here!
You may notice that s1 is "moved" into s3, i.e. we can't use s1 anymore in our app after the s3 line, while s2 is only "borrowed". This isn't random; it comes from the "+" operator, which calls the add method with the following signature:

fn add(self, s: &str) -> String

As you can see, the add method takes ownership of self (the s1 in our example) and takes s as a string slice reference (&str), which is s2 in our example.

You may wonder how this code works, as s2 is a String and not a &str in our example. For now, let's just say that the Rust compiler can coerce (convert) a &String into a &str (this is called deref coercion), and we will revisit that later.
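Here is a small sketch of both effects, the coercion and the move. The shout and join helpers are hypothetical names I made up; join simply mirrors the shape of the add signature.

```rust
// Hypothetical helper that only accepts a string slice.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Hypothetical helper mirroring `s1 + &s2`: consumes `left`, borrows `right`.
fn join(left: String, right: &str) -> String {
    left + right
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("There!");
    // &s2 is a &String, but the compiler coerces it to &str here.
    println!("{}", shout(&s2));
    let s3 = join(s1, &s2);
    println!("{s3}");
    // s2 was only borrowed, so it is still valid; s1 was moved.
    println!("{s2}");
    // println!("{s1}"); // would not compile: s1 was moved into join
}
```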

Now, consider the example below:

let s1 = String::from("Hello");
let s2 = String::from("There");
let s3 = String::from("!");
let s4 = s1 + ", " + &s2 + &s3;
println!("s4 is {s4}");

This will output:

s4 is Hello, There!

But as you can see, the "+" operator can become unnecessarily complicated when we try to build a String from a relatively large number of pieces. This is a perfect use case for yet another Rust macro, format!. Let's see how we can use it in the previous example:

let s1 = String::from("Hello");
let s2 = String::from("There");
let s3 = String::from("!");
let s4 = format!("{s1}, {s2}{s3}");
println!("s4 is {s4}");

This will produce the exact same result as before, but notice that s1 doesn't "move" into s4, which can be convenient if we plan to use it later in the app.
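A quick sketch to check that claim, assuming a hypothetical build_greeting helper that wraps the format! call:

```rust
// Hypothetical helper: builds the greeting without consuming any input.
fn build_greeting(s1: &str, s2: &str, s3: &str) -> String {
    format!("{s1}, {s2}{s3}")
}

fn main() {
    let s1 = String::from("Hello");
    let s2 = String::from("There");
    let s3 = String::from("!");
    let s4 = build_greeting(&s1, &s2, &s3);
    println!("s4 is {s4}");
    // Unlike with "+", every input is still available afterwards.
    println!("s1 is still {s1}");
}
```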

Indexing into Strings:

Up until now, everything looks more or less the same as in other programming languages, but here is where the Rust fun starts.
In Python, doing something like this is perfectly fine:

s = "Hello"
H = s[0]

Now, given that the String type is in fact a wrapper around a Vector, and the latter can have its elements accessed with zero-based indices, the following should work, right?

let my_string = String::from("Greetings 😉 !");
let G = my_string[0];

Wrong! This will produce the following error:

the type `String` cannot be indexed by `{integer}`

For reasons that will become clear in just seconds, Rust only permits you to "slice" the String type! So, something like the following code will compile and work:

let my_string = String::from("Greetings 😉 !");
let Gree = &my_string[..4];
println!("{Gree}");

This will work and will print Gree in the terminal.
Now, let's try this again and this time we will try to extract the "wink" emoji that is at position 10 with the preceding and trailing spaces:

let my_string = String::from("Greetings 😉 !");
let wink = &my_string[9..12];
println!("{wink}");

Here, we start our slice at the first space at position 9 and try to end it at the trailing space at position 11 (the slice's end, 12, isn't inclusive). The app compiles but it will panic!

The reason is that a String stores, like I've mentioned, UTF-8 encoded "bytes" and not the visual characters that we see! It turns out that "😉" is four bytes long (bytes 10 to 13). Although we've correctly set the slice's start at position 9, as all the characters up to this point use only one byte each, the slice's end at byte 12 falls in the middle of the emoji. That isn't a character boundary, so the slice wouldn't be valid UTF-8, and that's why the app panicked! Although String slicing is perfectly legal in Rust, it can be dangerous at runtime!
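To see where the boundaries actually are, here is a minimal sketch. The layout helper is a hypothetical name; char_indices, len_utf8 and is_char_boundary are standard methods. The emoji is written as the escape \u{1F609} so the snippet survives copy-pasting.

```rust
// Hypothetical helper: lists (byte offset, char, byte length) for each character.
fn layout(s: &str) -> Vec<(usize, char, usize)> {
    s.char_indices().map(|(i, c)| (i, c, c.len_utf8())).collect()
}

fn main() {
    // \u{1F609} is the wink emoji from the example above.
    let my_string = String::from("Greetings \u{1F609} !");
    for (offset, c, len) in layout(&my_string) {
        println!("byte {offset:>2}: {c:?} ({len} byte(s))");
    }
    // Offsets 11, 12 and 13 fall inside the emoji, so they are not boundaries.
    println!("10 is a boundary: {}", my_string.is_char_boundary(10));
    println!("12 is a boundary: {}", my_string.is_char_boundary(12));
    println!("14 is a boundary: {}", my_string.is_char_boundary(14));
}
```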

Now, why doesn't Rust allow indexing?
It's because of the UTF-8 encoding thing. A single index would point at a single byte, which may be just a fragment of a character, and that could introduce bugs that might not be discovered immediately. To avoid that, Rust simply doesn't compile that code!

So why allow slicing, if it can also cut through a character?
Slicing is Rust's way of asking you to be more specific: you explicitly request a range of bytes, and you are expected to know what you are doing!
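If you are not sure the range is valid, one option is the standard get method on string slices, which returns an Option instead of panicking. A minimal sketch, with safe_slice being a hypothetical wrapper name:

```rust
// Hypothetical helper: returns the slice, or None if the range is out of
// bounds or does not land on character boundaries.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    // \u{1F609} is the wink emoji from the example above.
    let my_string = String::from("Greetings \u{1F609} !");
    println!("{:?}", safe_slice(&my_string, 0, 4));
    // 12 is in the middle of the emoji, so this is None instead of a panic.
    println!("{:?}", safe_slice(&my_string, 9, 12));
    // 9..15 covers the space, the whole emoji and the trailing space.
    println!("{:?}", safe_slice(&my_string, 9, 15));
}
```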

The String is not simple!

String in Rust isn't that complex but isn't that simple either!
Generally, you can represent a String by either its "bytes" (u8 values) or its "characters" (the char type, which holds a Unicode scalar value). For example, the string used in the previous example, "Greetings 😉 !", has the following byte representation:

71
114
101
101
116
105
110
103
115
32
240
159
152
137
32
33

And the following "characters" representation:

G
r
e
e
t
i
n
g
s

😉

!

Notice how there are 13 characters and 16 bytes representing the same string!
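You can check both counts yourself with a short sketch. The sizes helper is a hypothetical name; len and chars().count() are the standard ways to get the two numbers.

```rust
// Hypothetical helper: returns (byte count, char count) for a string.
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // \u{1F609} is the wink emoji from the example above.
    let (bytes, chars) = sizes("Greetings \u{1F609} !");
    println!("{bytes} bytes, {chars} chars");
}
```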

This distinction between string representations is abstracted away in some programming languages such as Python, where developers don't think much about how strings are represented (unless they need to, say when sending strings over the network). Rust, by design, forces developers to think about how strings are represented, namely the UTF-8 encoding, to catch bugs early in development.

Iterating over a String:

Finally, the String type has some useful methods to iterate over its elements.
For example, if we want to iterate over the string's characters representation, we can use the following:

for c in my_string.chars() {
    println!("{c}")
}

Similarly, if we want to iterate over its bytes, we use:

for b in my_string.as_bytes() {
    println!("{b}")
}
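The chars iterator is also the closest thing to Python-style indexing. Here is a rough sketch, with char_at and reversed as hypothetical helper names. Keep in mind that nth walks the string from the start, so it is not a constant-time lookup.

```rust
// Hypothetical helper: the character at a given position, counted in chars.
fn char_at(s: &str, position: usize) -> Option<char> {
    s.chars().nth(position)
}

// Hypothetical helper: reverses a string character by character.
fn reversed(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // \u{1F609} is the wink emoji from the example above.
    let my_string = String::from("Greetings \u{1F609} !");
    println!("{:?}", char_at(&my_string, 0));
    println!("{:?}", char_at(&my_string, 10));
    // Past the end we get None instead of a panic.
    println!("{:?}", char_at(&my_string, 99));
    println!("{}", reversed(&my_string));
}
```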

In the end, Rust's String isn't that complex! It's a collection of the UTF-8 encoded bytes of a string, and Rust wants you to think about that from the start. In the next one, we will explore another common Rust collection, the hash map. See you then 👋
