Learning Rust 🦀: 17 - Rust Collections: Strings - How complex can it be?!
Fady GA
Posted on October 13, 2023
The second common Rust collection we will visit is the String! Rust has taken a ... "different" approach when working with the String type than other programming languages but seriously, how complex can it be?! Let's find out!
⚠️ Remember!
You can find all the code snippets for this series in its accompanying repo
If you don't want to install Rust locally, you can play with all the code of this series in the official Rust Playground that can be found on its official page.
⚠️⚠️ The articles in this series loosely follow the contents of "The Rust Programming Language, 2nd Edition" by Steve Klabnik and Carol Nichols in a way that reflects my understanding from a Python developer's perspective.
I try to publish a new article every week (maybe more if the Rust gods are generous), so stay tuned. I'll be posting "new article" updates on my LinkedIn and Twitter.
Table of Contents:
- Why is the String type considered a collection?
- Creating a new String
- Updating a String
- Indexing into Strings
- The String is not simple!
- Iterating over a String
Why is the String type considered a collection?
That's easy: because in Rust, a String is a "collection" of bytes!
Actually, the String type in Rust is a wrapper around a vector of bytes (Vec<u8>) with some extra guarantees, restrictions, and capabilities. Many of the operations you know from Vec<T> are available on String as well, along with other methods that make working with text easier. It holds the bytes of a UTF-8 encoded string, which might become problematic as we will see later.
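To make the "wrapper around Vec<u8>" idea a bit more concrete, here is a minimal sketch that turns a String into its raw bytes and back. String::into_bytes and String::from_utf8 are real standard-library methods; the helper names to_raw_bytes and from_raw_bytes are hypothetical, made up for this example.

```rust
// Hypothetical helpers wrapping real std methods:
// `String::into_bytes` and `String::from_utf8`.

/// Consumes a String and returns the UTF-8 bytes it was wrapping.
fn to_raw_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}

/// Rebuilds a String from raw bytes, or returns None if they aren't valid UTF-8.
fn from_raw_bytes(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

fn main() {
    let bytes = to_raw_bytes(String::from("Hi"));
    println!("{bytes:?}"); // [72, 105]

    // Valid UTF-8 round-trips back into a String.
    println!("{:?}", from_raw_bytes(bytes)); // Some("Hi")

    // 0xFF never appears in valid UTF-8, so String refuses it.
    println!("{:?}", from_raw_bytes(vec![0xFF])); // None
}
```

The last line shows the "extra guarantee" part: a String refuses bytes that aren't valid UTF-8, while a plain Vec<u8> accepts anything.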
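You may recall that a String stores its data on the heap and can grow or shrink, while a string slice, str (usually used through a reference, &str), is a fixed-size view into bytes that live somewhere else: inside the program's binary for string literals, or inside a String's heap buffer when you slice one. The &str reference itself (a pointer plus a length) is what usually sits on the stack.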
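Here is a minimal sketch of where the bytes behind a String and a &str live, using a string literal and a String created at runtime. The describe function is a hypothetical name for this example; std::mem::size_of is used only to show that the handles themselves are small and fixed-size.

```rust
use std::mem::size_of;

/// Hypothetical helper: accepts any string slice, whoever owns the bytes.
fn describe(s: &str) -> String {
    format!("{s} ({} bytes)", s.len())
}

fn main() {
    // A literal's bytes are stored in the compiled program itself;
    // `literal` is only a (pointer, length) pair referring to them.
    let literal: &'static str = "Hello";

    // A String owns a growable buffer on the heap.
    let owned = String::from("Hello, World");

    // Slicing borrows part of that heap buffer; nothing is copied.
    let borrowed: &str = &owned[..5];

    println!("{}", describe(literal)); // Hello (5 bytes)
    println!("{}", describe(borrowed)); // Hello (5 bytes)

    // The handles are small and fixed-size:
    // &str is 2 machine words (pointer, length),
    // String is 3 (pointer, capacity, length).
    println!("&str: {} bytes, String: {} bytes", size_of::<&str>(), size_of::<String>());
}
```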
Creating a new String:
We can create new String as follows:
let my_string: String = String::new();
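This creates a new, empty String in my_string. Since the binding isn't declared with mut, this String is "immutable"; we'd need let mut my_string to be able to modify it later.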
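A minimal sketch of growing a String that starts out empty. The helper build_greeting is hypothetical; String::with_capacity is a real alternative to String::new that reserves room up front, and it's an optional optimization here, not a requirement.

```rust
/// Hypothetical helper: builds a greeting starting from an empty String.
fn build_greeting(name: &str) -> String {
    // `mut` is required; without it the push_str calls below won't compile.
    // `with_capacity` reserves room up front, so short greetings fit without reallocating.
    let mut s = String::with_capacity(16);
    s.push_str("Hello, ");
    s.push_str(name);
    s
}

fn main() {
    let empty = String::new();
    println!("{:?} has length {}", empty, empty.len()); // "" has length 0

    println!("{}", build_greeting("Rust")); // Hello, Rust
}
```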
Sometimes, we want to create a String from some seed text instead of an empty one. To do that, we can either use the String::from function or the convenient to_string method, which is available on any type that implements the Display trait, including string literals.
let my_string: String = String::from("Hello");
println!("from: {my_string}");
let word: &str = "Hello";
let my_string: String = word.to_string();
println!("to_string: {my_string}");
The use of from
or to_string
is just a matter of style and readability!
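A quick sketch (the helper names via_from and via_to_string are hypothetical) showing that both spellings produce equal Strings, and that to_string also works on non-string values, because it comes from the ToString trait that every Display type gets.

```rust
/// Hypothetical helpers showing the two spellings side by side.
fn via_from(s: &str) -> String {
    String::from(s)
}

fn via_to_string(s: &str) -> String {
    s.to_string()
}

fn main() {
    // Same result either way.
    assert_eq!(via_from("Hello"), via_to_string("Hello"));

    // to_string comes from the ToString trait, which every Display type gets,
    // so it also works on numbers and chars.
    let n = 42.to_string();
    let c = 'x'.to_string();
    println!("{n} {c}"); // 42 x
}
```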
Updating a String:
There is more than one way to update a "mutable" String. We can use the push_str method to append a string slice to the end of the original String. Like Vec, String also has a push method which, similar to push_str, appends to the original String, but takes a single char instead. Let's see both of those in action:
let mut new_str = String::from("Hello, ");
new_str.push_str("There");
println!("Greeting: {new_str}");
new_str.push('!');
println!("New greeting: {new_str}");
This will output:
Greeting: Hello, There
New greeting: Hello, There!
This code does what you expect it to do. We've used push_str
to append "There" to the original String ("Hello, "). Next, we've used push
to append "!" at the end of the original String now containing "Hello, There".
Notice how push takes a char, which is why "!" is passed to it in single quotes (single quotes denote a char literal in Rust).
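Since a String stores UTF-8 bytes, a single push can add anywhere from 1 to 4 bytes, which becomes important in the indexing section. Here is a minimal sketch; push_and_len is a hypothetical helper, while push, len and char::len_utf8 are standard methods.

```rust
/// Hypothetical helper: pushes one char and reports the new length in bytes.
fn push_and_len(s: &mut String, c: char) -> usize {
    s.push(c);
    s.len()
}

fn main() {
    let mut s = String::from("Hi");
    println!("{}", push_and_len(&mut s, '!')); // 3: '!' is 1 byte
    println!("{}", push_and_len(&mut s, 'é')); // 5: 'é' is 2 bytes
    println!("{}", push_and_len(&mut s, '😉')); // 9: '😉' is 4 bytes

    // char::len_utf8 tells you the size up front.
    println!("{}", '😉'.len_utf8()); // 4
    println!("{s}"); // Hi!é😉
}
```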
Another way to update a string is the "+" operator. Let's check it out:
let s1 = String::from("Hello, ");
let s2 = String::from("There!");
let s3 = s1 + &s2;
In short, after executing this code s3
will hold "Hello, There!" but there is a lot that is going on here!
You may notice that s1 is "moved" into s3, i.e. we can't use s1 anymore in our app after the s3 line, while s2 is only "borrowed". This is not random; it comes from the "+" operator, which uses the add method with the following (simplified) signature:
fn add(self, s: &str) -> String
As you can see, the add method takes ownership of self (s1 in our example) and takes s as a string slice (&str), which is where &s2 comes in.
You may wonder how this code compiles, since &s2 is a &String, not a &str. At this point, let's just say that the Rust compiler can coerce (convert) a &String into a &str (a feature called deref coercion), and we will revisit that later.
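To see both ideas from the add signature in action (the left operand being moved and &String being coerced to &str), here is a minimal sketch. Cloning s1 is one way to keep it around, at the cost of copying its heap buffer; shout is a hypothetical function used only to show that the same coercion applies to any function taking &str.

```rust
/// Hypothetical helper that only asks for a string slice.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("There!");

    // `+` consumes its left side, so hand it a clone if s1 is still needed.
    // `&s2` is a &String; deref coercion turns it into the &str `add` expects.
    let s3 = s1.clone() + &s2;
    println!("{s3}"); // Hello, There!
    println!("{s1}"); // s1 is still usable here

    // The same coercion works for any function that takes &str.
    println!("{}", shout(&s2)); // THERE!
}
```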
Now, consider the example below:
let s1 = String::from("Hello");
let s2 = String::from("There");
let s3 = String::from("!");
let s4 = s1 + ", " + &s2 + &s3;
println!("s4 is {s4}");
This will output:
Hello, There!
But as you can see, chaining the "+" operator quickly becomes hard to read when combining a relatively large number of strings. This is a perfect use case for yet another Rust macro, format!. Let's see how we can use it in the previous example:
let s1 = String::from("Hello");
let s2 = String::from("There");
let s3 = String::from("!");
let s4 = format!("{s1}, {s2}{s3}");
println!("s4 is {s4}");
This produces exactly the same result as before, but notice that s1 doesn't "move" into s4: format! only borrows its arguments, just like println!, which is convenient if we plan to use s1, s2, or s3 later in the app.
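As a rough comparison, here is a minimal sketch that builds the same "Hello, There!" with format! and with concat on a slice of &str, another standard way to join strings without moving anything. The helper names with_format and with_concat are hypothetical.

```rust
/// Hypothetical helper: joins the parts with format!, which only borrows them.
fn with_format(a: &str, b: &str, c: &str) -> String {
    format!("{a}, {b}{c}")
}

/// Hypothetical helper: the same result via `concat` on a slice of &str.
fn with_concat(a: &str, b: &str, c: &str) -> String {
    [a, ", ", b, c].concat()
}

fn main() {
    let s1 = String::from("Hello");
    let s2 = String::from("There");
    let s3 = String::from("!");

    let s4 = format!("{s1}, {s2}{s3}");
    // None of s1, s2, s3 was moved, so they can all be used again here.
    println!("{s4} | {s1} {s2}{s3}"); // Hello, There! | Hello There!

    assert_eq!(with_format(&s1, &s2, &s3), s4);
    println!("{}", with_concat(&s1, &s2, &s3)); // Hello, There!
}
```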
Indexing into Strings:
Up until now, everything looks more or less the same as in other programming languages, but here is where the Rust fun starts!
In Python, doing something like this is perfectly fine:
s = "Hello"
H = s[0]
Now, given that the String type is in fact a wrapper around a Vector, and the latter can have its elements accessed using zero-based indices, the following should work, right?
let my_string = String::from("Greetings 😉 !");
let G = my_string[0];
Wrong! This will produce the following error:
the type `String` cannot be indexed by `{integer}`
For reasons that will become clear in just seconds, Rust only permits you to "slice" the String type! So, something like the following code will compile and work:
let my_string = String::from("Greetings 😉 !");
let Gree = &my_string[..4];
println!("{Gree}");
This will work and will print Gree
in the terminal.
Now, let's try this again, and this time we will try to extract the "wink" emoji (the character at position 10) together with the spaces before and after it:
let my_string = String::from("Greetings 😉 !");
let wink = &my_string[9..12];
println!("{wink}");
Here, we start our slice at the preceding space at position 9 and end it at the trailing space at position 11 (the range end, 12, isn't inclusive). The app compiles, but it panics at runtime!
The reason is that a String stores UTF-8 encoded bytes, as I've mentioned, not the visual characters that we see! It turns out that "😉" is four bytes long (bytes 10 to 13). Although we've correctly set the slice's start at position 9, since every character up to that point uses only one byte, the slice's end (byte 12) falls right in the middle of "😉", after its second byte. That isn't a valid character boundary, and that's why the app panicked (Rust reports that byte index 12 is not a char boundary). So, although String slicing is perfectly legal in Rust, it can be dangerous at runtime!
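If you'd rather not risk a panic, str also offers a checked alternative. Below is a minimal sketch on the same string using the standard str::get and str::is_char_boundary methods; safe_slice is a hypothetical wrapper name.

```rust
/// Hypothetical wrapper: returns None instead of panicking on a bad range.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let my_string = String::from("Greetings 😉 !");

    // The emoji occupies bytes 10..14, so byte 12 is not a char boundary.
    println!("{}", my_string.is_char_boundary(12)); // false
    println!("{:?}", safe_slice(&my_string, 9, 12)); // None

    // Space at 9, emoji at 10..14, space at 14: the range we wanted is 9..15.
    println!("{:?}", safe_slice(&my_string, 9, 15)); // Some(" 😉 ")
}
```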
Now, why doesn't Rust allow indexing?
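It's because of the UTF-8 encoding: an index could land in the middle of a multi-byte character and return something that isn't a valid character at all, introducing bugs that might not be discovered immediately. To avoid that, Rust doesn't compile such code! (There's also a performance angle: people expect indexing to take constant time, but to find the nth character of a String, Rust would have to walk through its contents from the beginning.)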
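So why allow slicing, when it can also cut a character in half?
Slicing is Rust's way of telling you to be more specific: you ask for an explicit range of bytes, so it's on you to know what you are doing. And if the range is wrong, Rust panics instead of handing you a broken character!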
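So what do you use instead of my_string[0]? Here is a minimal sketch of a few options, each of which forces you to say whether you want a byte or a character. first_char and first_byte are hypothetical helper names built on standard iterator and slice methods.

```rust
/// Hypothetical helper: the first character, if any.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

/// Hypothetical helper: the first byte, if any.
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}

fn main() {
    let my_string = String::from("Greetings 😉 !");

    println!("{:?}", first_char(&my_string)); // Some('G')
    println!("{:?}", first_byte(&my_string)); // Some(71)

    // nth walks the string from the start, so it's O(n), not O(1) like v[i].
    println!("{:?}", my_string.chars().nth(10)); // Some('😉')
}
```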
The String is not simple!
String in Rust isn't that complex but isn't that simple either!
Generally, you can look at a String either as a sequence of bytes (u8 values) or as a sequence of Unicode scalar values (the char type). For example, the string used in the previous example, "Greetings 😉 !", has the following byte representation:
[71, 114, 101, 101, 116, 105, 110, 103, 115, 32, 240, 159, 152, 137, 32, 33]
And the following char representation:
['G', 'r', 'e', 'e', 't', 'i', 'n', 'g', 's', ' ', '😉', ' ', '!']
Notice how there are 13 characters and 16 bytes representing the same string!
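You can check both representations of "Greetings 😉 !" yourself with a minimal sketch like this one: len counts bytes, while chars().count() counts Unicode scalar values. The helper name byte_and_char_counts is hypothetical.

```rust
/// Hypothetical helper: (number of bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let my_string = String::from("Greetings 😉 !");

    let (bytes, chars) = byte_and_char_counts(&my_string);
    println!("{bytes} bytes, {chars} chars"); // 16 bytes, 13 chars

    // The two views, printed as a byte slice and as a Vec<char>.
    println!("{:?}", my_string.as_bytes());
    println!("{:?}", my_string.chars().collect::<Vec<char>>());
}
```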
This distinction between string representations is abstracted away in some programming languages, such as Python, so developers don't think much about how strings are represented (unless they need to, say, when sending strings over the network). Rust, however, by design forces developers to think about how strings are represented, namely the UTF-8 encoding, to catch bugs early in development.
Iterating over a String:
Finally, the String type has some useful methods to iterate over its elements.
For example, if we want to iterate over the string's characters representation, we can use the following:
for c in my_string.chars() {
    println!("{c}");
}
Similarly, if we want to iterate over its bytes, we use:
for b in my_string.as_bytes() {
    println!("{b}");
}
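Going one step further, char_indices yields each char together with the byte offset where it starts, which is exactly the information you need to slice safely. Here is a minimal sketch; char_byte_offsets and take_chars are hypothetical helpers, not standard-library functions. Note that "character" here means a char (Unicode scalar value), which isn't always what a human reads as one character.

```rust
/// Hypothetical helper: every char together with the byte offset where it starts.
fn char_byte_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

/// Hypothetical helper: the first `n` chars as a &str, always cut on a char boundary.
fn take_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // The (n+1)-th char starts here, so everything before it is n chars.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string has n chars or fewer: return all of it.
        None => s,
    }
}

fn main() {
    let my_string = String::from("Greetings 😉 !");

    for (i, c) in char_byte_offsets(&my_string) {
        println!("byte {i:>2}: {c:?}");
    }

    println!("{}", take_chars(&my_string, 11)); // Greetings 😉
}
```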
At the end, Rust's String isn't that complex! It's a collection of the UTF-8 encoded bytes of a string, and Rust wants you to think about that from the start. In the next one, we will explore another common Rust collection, the hash map. See you then!