James Robb
Posted on October 24, 2021
Following on from the previous article in this series, we will now take a look at the data types Rust supports.
First let's define what a data type is:
In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data. [...] A data type constrains the values that an expression, such as a variable or a function, might take. This data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.
Source: Data Type Wikipedia Page
With this definition we can see that data types provide meaning to our code and allow us to set expectations of what data should come in and flow out of our applications.
There are two overarching categories of data types supported by Rust: scalar types and compound types. We will look at both in this article and explain which types lie within each category.
Scalar Types
A scalar type can be defined as any type that represents a single value. Rust supports four primary scalar types: integers, floating-point numbers, Booleans, and characters.
Integers
An integer represents a whole number with no fractional component. This means that 100 and -27 are integers but 123.45 and -91.2 are not.
Integer Types
The supported integer types are:
Bit Length | Signed Type | Unsigned Type |
---|---|---|
8-bit | i8 | u8 |
16-bit | i16 | u16 |
32-bit | i32 | u32 |
64-bit | i64 | u64 |
128-bit | i128 | u128 |
arch | isize | usize |
Sidenote 1:
The isize and usize types allow us to let the system decide what size our integers should be. For example, if we are on a 32-bit system then an isize integer would be equivalent to an i32, whereas on a 64-bit system it would be equivalent to an i64, etc.
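As a small sketch of the point above, we can ask the compiler how wide usize is on whatever machine the code is built for. The helper name pointer_width_bits is made up for this illustration.

```rust
use std::mem::size_of;

/// Width of `usize` in bits on the target this program was compiled for.
/// (Hypothetical helper, named for this example only.)
fn pointer_width_bits() -> usize {
    size_of::<usize>() * 8
}

fn main() {
    // `isize` and `usize` are always as wide as a pointer on the target.
    println!("usize is {} bits wide", pointer_width_bits());
    println!(
        "isize: {} bytes, usize: {} bytes, pointer: {} bytes",
        size_of::<isize>(),
        size_of::<usize>(),
        size_of::<*const u8>()
    );
}
```

On a typical 64-bit desktop this should report 64 bits, but the exact number depends on the target you compile for.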
Signed integers hold a range from $-(2^{n-1})$ to $2^{n-1} - 1$ inclusive, where $n$ is the bit length that variant uses. Unsigned integers, on the other hand, hold a range from 0 to $2^{n} - 1$.
As an example we can see the range of an i8 and u8 integer below:
Type | Lower Bound | Upper Bound |
---|---|---|
i8 | -128 | 127 |
u8 | 0 | 255 |
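The bounds in the table can be checked against the MIN and MAX constants that every integer type provides. The sketch below assumes the usual two's complement bounds, -(2^(n-1)) to 2^(n-1) - 1 for signed types and 0 to 2^n - 1 for unsigned types. The helpers signed_bounds and unsigned_max are hypothetical and only handle widths up to 64 bits.

```rust
/// Bounds of a signed integer that is `bits` wide (valid for 1..=64).
/// Hypothetical helper; the maths is done in i128 so it cannot overflow here.
fn signed_bounds(bits: u32) -> (i128, i128) {
    let half = 1i128 << (bits - 1); // 2^(n-1)
    (-half, half - 1)
}

/// Largest value of an unsigned integer that is `bits` wide (valid for 1..=64).
fn unsigned_max(bits: u32) -> u128 {
    (1u128 << bits) - 1 // 2^n - 1
}

fn main() {
    println!("i8 by formula:  {:?}", signed_bounds(8));
    println!("i8 by constant: ({}, {})", i8::MIN, i8::MAX);
    println!("u8 by formula:  {}", unsigned_max(8));
    println!("u8 by constant: {}", u8::MAX);
}
```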
It is important to be aware of these ranges. If you write a literal that is out of range for the given type, the compiler will reject it, and if a calculation produces an out-of-range value at runtime in a debug build, Rust will panic with an integer overflow error. The compiler tries to catch as many cases as possible for you, but always consider which type is most suitable for the use case at hand to avoid issues, for example when working with third-party data.
Sidenote 2:
One caveat to understand is that when Rust is compiled in release mode, it doesn't panic with an integer overflow error.
This behaviour is described within the integer overflow documentation. In short though, instead of panicking, in release mode, Rust will opt to allow the integer overflow to occur using two’s complement wrapping.
This simply means that when a value is too high for a specific signed or unsigned integer type, it wraps around and continues counting from the lowest possible value for that type. If, however, a value is too low, it wraps around and continues counting down from the highest possible value for that type.
Example 1 (u8: 0 to 255): 256 -> 0
Example 2 (u8: 0 to 255): 257 -> 1
Example 3 (i8: -128 to 127): -129 -> 127
Example 4 (i8: -128 to 127): -130 -> 126
Example 5 (i8: -128 to 127): 130 -> -126
Please refer to the integer overflow documentation for more information on this topic.
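If you want wrapping behaviour on purpose, in both debug and release builds, the wrapping_* methods make it explicit. The sketch below also uses as casts, which truncate to the target width and so produce the same wrapped values. The helpers wrap_to_u8 and wrap_to_i8 are hypothetical names for this example.

```rust
/// Wraps any i32 into the u8 range by truncating to 8 bits.
fn wrap_to_u8(value: i32) -> u8 {
    value as u8
}

/// Wraps any i32 into the i8 range by truncating to 8 bits.
fn wrap_to_i8(value: i32) -> i8 {
    value as i8
}

fn main() {
    // Too high for a u8: continue counting from 0.
    println!("256 as u8  -> {}", wrap_to_u8(256));
    println!("257 as u8  -> {}", wrap_to_u8(257));

    // Too low or too high for an i8.
    println!("-129 as i8 -> {}", wrap_to_i8(-129));
    println!("130 as i8  -> {}", wrap_to_i8(130));

    // The explicit arithmetic versions never panic, whatever the build mode.
    println!("255u8 + 1     -> {}", 255u8.wrapping_add(1));
    println!("i8::MIN - 1   -> {}", i8::MIN.wrapping_sub(1));
    println!("i8::MAX + 3   -> {}", i8::MAX.wrapping_add(3));
}
```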
Sidenote 3:
Rust integers have methods that can check for overflow via the checked_* functions, for example: checked_mul.
All of the checked_* functions return an Option, where the value is either Some(value) or None. The None variant in this case represents an overflow occurring. This means that we can be sure in advance that if an overflow occurs, we have handled it properly!
Another variation on handling overflows can be seen in the saturating_* functions, such as saturating_mul.
All of the saturating_* functions simply stop at the upper or lower limit instead of returning an Option.
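Here is a minimal sketch of both families in use. checked_mul and saturating_mul are real methods on the integer types; describe_product is a made-up helper for this example.

```rust
/// Multiplies two u8 values and reports whether the result fits.
/// Hypothetical helper built on the standard `checked_mul` method.
fn describe_product(a: u8, b: u8) -> String {
    match a.checked_mul(b) {
        Some(product) => format!("{} * {} = {}", a, b, product),
        None => format!("{} * {} overflows a u8", a, b),
    }
}

fn main() {
    println!("{}", describe_product(10, 20));
    println!("{}", describe_product(16, 16));

    // Saturating arithmetic clamps to the nearest bound instead.
    println!("16u8 saturating_mul 16   -> {}", 16u8.saturating_mul(16));
    println!("-100i8 saturating_mul 2  -> {}", (-100i8).saturating_mul(2));
}
```

Which family to pick depends on the situation: checked_* forces the caller to decide what an overflow means, while saturating_* is handy when clamping is an acceptable answer.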
Example usage of each integer type:
// Signed integers
let signed_8: i8 = -1;
let signed_16: i16 = 2;
let signed_32: i32 = -3;
let signed_64: i64 = 4;
let signed_128: i128 = -5;
let signed_size: isize = -5;
// Unsigned integers
let unsigned_8: u8 = 1;
let unsigned_16: u16 = 2;
let unsigned_32: u32 = 3;
let unsigned_64: u64 = 4;
let unsigned_128: u128 = 5;
let unsigned_size: usize = 5;
Here we have declared a signed and an unsigned integer of each possible type. The type is declared using the : <type> annotation after each variable name, but we don't always have to add the type manually because Rust will assume an integer to be of type i32 unless an annotation or the surrounding code says otherwise.
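One rough way to see the i32 default at work is to measure how many bytes a value occupies. The sketch below uses std::mem::size_of_val for that; default_integer_size is a hypothetical helper.

```rust
use std::mem::size_of_val;

/// Size in bytes of an integer declared with no annotation and no other hints.
/// Hypothetical helper for this example.
fn default_integer_size() -> usize {
    let inferred = 42; // nothing else constrains it, so it falls back to i32
    size_of_val(&inferred)
}

fn main() {
    let annotated: u64 = 42; // explicit annotation
    let suffixed = 42u8; // type suffix on the literal

    println!("inferred:  {} bytes", default_integer_size());
    println!("annotated: {} bytes", size_of_val(&annotated));
    println!("suffixed:  {} bytes", size_of_val(&suffixed));
}
```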
Sidenote 4:
A signed integer is an integer that can be negative, zero or positive, whereas an unsigned integer can only ever be zero or positive.
Sidenote 5:
Signed integers are stored using the two’s complement method.
Integer Literals
Integer literals are a way of writing integers in different notations. Rust supports 5 notations out of the box:
Number Literal Type | Example | Decimal Equivalent |
---|---|---|
Binary | 0b1111_0000 | 240 |
Byte (u8 only) | b'A' | 65 |
Decimal | 253 | 253 |
Hexadecimal | 0xff | 255 |
Octal | 0o77 | 63 |
Sidenote 6:
All integer literals except the byte literal allow a type suffix to be used, such as -29i8 to give -29 the type i8 or 254u8 to give 254 the type u8.
Integer literals also support _ as a visual separator, such as 2_53 for 253 or 1_021_000 for 1,021,000. This separator can be of any length, meaning that, for example, 1_________0 is perfectly acceptable, to the compiler at least, as a representation for 10.
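To check the notations for ourselves, the sketch below writes each literal form and compares it against its decimal value. literal_values is a hypothetical helper that simply collects one literal of each kind.

```rust
/// One literal per notation, in the order:
/// binary, byte, decimal, hexadecimal, octal.
fn literal_values() -> [u32; 5] {
    [0b1111_0000, b'A' as u32, 253, 0xff, 0o77]
}

fn main() {
    println!("{:?}", literal_values());

    // Type suffixes pick the type of the literal directly.
    let negative = -29i8;
    let small = 254u8;

    // Underscores are ignored by the compiler and exist purely for readers.
    let ten = 1_________0;
    let million = 1_021_000;

    println!("{} {} {} {}", negative, small, ten, million);
}
```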
Floating Point Numbers
A floating point number, or float for short, is a number that can have a fractional component. For example, 12.75 and 3.1 are floats but the integers 10 and 287 are not.
There are two kinds of floating point numbers supported in Rust:
Bit Length | Type |
---|---|
32-bit | f32 |
64-bit | f64 |
The default type used by Rust, if no type annotation is added, is f64 because on a modern CPU it’s roughly the same speed as an f32 but is capable of far more precision.
Example usage of each floating point type:
let float_32: f32 = 1.1;
let float_64: f64 = 3.5;
let another_float_64 = 3.5;
Rust represents floating-point numbers according to the IEEE-754 standard. The f32 type is a single-precision float, and f64 is a double-precision float.
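One practical consequence of IEEE-754 is that many decimal fractions cannot be stored exactly, so comparing floats with == can surprise you. A common workaround, sketched below, is to compare within a tolerance. approx_eq is a hypothetical helper and the tolerance you pick depends on your use case.

```rust
/// Returns true when `a` and `b` differ by no more than `tolerance`.
/// Hypothetical helper; not part of the standard library.
fn approx_eq(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() <= tolerance
}

fn main() {
    let sum = 0.1 + 0.2;
    println!("0.1 + 0.2 = {}", sum);
    println!("exactly equal to 0.3? {}", sum == 0.3);
    println!("close enough to 0.3?  {}", approx_eq(sum, 0.3, 1e-9));

    // The same fraction stored at both precisions.
    let third_32 = 1.0_f32 / 3.0;
    let third_64 = 1.0_f64 / 3.0;
    println!("f32: {}", third_32);
    println!("f64: {}", third_64);
}
```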
Sidenote 7:
Floating point numbers can use _ as a visual separator just like integers, for example: 1_234.56.
Booleans
Booleans represent either a true or false value and take up exactly 1 byte of memory, with true represented as a 1 and false as a 0 internally.
We can see below how we can assign booleans with or without a type annotation:
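Both claims are easy to check with a quick sketch: size_of reports the byte size of bool, and an as cast exposes the underlying 1 or 0. as_bit is a hypothetical helper for this example.

```rust
use std::mem::size_of;

/// Converts a bool to its numeric representation.
/// Hypothetical helper for this example.
fn as_bit(flag: bool) -> u8 {
    flag as u8
}

fn main() {
    println!("a bool takes {} byte", size_of::<bool>());
    println!("true  -> {}", as_bit(true));
    println!("false -> {}", as_bit(false));
}
```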
let boolean_true = true;
let boolean_false: bool = false;
Boolean data allows us to test if a statement is true or false, and booleans are generally used within control flow.
Most of the time you won't manually write true or false either but will instead use comparison and logical operators on values to produce a boolean, for example:
let number = 3;
if number == 3 {
    println!("condition was true");
} else {
    println!("condition was false");
}
In this example we state that if the statement "number is equal to 3" is true then we should print out "condition was true", and if it is false then we should print out "condition was false". Of course, in this case it is true and so "condition was true" will be printed.
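Comparisons can be combined with the logical operators &&, || and ! to build larger conditions, and the result is always a bool. Note that Rust has no implicit conversion to bool, so something like if number { ... } will not compile when number is an integer. The function is_teenager below is a made-up example.

```rust
/// True when `age` falls in the range 13 to 19 inclusive.
/// Hypothetical helper combining two comparisons with `&&`.
fn is_teenager(age: u8) -> bool {
    age >= 13 && age <= 19
}

fn main() {
    let age = 15;

    // The result of a comparison is an ordinary bool we can store.
    let teenager = is_teenager(age);
    let adult_or_child = !teenager;

    if teenager || age == 12 {
        println!("{} is (almost) a teenager", age);
    }
    println!("adult or child: {}", adult_or_child);
}
```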
Characters
A character, or char as it is known in Rust, represents a single Unicode scalar value and takes up four bytes in memory.
Sidenote 8:
I don't want to get too deep into what Unicode is as that is not really relevant to this article, but if you are interested, you can look at the characters Unicode supports for yourself.
To create a char we use single quotes around the character we wish to represent. For example:
let a = 'a';
let one = '1';
let rabbit = '🐇';
let warning = '\u{26A0}'; // ⚠ (the emoji-style ⚠️ is two code points, so it does not fit in a single char)
let japanese_katakana_n = 'ン';
As you can see, each char is a single character representing a Unicode scalar value, such as a letter, a number, an emoji or a character from a non-Latin script such as the Japanese katakana shown in the above example, since Unicode supports these characters too!
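The sketch below illustrates the difference between the fixed four bytes a char occupies in memory and the variable number of bytes the same character needs when encoded as UTF-8, for example inside a String. utf8_len is a hypothetical wrapper around the standard len_utf8 method.

```rust
use std::mem::size_of;

/// Number of bytes `c` needs when encoded as UTF-8.
/// Hypothetical wrapper around `char::len_utf8`.
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    // Every char occupies four bytes in memory, whatever it holds.
    println!("size of char: {} bytes", size_of::<char>());

    for c in ['a', 'ン', '🐇'] {
        println!("{} = U+{:04X}, {} byte(s) as UTF-8", c, c as u32, utf8_len(c));
    }

    // Not every 32-bit number is a valid char: surrogate code points
    // are not Unicode scalar values, so the conversion gives None.
    println!("{:?}", char::from_u32(0xD800));
}
```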
Sidenote 9:
Here you will find a list of all languages supported by Unicode.
Compound Types
Now that we have looked into scalar types we can move on to what are known as compound types. A compound type can group multiple values into a single type, and Rust has two primitive compound types: tuples and arrays.
Tuples
A tuple allows us to group together values of different types. Tuples have a fixed length: once declared, they cannot grow or shrink.
A tuple is represented by rounded brackets containing values, for example: ('a', 1, 2.4).
We can also manually add a type definition if we want to, like so:
let tuple: (char, u8, f32) = ('a', 1, 2.4);
We don't need to add the manual type annotation unless we wish to be more precise about the types of our values than Rust's type inference would be. Without the annotation above, for example, Rust would infer (char, i32, f64).
Tuples also support a nice feature known in most languages as destructuring, for example:
let user = ("James", 27);
let (name, age) = user;
println!("{} is {} years old", name, age);
This is nice because it allows us to provide a name to each item in our tuple instead of directly accessing the value.
If we want to directly access values though, tuples are zero indexed data structures and so we can access data like so:
let user = ("James", 27);
println!("{} is {} years old", user.0, user.1);
This works exactly the same as before by taking the first and second values in the tuple and outputting them but personally I would always use the destructured version as it is more descriptive as to what the data represents.
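Destructuring also works in a few other shapes that are worth knowing. The sketch below shows it inside a function, with _ to skip a value, and on a nested tuple. describe is a hypothetical function written for this example.

```rust
/// Builds a sentence from a (name, age) tuple by destructuring it.
/// Hypothetical function for this example.
fn describe(user: (&str, u8)) -> String {
    let (name, age) = user;
    format!("{} is {} years old", name, age)
}

fn main() {
    let user = ("James", 27);
    println!("{}", describe(user));

    // `_` ignores a position we do not need.
    let (_, age_only) = user;
    println!("age only: {}", age_only);

    // Nested tuples can be taken apart in one pattern.
    let labelled_point = ("origin", (0.0, 0.0));
    let (label, (x, y)) = labelled_point;
    println!("{} is at ({}, {})", label, x, y);
}
```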
Arrays
Unlike tuples, arrays must contain values of the same type but are otherwise quite similar in that they also have a fixed length and cannot grow or shrink once defined.
An array uses square brackets and the elements within are separated by commas, for example: ['a', 'b', 'c'].
Arrays are great when you want to guarantee a uniform data type within the collection or if you don't want the collection to change size over time.
Sidenote 10:
A lot of developers avoid arrays when beginning with Rust because in Rust, unlike JavaScript for example, arrays are inflexible and so they reach for a vector instead.
Vectors are great for many use cases but should not be used for all circumstances because we want to be as strict as possible with our types.
If there is a case where limiting the number of values in the collection or guaranteeing a uniform data type for members of the collection is important, just use an array!
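To make the contrast concrete, here is a small sketch: an array keeps the length it was created with, while a Vec can grow at runtime. with_extra is a hypothetical helper that copies a fixed set of values into a Vec so that one more can be pushed.

```rust
/// Copies `items` into a growable Vec and appends `extra`.
/// Hypothetical helper for this example.
fn with_extra(items: &[i32], extra: i32) -> Vec<i32> {
    let mut grown = items.to_vec();
    grown.push(extra);
    grown
}

fn main() {
    // The length 3 is part of this array's type and cannot change.
    let fixed = [1, 2, 3];
    println!("array length: {}", fixed.len());

    // The Vec built from it can grow.
    let grown = with_extra(&fixed, 4);
    println!("vector: {:?} (length {})", grown, grown.len());
}
```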
One example use case shown to us in the Rust docs for using an array would be for storing a list of months:
let months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];
Even though Rust will infer the types for us, we can declare the type of data our array should contain and how many elements the array should hold too. Expanding the last example we could write:
let months: [&str; 12] = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];
This explicitly says that our list contains elements of type &str and there should be 12 of them altogether.
Sidenote 11:
We stated above that the variable months is of type [&str; 12], but one thing worth noting is that if the value type &str or the given length 12 or both change, the entire type itself changes as far as the compiler is concerned. For example:
If variable A is of type [i32; 3] and variable B is of type [i32; 4], these are totally different types according to the compiler, even though both represent an array of i32 values.
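A short sketch of what this means in practice: a function that asks for [i32; 3] will only accept arrays of exactly three i32 values, whereas a function that takes a slice, &[i32], accepts arrays of any length. Both function names are made up for this example.

```rust
use std::mem::size_of;

/// Only accepts arrays of exactly three i32 values.
fn sum_of_three(values: [i32; 3]) -> i32 {
    values.iter().sum()
}

/// Accepts a slice, so arrays of any length can be borrowed into it.
fn sum_any(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let a = [1, 2, 3]; // [i32; 3]
    let b = [1, 2, 3, 4]; // [i32; 4]

    println!("sum_of_three(a) = {}", sum_of_three(a));
    // sum_of_three(b) would be rejected by the compiler: wrong type.
    println!("sum_any(&b)     = {}", sum_any(&b));

    // Even the sizes differ, since the length is part of the type.
    println!("[i32; 3] is {} bytes", size_of::<[i32; 3]>());
    println!("[i32; 4] is {} bytes", size_of::<[i32; 4]>());
}
```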
Another cool trick that arrays can do is repeating a value a set number of times succinctly. The following code will generate an array with 3 elements, each holding the char value 'a':
let letters = ['a'; 3]; // -> ['a', 'a', 'a']
We can also index arrays just like we could with tuples, but instead of using . notation we use square bracket notation:
let alphabet = ['a', 'b', 'c'];
let a = alphabet[0];
let b = alphabet[1];
let c = alphabet[2];
We can see here that arrays, just like tuples, are zero indexed data structures. That is to say, the first element is at index 0, the second element is at index 1, and so on.
Sidenote 12:
If you try to access an array index which does not exist the program will panic at runtime, not compile time, and stop your application from executing any further. Thus, be careful when trying to access values by their index by adding a check to make sure that the index you want to use is within range!
The Rust array data type documentation describes the reasons behind this behaviour in more detail.
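One way to add such a check, sketched below, is the get method, which returns an Option instead of panicking when the index is out of range. letter_at is a hypothetical helper written for this example.

```rust
/// Looks up a letter by index without risking a panic.
/// Hypothetical helper built on the standard `get` method.
fn letter_at(index: usize) -> Option<char> {
    let alphabet = ['a', 'b', 'c'];
    alphabet.get(index).copied()
}

fn main() {
    for index in [0, 2, 5] {
        match letter_at(index) {
            Some(letter) => println!("index {} holds {}", index, letter),
            None => println!("index {} is out of range", index),
        }
    }
}
```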
We will discuss more about Rust guarantees, errors, compilation and more in the future as we progress in this series.
We can now see that arrays are a useful data structure for our toolbelt when developing applications with Rust, and that they serve a niche where a collection's uniform value type and fixed length are important for the items we wish to store in memory.
Conclusions
Rust gives us a lot of types out of the box, each bringing their own use cases and value to the table.
As we continue through this series we will eventually start working with custom types, other types such as &str, and collections such as String, Vec and HashMap, but until then this should give you a good overview of the initial building blocks that these types and collections themselves end up using.
In the next article we will look at functions in Rust: What they are, how we work with them, when they are useful to use, etc.
I look forward to seeing you then and as ever, feedback and questions are always welcome 😊!