Rust #6: Exploring crates
Matt Davies
Posted on July 24, 2021
I often install tools via cargo and use crates for my code that have many dependencies. If you're like me, you wonder, while they download and compile, what all those crates do. I could just look at the top N popular crates on http://crates.io but I thought that was boring. Rather, I thought I'd clone the exa command-line tool from GitHub and see what crates it used. Surely it is better to look at a released tool that is out in the wild?
The 'exa' tool is a Rust-built drop-in replacement for the Unix command ls. It allows me to produce listings like:
A simple command shows us the crate dependency hierarchy for this tool:
$ cargo tree
The top-level dependencies are:
ansi_term
datetime
git2
glob
lazy_static
libc
locale
log
natord
num_cpus
number_prefix
scoped_threadpool
term_grid
term_size
unicode-width
users
zoneinfo_compiled
Some interesting crates that I recognise also appeared as dependencies of the above:
bitflags
byteorder
matches
pad
tinyvec
url
So now I will boot up my favourite browser, go to http://docs.rs and figure out what all these crates do and see if any are useful. At least cursory knowledge of them will stop me from reinventing the wheel if I need their functionality.
ansi_term
This is a library that generates the ANSI control codes for colour and formatting (i.e. bold, italic, etc.) on your terminal. If you are not sure what ANSI control codes are, Wikipedia should explain it. It provides styles and colours via the builder pattern or a colour enum. For example:
use ansi_term::Colour::Yellow;
println!("My name is {}", Yellow.bold().paint("Matt"));
There's support for blink, bold, italic, underline, inverse (confusingly called reverse here), 256 colours and 24-bit colours. This is a very useful crate for wrapping strings with the ANSI control codes that you need.
But I have seen crates that do this with different syntax using extension traits on &str and String. The one that I use frequently is colored. For example, to recreate the last snippet using colored, it is:
use colored::Colorize;
println!("My name is {}", "Matt".yellow().bold());
I prefer the latter form, but it is totally subjective.
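To make it a little less magical, here is a minimal sketch of what both crates are doing for you, using only the standard library. The helper name bold_yellow is my own invention, and the exact byte sequence each crate emits may differ; the sketch assumes the usual SGR codes (1 for bold, 33 for yellow foreground, 0 for reset).

```rust
/// Wraps `text` in the ANSI SGR codes for bold (1) and yellow foreground (33),
/// then resets all attributes (0) so later output is unaffected.
/// `bold_yellow` is a hypothetical helper, not part of ansi_term or colored.
fn bold_yellow(text: &str) -> String {
    format!("\x1b[1;33m{}\x1b[0m", text)
}

fn main() {
    println!("My name is {}", bold_yellow("Matt"));
}
```

Writing these by hand gets tedious quickly, which is exactly why the crates exist.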
datetime
This is one crate I am familiar with. Unfortunately, I found the standard library's time and date support lacking for a tool I was writing. Some research unearthed this crate, which provides a lot more functionality. Be careful though, there is another crate called date_time that does similar stuff. The datetime library here provides structures for representing a calendar date and time of day in both timezone and local form. It also provides formatting functions that convert a date or time into a string and back again, but unfortunately no documentation on how to use them. The format function takes a time value but also takes a locale of type Time, with no information on how to generate one. I couldn't figure it out myself.
My go-to date and time crate is chrono. It is fully featured, efficient and better documented. You can do UTC, local and fixed offset times (times in a certain timezone). It can format to and parse from strings containing times and dates. Times also have nanosecond precision. Times and dates are complicated things and chrono is a useful weapon in your coding arsenal.
bitflags
This is a very useful crate for generating enums that are bitmasks. You often see this in low-level programming or when dealing with operating system interfaces. Unlike normal enums, bitflag enums can be combined using bitwise operators. This is trivial to do in C but is not supported by Rust's enums. In C, you could do something like:
typedef enum Flags {
    A = 0x01,
    B = 0x02,
    C = 0x04,
    ABC = A | B | C,
} Flags;
int ab = A | B;
In Rust, enums are distinct and cannot be combined like that. With bitflags you can:
use bitflags::bitflags;
// This is the bitflags 1.x syntax; with 2.x, use Self::A.bits() instead of Self::A.bits.
bitflags! {
    struct Flags: u8 {
        const A = 0b001;
        const B = 0b010;
        const C = 0b100;
        const ABC = Self::A.bits | Self::B.bits | Self::C.bits;
    }
}
let ab = Flags::A | Flags::B;
Not as elegant as C, but at least the enumerations are scoped like C++'s enum class. On the plus side, Rust's bitflags crate does support set difference using the - operator. This would go very wrong in C and C++ as - would be treated as a normal integer subtraction. For example:
let ac = Flags::ABC - Flags::B;
let we_dont_want_b = ac - Flags::B;
would do the right thing. The equivalent code in C would not.
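To see why the subtraction goes wrong on raw integers, here is a small sketch using plain u8 masks rather than the bitflags crate. The constants mirror the flags above and the difference helper is hypothetical; it assumes set difference means "clear the bits of the right-hand side", which is lhs & !rhs.

```rust
const A: u8 = 0b001;
const B: u8 = 0b010;
const C: u8 = 0b100;
const ABC: u8 = A | B | C;

/// Set difference on raw bitmasks: keep the bits of `lhs` that are not set in `rhs`.
fn difference(lhs: u8, rhs: u8) -> u8 {
    lhs & !rhs
}

fn main() {
    let ac = difference(ABC, B);
    assert_eq!(ac, A | C);

    // Removing B a second time is harmless with a real set difference...
    assert_eq!(difference(ac, B), A | C);

    // ...but integer subtraction borrows from the other bits:
    // 0b101 - 0b010 = 0b011, so C vanished and B appeared.
    assert_eq!(ac - B, A | B);

    println!("{:03b}", difference(ac, B));
}
```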
byteorder
This simple crate allows reading and writing values in little-endian or big-endian byte order. The standard library does have some support with the to_le_bytes family of functions on the integer primitive types, so a lot of this crate is redundant now. Where this crate is still useful is with its ReadBytesExt and WriteBytesExt extension traits, which add number reading and writing methods to anything that implements the Read and Write interfaces.
If you're wondering what endian means, it refers to how computers store numeric values that require more than a byte to store. For example, a u32 takes 4 bytes of storage. There are 2 conventional ways of storing this value. You could put the high bytes first (big-endian) or the low bytes first (little-endian) into memory. So for example, the number 42 could be stored as the bytes 42, 0, 0, 0 (little-endian) or the bytes 0, 0, 0, 42 (big-endian). Most modern CPUs use the former by default. However, data that goes over the network is usually big-endian. So these routines are critical for putting the data in the correct form. There is also a third convention called native-endian that is either little or big depending on the CPU's preferred form.
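The standard library functions mentioned earlier are enough to see all three conventions in action. This is a small sketch with no external crates; to_network and from_network are hypothetical helper names of my own.

```rust
/// Encodes `n` in network (big-endian) byte order.
fn to_network(n: u32) -> [u8; 4] {
    n.to_be_bytes()
}

/// Decodes a value that arrived in network (big-endian) byte order.
fn from_network(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

fn main() {
    let n: u32 = 42;
    println!("little-endian: {:?}", n.to_le_bytes());
    println!("big-endian:    {:?}", n.to_be_bytes());
    // Native-endian matches one of the two lines above, depending on the CPU.
    println!("native-endian: {:?}", n.to_ne_bytes());

    // A round trip through network order gives back the original value.
    assert_eq!(from_network(to_network(n)), n);
}
```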
git2
This crate offers bindings over the C-based libgit2 library. exa uses this to implement its .gitignore support. This is a large crate and way beyond the scope of this article.
glob
One of the most obvious jobs of exa is to iterate over all the files in a directory. glob does this with the bonus that you can use the wildcards * and **. It provides a single function glob that takes a file pattern and gives back an iterator returning paths. For example:
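// Read in all the markdown articles under all folders in my blogs folder.
// Note: glob does not expand ~, so the pattern is relative to the current directory.
use glob::glob;
for entry in glob("blogs/**/*.md").unwrap() {
    match entry {
        // path is a PathBuf, which needs display() to be printed
        Ok(path) => println!("{}", path.display()),
        // e is a GlobError
        Err(e) => eprintln!("{}", e),
    }
}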
Of course, being an iterator, you can run it through all the iteration operators, such as filter, map etc. But there is no need to sort since paths are yielded in alphabetical order.
lazy_static
Now this is a crate that is often used. Normally, static variables cannot have run-time calculated values. Try this:
fn main() {
    println!("Number = {}", MY_STATIC);
}
static MY_STATIC: u32 = foo();
fn foo() -> u32 {
    42
}
You will be greeted with the error:
error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants
 --> src/main.rs:5:25
  |
5 | static MY_STATIC: u32 = foo();
  |                         ^^^^^
Using lazy_static it becomes:
use lazy_static::lazy_static;
fn main() {
    println!("Number = {}", *MY_STATIC);
}
lazy_static! {
    static ref MY_STATIC: u32 = foo();
}
fn foo() -> u32 {
    42
}
There are three main changes to the code. Firstly, there's the lazy_static! macro to wrap the static declarations. Secondly, there's an added ref keyword. The statics declared here are not values of your type directly but of a generated wrapper type that implements the Deref trait to reach your type. This means that thirdly, I had to dereference it so that the Display implementation for u32 could be found. Rust does not apply Deref when it is looking for trait implementations, so I had to do it manually.
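For comparison, the same lazy initialisation can be sketched with the standard library alone. This is not what lazy_static does internally as far as I know; it is a stand-in that assumes a toolchain with std::sync::OnceLock (Rust 1.70 or later), and the my_static accessor is a name I made up.

```rust
use std::sync::OnceLock;

fn foo() -> u32 {
    42
}

/// Returns the lazily initialised value. `foo` runs at most once,
/// the first time this function is called from any thread.
fn my_static() -> &'static u32 {
    static MY_STATIC: OnceLock<u32> = OnceLock::new();
    MY_STATIC.get_or_init(foo)
}

fn main() {
    // A &u32 implements Display, so no manual dereference is needed here.
    println!("Number = {}", my_static());
}
```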
libc
There is a vast sea of C code out there implementing many useful libraries. To speed up Rust's adoption, it was important that these libraries did not all have to be rewritten in Rust. Fortunately, the designers of Rust realised that and made it easy to interoperate with C. libc provides more support to interoperate with C code. It adds type definitions (like c_int), constants and function declarations for standard C functions (e.g. malloc).
locale
This crate is documented as mostly useless as it is being rewritten for its version 0.3. It provides information on how to format numbers and time.
use locale::*;
fn main() {
    let mut l = user_locale_factory();
    let numeric_locale = l.get_numeric().unwrap();
    println!(
        "Numbers: decimal sep: {} thousands sep: {}",
        numeric_locale.decimal_sep, numeric_locale.thousands_sep
    );
    let time = l.get_time().unwrap();
    println!("Time:");
    println!(
        " January: Long: {}, Short: {}",
        time.long_month_name(0),
        time.short_month_name(0)
    );
    println!(
        " Monday: Long: {}, Short: {}",
        time.long_day_name(1),
        time.short_day_name(1)
    );
}
This outputs:
Numbers: decimal sep: . thousands sep: ,
Time:
January: Long: January, Short: Jan
Monday: Long: Mon, Short: Mon
Hmmm... there seems to be a bug at the time of writing. Surely time.long_day_name(1) should return Monday and not Mon. Whether this is an operating system issue or a problem with locale, I am not sure.
log
This crate provides an interface for logging. The user is expected to provide the implementation of the logger. This can be done through other crates such as env_logger, simple_logger and a few others. log is not used directly by exa itself, but rather by some of its dependencies.
Essentially, it provides a few macros such as error! and warn! to pass formatted messages to a logger. There are multiple levels of logging and they range from trace! to error! in order of rising priority:
- trace!
- debug!
- info!
- warn!
- error!
I think this crate is missing a fatal! level, as most logging systems have one.
Log messages can be filtered by log level. The maximum level is set using the set_max_level function: a filter of Error lets only error! messages through, while a filter of Trace lets everything through. By default, it is set to Off and no messages are sent to loggers. Levels can also be capped at compile-time using various features. All of this is described in the documentation.
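The filtering rule is easier to see in code. The sketch below does not use the log crate at all: Level and is_enabled are hypothetical stand-ins of my own that assume the filter works as a simple "at most this verbose" comparison, with None playing the role of Off.

```rust
/// Hypothetical stand-in for a log level. The smaller the value, the more severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error = 1,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical stand-in for the maximum-level filter.
/// `None` means logging is off and nothing gets through.
fn is_enabled(level: Level, max_level: Option<Level>) -> bool {
    match max_level {
        Some(max) => level <= max,
        None => false,
    }
}

fn main() {
    let max = Some(Level::Info);
    for level in [Level::Error, Level::Warn, Level::Info, Level::Debug, Level::Trace] {
        println!("{:?}: {}", level, is_enabled(level, max));
    }
}
```

With the filter at Info, the error, warn and info messages pass while debug and trace are dropped.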
Loggers implement the Log trait and users install them by calling the set_logger function.
How should you use the logging levels? Below I provide some opinionated guidance:
Trace
Very fine-grained information is provided at this level. This is very verbose and high traffic. You could use this to annotate each step of an algorithm. Or for logging function parameters.
Debug
Intended for everyday development and diagnosing issues. You should rarely submit code that outputs to debug level. At the very least it shouldn't output in release builds.
Info
The standard logging level for describing changes in application state. For example, logging that a user has been created. This should be purely informative; nothing at this level should require anyone to act on it.
Warn
This describes that something unexpected happened in the application. However, this does not mean that the application failed and as a result work can continue. Perhaps a warning could be a missing file that the application tried to load but does not necessarily require it for running (e.g. a configuration file).
Error
Something bad happened to stop the application from performing a task.
matches
Allows a check to see if an expression matches a Rust pattern via a macro:
// Macro version
let b = matches!(my_expr, Foo::A(_));
// is the same as:
let b = match my_expr {
    Foo::A(_) => true,
    _ => false,
};
// or even:
let b = if let Foo::A(_) = my_expr { true } else { false };
It also provides assert versions. It is just a small convenience crate.
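Since Rust 1.42 the standard library has shipped its own matches! macro, so for the basic check you may not need the crate at all. Here is a small runnable sketch using the standard macro; the Foo enum and the is_a helper are made up for illustration.

```rust
// A made-up enum purely for illustration.
#[allow(dead_code)]
enum Foo {
    A(u32),
    B,
}

/// Returns true when `value` is the `A` variant, whatever it holds.
fn is_a(value: &Foo) -> bool {
    matches!(value, Foo::A(_))
}

fn main() {
    println!("{} {}", is_a(&Foo::A(1)), is_a(&Foo::B));
}
```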
natord
If you were to sort these strings using the normal method: ["foo169", "foo42", "foo2"], you would get the sequence ["foo169", "foo2", "foo42"]. This might not be the order you would prefer. What you might want is sometimes referred to as natural ordering. You might prefer the order ["foo2", "foo42", "foo169"] where the numbers in the strings increase in value and not by ASCII ordering.
This functionality is what natord provides. Natural ordering can work well for filenames with numbering inside them and IP addresses to name just a couple. For example, natural ordering will handle strings where the numbers are in the middle (e.g. "myfile42.txt") and not just at the end.
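To get a feel for what natural ordering involves, here is a rough standard-library sketch. It is not natord's algorithm; natural_cmp and its helpers are my own hypothetical functions, and they assume that only runs of ASCII digits are treated as numbers while everything else is compared byte by byte.

```rust
use std::cmp::Ordering;

/// Drops leading zeros so that "007" and "7" compare as the same number.
fn strip_leading_zeros(digits: &[u8]) -> &[u8] {
    let start = digits.iter().position(|&d| d != b'0').unwrap_or(digits.len());
    &digits[start..]
}

/// Returns the index just past the run of ASCII digits starting at `index`.
fn digit_run_end(bytes: &[u8], mut index: usize) -> usize {
    while index < bytes.len() && bytes[index].is_ascii_digit() {
        index += 1;
    }
    index
}

/// Compares two strings, treating runs of ASCII digits as whole numbers.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].is_ascii_digit() && b[j].is_ascii_digit() {
            let (end_i, end_j) = (digit_run_end(a, i), digit_run_end(b, j));
            let run_a = strip_leading_zeros(&a[i..end_i]);
            let run_b = strip_leading_zeros(&b[j..end_j]);
            // A longer run of digits is a bigger number; equal lengths
            // compare digit by digit.
            let ordering = run_a.len().cmp(&run_b.len()).then_with(|| run_a.cmp(run_b));
            if ordering != Ordering::Equal {
                return ordering;
            }
            i = end_i;
            j = end_j;
        } else if a[i] != b[j] {
            return a[i].cmp(&b[j]);
        } else {
            i += 1;
            j += 1;
        }
    }
    // Whichever string still has bytes left sorts last.
    (a.len() - i).cmp(&(b.len() - j))
}

fn main() {
    let mut names = vec!["foo169", "foo42", "foo2"];
    names.sort_by(|a, b| natural_cmp(a, b));
    println!("{:?}", names);
}
```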
I am sure I will make use of this crate at some point in the future.
num_cpus
Short and sweet this one. It provides a get() function to obtain the number of logical cores on the running system, and a get_physical() function to obtain the number of physical cores. Very useful if you want to set up a worker thread pool.
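As a rough illustration of sizing a set of workers, the sketch below uses the standard library's std::thread::available_parallelism (Rust 1.59 or later) instead of num_cpus. It is a stand-in, not a replacement: the two may not report the same number on every system, and worker_count is a hypothetical helper of my own.

```rust
use std::thread;

/// Number of worker threads to start, falling back to 1 when the
/// platform cannot tell us how much parallelism is available.
fn worker_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    let workers = worker_count();

    // Start one thread per unit of parallelism and give each a trivial job.
    let handles: Vec<_> = (0..workers)
        .map(|id| thread::spawn(move || id * 2))
        .collect();

    let results: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("{} workers produced {:?}", workers, results);
}
```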
number_prefix
This crate determines the prefix of numerical units. Given a number, it determines whether there should be a prefix such as "kilo", "mega", "k", "M" etc., or not. It can even handle prefixes that describe binary multipliers such as 1024, something that many programmers will appreciate. In that case a prefix like k will not be used; Ki will be used instead.
And given a prefix, it can be converted into its full name. For example, the prefix for 1000 would be k, but call upper() on it and you will get KILO, call lower() for kilo and caps() for Kilo.
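The idea behind choosing a prefix can be sketched in a few lines of plain Rust. This is not the number_prefix API; scale and the two prefix tables are hypothetical names of my own, and the sketch assumes we simply keep dividing by the base (1000 or 1024) until the value drops below it.

```rust
const DECIMAL: [&str; 5] = ["", "k", "M", "G", "T"];
const BINARY: [&str; 5] = ["", "Ki", "Mi", "Gi", "Ti"];

/// Repeatedly divides `value` by `base`, moving one prefix up each time,
/// until the value is below `base` or the prefixes run out.
/// `prefixes` must not be empty.
fn scale(mut value: f64, base: f64, prefixes: &[&'static str]) -> (f64, &'static str) {
    let mut index = 0;
    while value >= base && index + 1 < prefixes.len() {
        value /= base;
        index += 1;
    }
    (value, prefixes[index])
}

fn main() {
    let (n, prefix) = scale(1_500.0, 1000.0, &DECIMAL);
    println!("{}{}B", n, prefix);

    let (n, prefix) = scale(2_048.0, 1024.0, &BINARY);
    println!("{}{}B", n, prefix);
}
```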
This is very useful for listing file sizes, as exa is required to do.
pad
This crate is used for padding strings at run-time. The standard library format! macro can do a lot of padding but this crate can do more.
For example, it can add spaces at the front to right-align a string within a set field width:
use pad::{Alignment, PadStr};
let s = "to the right!".pad_to_width_with_alignment(20, Alignment::Right);
Neat huh?
You don't have to pad with spaces either. It can use any character you wish. It will also truncate strings that are too long for a particular width.
If you need to format the textual output on a terminal, you may need to look this crate up.
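For comparison, here is how far the standard format! macro gets you with a width chosen at run-time. The three helper names are hypothetical. Note that format! counts characters rather than terminal columns, and that the fill character has to be written into the format string, which is one place where a padding crate can do more.

```rust
/// Right-aligns `text` in a field of `width` characters, padding with spaces.
fn right_align(text: &str, width: usize) -> String {
    format!("{:>width$}", text, width = width)
}

/// Left-aligns `text` and fills the rest of the field with asterisks.
/// The fill character is fixed in the format string.
fn starred(text: &str, width: usize) -> String {
    format!("{:*<width$}", text, width = width)
}

/// Pads or truncates `text` to exactly `width` characters.
/// The precision (the part after the dot) does the truncating.
fn fit(text: &str, width: usize) -> String {
    format!("{:<width$.width$}", text, width = width)
}

fn main() {
    println!("[{}]", right_align("to the right!", 20));
    println!("[{}]", starred("abc", 8));
    println!("[{}]", fit("far too long for this", 8));
}
```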
scoped_threadpool
This can produce a struct that manages a pool of threads. These threads can be used to run closures that can access variables in the original scope. This is useful because normally a spawned thread can only use values that have a 'static lifetime or that it owns outright.
Let us look at the example in the documentation.
use scoped_threadpool::Pool;
fn main() {
    // We create a pool of 4 threads to be utilised later.
    let mut pool = Pool::new(4);
    // Some data we can do work on and reference to.
    let mut vec = vec![0, 1, 2, 3, 4, 5, 6, 7];
    // Use the pool of threads and give them access to the scope of `main`.
    // `vec` is guaranteed to have a lifetime longer than the work we
    // will do on the threads.
    pool.scoped(|scope| {
        // Now we grab a reference to each element in the vector.
        // Remember `vec` is still around during `pool.scoped()`.
        for e in &mut vec {
            // Do some work on the threads - we move the reference
            // in as it's mutably borrowed. We still cannot mutably
            // borrow a reference in the for loop and the thread.
            scope.execute(move || {
                // Mutate the elements. This is allowed as the lifetime
                // of the element is guaranteed to be longer than the
                // work we're doing on the thread.
                *e += 1;
            });
        }
    });
}
This crate allows us to run work on threads that we know will not outlive other variables in the same scope. pool.scoped must block until all work is done to allow this to happen.
This is very useful for quickly doing short-lived jobs in parallel.
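The same borrowing trick can be sketched with the standard library's std::thread::scope (Rust 1.63 or later). Be aware this is only an illustration of scoped borrowing: it spawns a fresh thread per element rather than reusing a pool, so it is not a substitute for scoped_threadpool. The increment_all function is a hypothetical name.

```rust
use std::thread;

/// Adds one to every element, each on its own scoped thread.
/// The threads may borrow from `values` because `thread::scope`
/// does not return until all of them have finished.
fn increment_all(values: &mut [i32]) {
    thread::scope(|scope| {
        for e in values.iter_mut() {
            // Move the mutable reference into the thread's closure.
            scope.spawn(move || {
                *e += 1;
            });
        }
    });
}

fn main() {
    let mut vec = vec![0, 1, 2, 3, 4, 5, 6, 7];
    increment_all(&mut vec);
    println!("{:?}", vec);
}
```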
term_grid
To produce standard ls output, exa must show filenames in a grid formation. Given a width, this crate provides a function fit_into_width that can help to produce a grid of strings. By working out the longest string in a collection, it can calculate how many of those strings fit in a horizontal space such as a line on a terminal.
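The calculation described here can be sketched without the crate. This is a simplification and not term_grid's actual algorithm: columns_that_fit is a hypothetical function that assumes every column is as wide as the longest string, with a fixed number of separator spaces between columns, and it counts characters rather than terminal columns.

```rust
/// Returns how many equal-width columns fit into `width`.
/// n columns need n * longest + (n - 1) * separator characters.
fn columns_that_fit(cells: &[&str], width: usize, separator: usize) -> usize {
    let longest = cells.iter().map(|c| c.chars().count()).max().unwrap_or(0);
    if longest == 0 {
        return 0; // nothing to lay out
    }
    // Always allow at least one column, even if it overflows the line.
    ((width + separator) / (longest + separator)).max(1)
}

fn main() {
    let names = ["a.txt", "bb.txt", "ccc.txt", "d.txt"];
    let columns = columns_that_fit(&names, 20, 2);

    // Print the names row by row, padding each one to the longest name (7).
    for row in names.chunks(columns) {
        let line: Vec<String> = row.iter().map(|n| format!("{:<7}", n)).collect();
        println!("{}", line.join("  "));
    }
}
```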
term_size
Very simple but crucial if you want to provide textual output on a terminal in a highly formatted way. It asks the operating system (an ioctl call on Unix, the console API on Windows) for the size of your terminal view.
let (width, height) = match term_size::dimensions() {
    Some((width, height)) => (width, height),
    None => (80, 25),
};
The terminal may not report its dimensions (although most do), so the result of dimensions() is an Option<(usize, usize)>.
tinyvec
This mainly provides 2 vector types: ArrayVec and TinyVec.
ArrayVec is a safe array-backed drop-in replacement for Vec. This means that a fixed array is allocated to act as storage for ArrayVec. In every other way, ArrayVec acts like a Vec but it cannot reallocate that storage. If you push more elements than the array can hold, a panic will occur.
This is very useful to avoid reallocations all the time. ArrayVec even allows access to the space in the backing array that is not currently being used for elements.
Because ArrayVec uses a fixed array, a heap allocation does not occur.
TinyVec is a drop-in replacement for Vec too. However, for small arrays, it starts life as an ArrayVec using a fixed array as the backing store. As soon as it becomes too big, it will automatically switch to using a normal Vec, hence using the heap to store the array.
This requires the alloc feature to be activated.
It is basically an enum that can be an ArrayVec or a Vec:
enum TinyVec<A: Array> {
    Inline(ArrayVec<A>),
    Heap(Vec<A::Item>),
}
You have got to love algebraic types! You have to provide the backing store type and a macro helps you do this:
let my_vec = tiny_vec!([u8; 16] => 1, 2, 3);
This example can only store the first 16 elements on the stack before it switches to a more regular Vec<u8>.
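To show the inline-then-heap idea in miniature, here is a toy version written from scratch. It is not tinyvec's implementation: SmallBytes is a hypothetical type of my own, fixed to u8 elements and an inline capacity of 4 to keep the sketch short.

```rust
/// Toy illustration: up to 4 bytes live inline, after that we spill to the heap.
enum SmallBytes {
    Inline { buf: [u8; 4], len: usize },
    Heap(Vec<u8>),
}

impl SmallBytes {
    fn new() -> Self {
        SmallBytes::Inline { buf: [0; 4], len: 0 }
    }

    fn push(&mut self, byte: u8) {
        match self {
            SmallBytes::Inline { buf, len } => {
                if *len < buf.len() {
                    buf[*len] = byte;
                    *len += 1;
                } else {
                    // The inline storage is full: copy it into a Vec and
                    // carry on using the heap from now on.
                    let mut heap = buf.to_vec();
                    heap.push(byte);
                    *self = SmallBytes::Heap(heap);
                }
            }
            SmallBytes::Heap(vec) => vec.push(byte),
        }
    }

    fn as_slice(&self) -> &[u8] {
        match self {
            SmallBytes::Inline { buf, len } => &buf[..*len],
            SmallBytes::Heap(vec) => vec.as_slice(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallBytes::Inline { .. })
    }
}

fn main() {
    let mut bytes = SmallBytes::new();
    for b in 0..6 {
        bytes.push(b);
        println!("{:?} inline: {}", bytes.as_slice(), bytes.is_inline());
    }
}
```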
unicode-width
Unicode characters can be wide, and on the terminal, it's important to know how many columns a Unicode character or string might take up. There is a standard for working out that information: Unicode Standard Annex #11.
This is very useful when displaying filenames with international characters, which is exactly what exa needs to deal with. Fortunately, there is a crate that can provide that information.
The documentation for unicode-width does remind us that the character width may not match the rendered width. So be careful, although I don't think there is much you can do in these situations.
url
A typical URL can encode much information including:
- the protocol scheme (e.g. http)
- the host (e.g. docs.rs)
- a port number (e.g. the number in http://docs.rs:8080)
- a username and password
- a path (e.g. foo/bar in the URL http://docs.rs/foo/bar)
- a query (e.g. everything after the ? in myurl.com/foo?var=value)
A lot of these are optional and so a URL parsing library will need to handle this too.
This crate provides a Url::parse function that constructs an object describing the various parts of a URL. Various methods can then be called to provide string slices into the URL such as scheme() or host().
This crate also provides serde support, so if you know what that is, you will understand what this means. Maybe I will write about serde in a future article.
users
This is a very Unixy thing and so is useful on macOS too. To handle various permissions in a Unix operating system, a process is assigned an effective user ID. This user ID and the various group IDs the user belongs to determine the permissions a process has within the operating system, including file and directory access.
The users crate provides this functionality by wrapping the standard C library functions that can obtain user and group information.
If you have trace level logging on, this crate will log all interactions with the system.
This crate also has a mocking feature to use pretend users and groups that you can set up for purposes of development. You do not necessarily want to access real IDs.
Even though this is a very Unixy thing, Windows does have similar permission features. So this crate should work on Windows too, although I haven't tested this and the documentation does not say so. I have seen exa work on Windows, so I am assuming it does.
zoneinfo_compiled
This crate seems to get information directly from the operating system via zoneinfo files. These files let you obtain information on leap seconds and daylight saving time for your current time zone.
This information is maintained by a single person and distributed to, and stored on, your hard disk if you use a Unix-based system. You can find this data in the /usr/share/zoneinfo folder. Each timezone has a binary file and this crate can parse this file and extract the information it holds.
The whole proper handling of time and time zones is beyond me at this moment in time. So I am not sure how exa would be using this crate. It's a very complex topic and something I would love to dig deeper into and hopefully not go insane at the same time.
Another name for the database is the tz database and you can find more information about it, if you so desire, at Wikipedia.
Conclusion
I hope you enjoyed this little trip into a few crates used by a single project. I encourage all of you to try this simple exercise. I learnt a lot by researching for this article and I still feel I have not even scratched the surface.
Please write in the discussion below about interesting crates that you have found and used. I would love to hear about them.
Until next time!