Szymon Gibała
Posted on March 1, 2023
... or There and Back Again
Returning to Rust
As in every hero's journey, after gaining the wisdom of the gods, the hero shall bring it back to his roots. If we squint our eyes a bit, that is (kind of) what we are going to do.
Our exploration in the previous post revealed some suspects that blow up our syscalls count in a simple "Hello, world" Rust application, those are:
- Rust runtime
- libc
Let's try getting rid of them.
No std
As a first step of our journey back to Rust, we need to cut some fat out of the Rust runtime. We can do it by removing the standard library with the no_std attribute. Our minimal "Hello, world" program will look like this:
#![no_std]
#![no_main]
use libc;
// printf expects a NUL-terminated C string, so we terminate the message explicitly
const MSG: &'static str = "Hello, world!\n\0";
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const u8) -> isize {
    unsafe {
        libc::printf(MSG.as_ptr() as *const _);
    }
    0
}
#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
Since there are plenty of good resources about no_std Rust, let's just get a quick overview of what is happening here:
- The #![no_std] attribute tells Rust not to link the standard library.
- Since it is the standard library that defines the panic_handler for Rust programs, without it we need to define our own using the #[panic_handler] attribute on a function with a proper signature.
- On the same note, we cannot use Rust's default main function as an "entry point" to our program, so we need to use the #![no_main] attribute and provide our own main function. The #[no_mangle] attribute tells the compiler not to change the name of our function, so it can be found (and called) by libc.
- And finally, we do not have access to the println macro, so we use libc::printf instead.
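One detail worth spelling out: printf keeps reading its format string until it hits a NUL byte, while a Rust string literal carries its length separately and has no terminator unless we add one. Below is a minimal sketch of how that could be checked with core::ffi::CStr. The names MSG_C and c_message are made up for illustration and are not part of the program above.

```rust
use core::ffi::CStr;

/// Hypothetical message with an explicit trailing NUL byte,
/// which is what C string functions such as printf expect.
const MSG_C: &[u8] = b"Hello, world!\n\0";

/// Returns the bytes as a C string, or `None` when they do not end
/// with exactly one NUL byte (the only NUL allowed is the last byte).
fn c_message(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}
```

With this, c_message(MSG_C) yields a value whose as_ptr() could be handed to printf, while a literal without the trailing \0 is rejected instead of letting printf read past the end of the string.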
For our Cargo.toml we specify libc as a dependency and tell the compiler that we want our program to abort on panic. This second piece is necessary, again, because we do not use std and we are building a program for a target where eh_personality (eh stands for "exception handling") is defined in the standard library. Since eh_personality is necessary for stack unwinding when a panic occurs, aborting absolves us from the need to provide it:
...
[profile.dev]
panic = "abort"
[profile.release]
panic = "abort"
[dependencies]
libc = { version = "0.2.139", default-features = false }
You can try it for yourself by removing the panic = "abort" lines. The Rust compiler's error messages will point you in the right direction.
And voilà, it works:
cargo run
Hello, world!
However... Out of respect for your screen real estate, I am not even going to bother pasting the strace output here as it is bloated like a dead whale (OK, maybe not that bad, just 35 syscalls...). This is because we still use the printf function and libc, and on top of that we link it dynamically...
Statically linking libc (especially musl) to a no_std program turns out to be a non-trivial task, and since we need to get rid of it anyway, let's not go down this rabbit hole. Let's instead get rid of it altogether.
No libc
Okay, we can remove libc from our dependencies, remove the calls to printf that depend on it, and we are good. Right?
#![no_std]
#![no_main]
const MSG: &'static str = "Hello, world!\n";
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const u8) -> isize {
    0
}
#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
Yes... but well, no...
cargo run
...
/usr/bin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../lib/Scrt1.o: in function `_start':
/build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115: undefined reference to `__libc_start_main'
collect2: error: ld returned 1 exit status
...
(cut down to only relevant parts)
We have at least two problems here. The first one is what we see on the screen -- a beautiful linker error -- and the second one is what we do not see on the screen -- our "Hello, world!" message -- because we removed the printf function call.
Since tackling the first one is a prerequisite for the second, let's start with it.
As I mentioned earlier, with our #![no_std] binary we had to provide a custom main function that is called by libc. But now that we do not have libc, there is nothing to call our main function...
However, as we can see in the error message above, something still refers to the __libc_start_main function, which, reasonably enough, cannot be found. We can see that the error originated in the Scrt1.o file, in the _start function. Scrt1.o is a part of the C runtime startup code, so we can reason that Rust still tries to link it to our binary.
Since this is not really a problem with our code, but more with the build (linking) process, we need to tell the compiler to not link those files, and we can do it by passing the -nostartfiles flag to the linker.
RUSTFLAGS="-C link-arg=-nostartfiles" ...
In the gcc docs we can read:
-nostartfiles
Do not use the standard system startup files when linking. The standard system libraries are used normally, unless -nostdlib, -nolibc, or -nodefaultlibs is used.
To avoid specifying RUSTFLAGS every time, we can move them to .cargo/config.toml:
[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-args=-nostartfiles"]
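As an aside, the same linker flag could probably also be set from a build script instead of rustflags, using Cargo's cargo:rustc-link-arg directive. This is only a sketch of that alternative, not what this post uses. The helper names link_arg_directives and emit_link_args are hypothetical.

```rust
/// Build the Cargo build-script directives that pass each flag to the linker.
fn link_arg_directives(flags: &[&str]) -> Vec<String> {
    flags
        .iter()
        .map(|flag| format!("cargo:rustc-link-arg={}", flag))
        .collect()
}

/// Print the directives to stdout, which is where Cargo reads them from.
/// Intended to be called from `fn main()` of a `build.rs` file.
#[allow(dead_code)]
fn emit_link_args() {
    for line in link_arg_directives(&["-nostartfiles"]) {
        println!("{}", line);
    }
}
```

The rest of this post sticks with .cargo/config.toml, so treat this purely as a pointer to another place where such flags can live.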
And we are good to go!
cargo run
[1] 1434 segmentation fault (core dumped) cargo run
I will take that as a no...
Remember the _start function, right? Well, since we no longer link startup files it is no longer here (surprise!), and apparently, it is "kind of" needed.
When the OS loads the program it will look for the entry point address in the ELF file header to start the execution. However, if the entry point function was not found during the linking process, the address will be set to 0x0 (NULL), which usually is a protected memory area.
readelf -h ./target/debug/hello-world | grep Entry
Entry point address: 0x0
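To make the "entry point address in the ELF file header" a bit less abstract, here is a rough sketch of reading that field by hand. It assumes a little-endian ELF64 file (as on x86-64 Linux), where the e_entry field sits at byte offset 24. The name elf64_entry_point is made up for this example, and the code is meant to run on a regular machine with std, not inside our no_std binary.

```rust
/// Read the entry point (`e_entry`) from the first bytes of an ELF file.
/// Returns `None` unless the bytes look like a little-endian ELF64 header.
fn elf64_entry_point(header: &[u8]) -> Option<u64> {
    // e_ident starts with the magic 0x7f 'E' 'L' 'F', followed by the
    // class byte (2 = 64-bit) and the data encoding byte (1 = little-endian).
    if header.len() < 32 || !header.starts_with(b"\x7fELF") {
        return None;
    }
    if header[4] != 2 || header[5] != 1 {
        return None;
    }
    // e_ident (16 bytes) + e_type (2) + e_machine (2) + e_version (4) = 24
    let mut entry = [0u8; 8];
    entry.copy_from_slice(&header[24..32]);
    Some(u64::from_le_bytes(entry))
}
```

Feeding it the first 64 bytes of ./target/debug/hello-world (for example via std::fs::read) should give the same number readelf reports.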
The _start function is the default expected by the linker. This means we could simply rename our main function to _start... Or we can convince the linker that our function is better! We can use the same trick as before -- providing a linker configuration -- this time by passing the --entry flag as a link-arg:
RUSTFLAGS="-C link-arg=--entry=main" cargo run
[1] 2022 segmentation fault (core dumped) RUSTFLAGS="-C link-arg=--entry=main" cargo run
😑 ...
Is the entry point set?
readelf -h ./target/debug/hello-world | grep Entry
Entry point address: 0x1020
Looks like it is, but... Remember our Assembly program? On top of write we also used the exit syscall, and without it, the program would segfault too. We might be facing a similar issue here.
We can take a closer look at this by checking the Assembly code generated by the Rust compiler. To do that we need to expand our RUSTFLAGS, this time with the --emit=asm flag:
RUSTFLAGS="-C link-arg=--entry=main --emit=asm" cargo build --release
We are building in release mode this time to cut out debug symbols noise from the Assembly file. With that, we can find a concise .s file in the target/release/deps directory, and see our main function:
.text
.file "hello_world.57a4ccb5-cgu.0"
.section .text.main,"ax",@progbits
.globl main
.p2align 4, 0x90
.type main,@function
main:
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
...
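The function contains only a single retq instruction, no syscalls, and no exit codes. Since nothing called main, there is no return address waiting on the stack: retq pops whatever sits on top of it (argc, at process entry) and jumps there, which would explain the segfault. To get even closer to the binary we can disassemble main directly, and confirm what we have already seen: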
objdump --disassemble=main ./target/release/hello-world
./target/release/hello-world: file format elf64-x86-64
Disassembly of section .text:
0000000000001000 <main>:
1000: c3 ret
Since we are yet to research how to print to the screen without libc, let's confirm the hypothesis by doing something that we can empirically detect in our code. We can do it by calling panic!, or simply adding an endless loop. Since that is exactly the behavior of our panic_handler, the result is effectively the same:
...
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const u8) {
    loop {}
}
...
RUSTFLAGS="-C link-arg=--entry=main --emit=asm" cargo run
And now we are stuck, which is what endless loops usually do. We can update .cargo/config.toml with our new link-arg and move on:
[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-args=-nostartfiles --entry=main"]
We know the drill now from our previous adventures: we just write(2) and exit(2), and we are done! It might be time to reach out to some old "friends"...
Assembly. Again...
Remember the "wisdom of the gods" part? Yeah, that was not (entirely) a joke.
Since we already have tremendous experience with assembly after writing our "Hello, world" program, it would be a shame not to use it again... You might be asking, "Am I cheating once more?". Maybe. Or no, because it is me who made up those rules [evil laugh or something...].
Regardless, this time we are going to use Assembly from Rust (see, it is not cheating!). Fortunately, we can do that fairly easily. All we need is the asm! macro:
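use core::arch::asm;
...
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const u8) {
    unsafe {
        // Execute write syscall
        asm!(
            "syscall",
            inlateout("rax") 1usize => _, // write syscall number in, return value out (ignored)
            in("rdi") 1usize,             // stdout file descriptor
            in("rsi") MSG.as_ptr(),
            in("rdx") MSG.len(),
            lateout("rcx") _,             // clobbered by the syscall instruction
            lateout("r11") _,             // clobbered by the syscall instruction
            options(nostack)
        );
        // Execute exit syscall
        let exit_code = 0usize;
        asm!(
            "syscall",
            in("rax") 60usize, // exit syscall number
            in("rdi") exit_code,
            options(noreturn)
        );
    }
}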
...
And run it:
cargo run
Hello, world!
Amazing! Let's just confirm with strace, and we are done...
strace -c ./target/debug/hello-world
Hello, world!
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           write
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         1           mprotect
  0.00    0.000000           0         1           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0        11         2 total
Ah, yes, of course... A quick look at the output of:
strace ./target/debug/hello-world
Can refresh some memory pages in my head...
...
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
...
ld.so.preload again, isn't it? We never actually got to build our no_std binary statically, so it is still dynamically linked.
file ./target/debug/hello-world
./target/debug/hello-world: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=ca87b904fa8cd9cb232c819143edd5abb16cdaa7, with debug_info, not stripped
Been there, done that. We know what to do, and after a quick facepalm, we can again set target-feature=+crt-static, this time in .cargo/config.toml:
[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-args=-nostartfiles --entry=main", "-C", "target-feature=+crt-static"]
And run it again:
cargo build && strace -c ./target/debug/hello-world
Hello, world!
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           write
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0         2           total
There we have it, mission accomplished! "Hello, world!" program with just a single syscall, written in Rust (kind of).
It still hurts to look at tho. Perhaps we could use some library to have nice and smooth Rust functions and let someone more fluent speak Assembly for us... You know, sweep it under the rug type of thing...
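Before reaching for a library, here is roughly what a hand-rolled wrapper could look like. It is a sketch that assumes x86-64 Linux (syscall number 1 for write, result returned in rax, rcx and r11 clobbered by the syscall instruction). The function name raw_write is made up for this example.

```rust
use core::arch::asm;

/// Hypothetical thin wrapper around the write syscall on x86-64 Linux.
/// Returns the raw kernel result: the number of bytes written,
/// or a negative errno value on failure.
fn raw_write(fd: usize, buf: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 1isize => ret, // syscall number in, result out
            in("rdi") fd,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            lateout("rcx") _, // overwritten by the syscall instruction
            lateout("r11") _, // overwritten by the syscall instruction
            options(nostack)
        );
    }
    ret
}
```

Inside main this would boil down to something like raw_write(1, MSG.as_bytes()), with the assembly tucked away in one place. It uses only core, so it should fit a no_std binary as well.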
Rust is not JavaScript, but there actually is a crate for that. Let's add it to our Cargo.toml:
[dependencies]
sc = "0.2.7"
Our main function parameters are useless anyway, and we no longer need pub extern "C", so we clean it up as well with our last refactor. The final program looks much nicer:
#![no_std]
#![no_main]
#[macro_use]
extern crate sc;
const MSG: &'static str = "Hello, world!\n";
#[no_mangle]
fn main() {
    unsafe {
        syscall!(WRITE, 1, MSG.as_ptr(), MSG.len());
        syscall!(EXIT, 0);
    }
}
#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
Conclusion
Ah, what a journey it was! All that hassle for writing a "Hello, world!" program...
We achieved our goal of cutting it down to a single syscall, but there are still a lot of areas where we have only scratched the surface. There is also this kernel thingy that actually performs the action requested by a system call and so on, but that is a story for another day (or maybe a few years' worth of stories).
In any case, I hope you enjoyed this little exploration, and maybe even learned a thing or two. If you have any questions, comments, or suggestions, feel free to reach out.