Learn Assembly in 3 Minutes ⏳💻
Bruno Ciccarino λ
Posted on November 21, 2024
Hey there, fellow curious mind! 👋 Did you know that the word computer comes from the Latin computare, which means to calculate? 🧠 A computer is basically a hyper-fast calculator with fancy tricks. These tricks are powered by a set of interconnected circuits running programs. Cool, right? 😎
But here’s the exciting part: today, we’re diving into x86-64 Assembly Language, using the NASM assembler! 🛠️ Assembly is the ultimate way to speak to your CPU, giving you complete control over your computer. While it’s not the easiest language, it’s definitely the most rewarding for understanding what’s happening under the hood. Let’s get started! 🚀
Now, let’s go a step further—every modern computer you’ve ever used is based on something called the von Neumann architecture. It’s got four major parts:
- 1️⃣ ALU (Arithmetic Logic Unit): Does the number crunching and logical operations. 🧮
- 2️⃣ Control Unit: The “manager” that keeps everything running smoothly. 🕹️
- 3️⃣ Memory: Stores your data and programs. 🧠
- 4️⃣ Input/Output devices: These let you talk to your computer and vice versa (e.g., keyboard, monitor). 📠
All these parts are connected by highways of data called buses. 🚌
Alright, that’s enough history—let’s dive into the nitty-gritty and learn Assembly Language, the secret sauce of speaking to your CPU directly. Let’s gooo! 🚀
Step 1: Setting Up Your Toolkit 🛠️
Windows
- Download the NASM assembler from the official site.
- Install it and add it to your system PATH so you can call nasm from anywhere.
- Use a text editor (like NeoVim) to write your .asm or .s files.
Linux
- Open your terminal.
- Install NASM by running:
```bash
sudo apt update && sudo apt install nasm
```
- You're all set to write .asm files using any text editor!
Step 2: Learn the Basics 📖
Sections:
A NASM program is divided into sections (also called segments), declared with the section or segment directive. Each one has its own responsibility.
- section .data: where your initialized variables are declared
- section .bss: uninitialized variables with a fixed, reserved size
- section .text: where your executable assembly code lives
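To see the three sections side by side, here is a minimal skeleton. The names greeting, counter and buffer are made up for illustration, and the exit sequence at the end uses the Linux system calls explained later in this post.

```nasm
; sections.asm: a skeleton showing the three sections together.
; greeting, counter and buffer are illustrative names only.

section .data
greeting db "hi", 0xa     ; initialized bytes, stored in the executable
counter  dq 0             ; an initialized 8-byte value

section .bss
buffer   resb 64          ; 64 reserved bytes with no initial value in the file

section .text
global _start
_start:
    mov rax, 60           ; sys_exit on x86-64 Linux
    xor edi, edi          ; exit code 0
    syscall
```

The program does nothing visible. It only shows where each kind of content goes.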
Now, how do you create a simple variable in assembly?
Declare it in the .data section with the following structure:
name size value1, value2, ..., valueN
A variable is a label for a run of bytes in memory, so it works like an array: you can attach 1 value or N values to it and reach each value by its offset from the label, just like an array index.
- name: any label you choose
- size: one of the following directives
  - db: define byte. Each element occupies exactly 1 byte (8 bits)
  - dw: define word. Each element occupies exactly 2 bytes (16 bits)
  - dd: define double-word. Each element occupies 4 bytes (32 bits)
  - dq: define quad-word. Each element occupies 8 bytes (64 bits)
The value needs to fit the directive you chose. For example, a small integer (0 to 255) fits in 1 byte with db, but a floating-point value such as 3.14 does not: store it with dd (single precision, 4 bytes) or dq (double precision, 8 bytes).
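Here is a small sketch of those directives in action, including the array-style indexing. All the variable names are hypothetical, and the program assumes x86-64 Linux, assembled with nasm -f elf64 and linked with ld. It reports its result through the exit code so you can check it with echo $?.

```nasm
; data.asm: declaring variables of different sizes and indexing into them.
section .data
one_byte  db 42                  ; 1 byte
letters   db "abc"               ; 3 bytes, one per character
primes    dw 2, 3, 5, 7          ; four 2-byte words (8 bytes total)
pi_single dd 3.14                ; 4-byte single-precision float
big       dq 0x1122334455667788  ; one 8-byte quad-word

section .text
global _start
_start:
    movzx eax, byte [letters + 1]    ; eax = 'b' (98): index 1, element size 1
    movzx edi, word [primes + 2*2]   ; edi = 5: index 2, element size 2 bytes
    mov rax, 60                      ; sys_exit
    syscall                          ; exit status is 5
```

Notice that the index is multiplied by the element size yourself: the assembler only knows addresses, not element types.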
Registers: Your CPU's Pocket Notebook 📓
Registers are small, fixed-size storage slots inside the CPU. Think of them as boxes: you store values in them, and you can move values from one register to another. Each register has a size in bits, and a value that is bigger than the box does not fit.
- RAX: General-purpose accumulator. Often used for return values.
- RBX: Another general-purpose register.
- RCX/RDX: Used for counters and data handling.
- RSI/RDI: For string operations.
- RSP/RBP: Stack Pointer and Base Pointer for managing the stack.
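Each 64-bit register also has smaller names that refer to its lower parts. For rax:
- rax: the 64-bit extension of eax -> 64 bits
- eax: extended ax, the low half of rax -> 32 bits
- ax: the low half of eax -> 16 bits
- al: a low, the least significant byte of ax -> 8 bits
- ah: a high, the most significant byte of ax (bits 8 to 15 of rax) -> 8 bits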
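To make the sub-register names concrete, this sketch writes to each part of rax and shows what the whole register holds afterwards. It assumes x86-64 Linux and reports the final value through the exit code.

```nasm
; subregs.asm: how writes to al, ah, ax and eax affect rax.
section .text
global _start
_start:
    mov rax, 0x1122334455667788  ; fill all 64 bits
    mov al, 0xff                 ; rax = 0x11223344556677ff (only the low byte changes)
    mov ah, 0x00                 ; rax = 0x11223344556600ff (bits 8 to 15 change)
    mov ax, 0x0001               ; rax = 0x1122334455660001 (low 16 bits change)
    mov eax, 7                   ; rax = 0x0000000000000007 (a 32-bit write zeroes the upper half)

    mov edi, eax                 ; use the result as the exit code
    mov eax, 60                  ; sys_exit
    syscall                      ; exit status is 7
```

The surprising one is the last write: 8-bit and 16-bit writes leave the rest of the register alone, but a 32-bit write clears the upper 32 bits.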
Global Labels: global _start Explained
When writing assembly programs, we need to tell the linker where the program starts. That's what global _start is for: it makes the _start label visible to the linker.
- global: This directive exports a label from the object file. Without it, _start stays private to the object file and the linker cannot see it.
- _start: The entry point that the GNU linker (ld) looks for by default. When you run the program, execution begins here.
Think of _start as the doorway into your program: it's where the CPU starts reading and executing your instructions. Without it, the linker warns that it cannot find the entry symbol and falls back to a default address, so always define it.
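Since global only exports a name, the entry point does not have to be called _start. Here is a sketch that uses a made-up label, main_entry, and relies on ld's -e option to select it. It assumes x86-64 Linux and GNU ld.

```nasm
; entry.asm: exporting a custom entry label instead of _start.
; main_entry is a hypothetical name chosen for this example.
section .text
global main_entry        ; export the label so the linker can find it
main_entry:
    mov eax, 60          ; sys_exit
    mov edi, 3           ; exit code 3, so the run is easy to verify
    syscall
```

Assemble it with nasm -f elf64 entry.asm, then link with ld -e main_entry -o entry entry.o. Running ./entry and then echo $? should report 3. Sticking with _start saves you the extra linker option.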
System Calls: How the CPU Talks to the OS 📞
Syscalls are like knocking on the operating system's door to ask for services like printing or reading input. Each syscall has a number. On x86-64 Linux, for example:
- 0 (read): Read input from a file descriptor.
- 1 (write): Write bytes to a file descriptor, such as the screen.
- 2 (open): Open a file.
- 60 (exit): Terminate the program.
To use system calls correctly, we need to know which registers to fill and in what order. A system call can take up to 6 parameters, plus the system call identifier that selects which call we want to make:
- rax: the system call identifier
- rdi: the first parameter
- rsi: the second parameter
- rdx: the third parameter
- r10: the fourth parameter
- r8: the fifth parameter
- r9: the sixth parameter
Our first program prints the message "Hello world" on the screen, so we will use the system call sys_write. As seen above, its identifier is 1, so we put 1 in rax. sys_write takes three parameters, in rdi, rsi and rdx. rdi holds the file descriptor to write to: 0 is the standard input, 1 is the standard output and 2 is the standard error, used for error messages. We want the standard output, so rdi will be 1. rsi holds the address of the bytes to write, and rdx holds how many bytes to write.
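As a quick illustration of the register order, here is a sketch that writes to the standard error instead of the standard output and then exits with code 1. The label err_msg and its text are made up for the example. It assumes x86-64 Linux.

```nasm
; stderr.asm: the same write call, aimed at file descriptor 2.
section .data
err_msg db "oops", 0xa  ; 4 characters plus a line feed = 5 bytes

section .text
global _start
_start:
    mov rax, 1          ; identifier: sys_write
    mov rdi, 2          ; 1st parameter: file descriptor 2 (standard error)
    mov rsi, err_msg    ; 2nd parameter: address of the bytes
    mov rdx, 5          ; 3rd parameter: number of bytes
    syscall             ; the result comes back in rax; rcx and r11 get overwritten

    mov rax, 60         ; identifier: sys_exit
    mov rdi, 1          ; 1st parameter: exit code 1
    syscall
```

Running ./stderr 2>/dev/null hides the message, which confirms it went to the standard error and not the standard output.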
Step 3: Write and Run Your First Program ✍️💻
Here's a simple Assembly program that prints "Hello world" and exits.
Code: hello.asm
```nasm
; Assembly instructions can be used to:
;; Perform mathematical and logical operations;
;; Move data within the processor and to or from memory;
;; Perform data/information input and output operations;
;; Control conditional branches and loop execution.
; There are two types of registers:
;; General registers;
;; Special purpose registers (segment, offset and state).

section .data
message db "Hello world" ; db stands for "define byte"
; message is a label for the address of the first byte; through that
; address we can reach the content, which here is the text Hello world

section .text
global _start ; export the _start label so the linker can see it
; ld uses _start as the default entry point of the program

_start: ; This is where the program begins
    mov rax, 1       ; identifier of the sys_write call
    mov rdi, 1       ; file descriptor 1: standard output
    mov rsi, message ; address of the message defined above
    mov rdx, 11      ; size of the message in bytes (update it if the message changes)
    syscall          ; hand control to the kernel, which performs the call set up above

    ;; now we can end our program with the sys_exit call that we saw above;
    ;; it accepts one parameter, the exit code:
    ;; 0 - everything is fine
    ;; 1 - something went wrong
    mov rax, 60      ; identifier of the sys_exit call
    mov rdi, 0       ; indicates that everything went well
    syscall          ; call the kernel again to end our program
```
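Counting characters by hand is easy to get wrong. A common alternative is to let the assembler compute the length with equ and the $ symbol, which stands for the current address. In this sketch, message_len is a name introduced for the example.

```nasm
; hello_len.asm: let NASM compute the message length.
section .data
message     db "Hello world"
message_len equ $ - message   ; current address minus start of message = 11

section .text
global _start
_start:
    mov rax, 1             ; sys_write
    mov rdi, 1             ; standard output
    mov rsi, message       ; address of the message
    mov rdx, message_len   ; length computed at assembly time
    syscall

    mov rax, 60            ; sys_exit
    xor edi, edi           ; exit code 0
    syscall
```

The equ line must come right after the message, because $ is evaluated at the point where it appears. If you edit the text later, the length follows along automatically.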
Assemble and Run
On Linux:
Assemble the program:
```bash
nasm -f elf64 hello.asm
```
Link it:
```bash
ld -o hello hello.o
```
Run it:
```bash
./hello
```
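On Windows:
hello.asm talks to the Linux kernel directly. The syscall numbers and the register convention described above are Linux-specific, so this same source will not work as a native Windows program even though it assembles. The simplest way to run it on Windows is inside WSL, following the Linux steps. For reference, a NASM source written for Windows (one that calls Windows API functions such as WriteFile and ExitProcess instead of Linux syscalls) is assembled and linked like this:
Assemble the program:
```bash
nasm -f win64 hello.asm -o hello.obj
```
Link it using a linker like GoLink or MinGW:
```bash
golink /console /entry:_start hello.obj
```
Run it:
```bash
hello.exe
```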
OK! It ran, but the output didn't end with a line break. That should be easy to solve: just put a \n in the message, as in db "Hello world\n", right? No, young grasshopper. NASM does not interpret escape sequences inside single or double quotes, so \n would be emitted as the two characters \ and n. The new line character is the line feed, number 10 in the ASCII table, so we append that byte to the message ourselves. 0xa is ten in hexadecimal. The syntax is exactly this:
```nasm
section .data
message db "Hello world", 0xa
```
The message is now 12 bytes long (11 characters plus the line feed), so rdx must be 12 in the sys_write call. Repeat the assemble and link steps and run the program: it should print Hello world followed by a line break.
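NASM also has a third kind of string literal, written with backquotes, that does understand C-style escapes such as \n. Here is a minimal sketch, assuming a reasonably recent NASM 2.x on x86-64 Linux. The file name is only an illustration.

```nasm
; hello_backquote.asm: backquoted strings process escapes like \n.
section .data
message db `Hello world\n`   ; \n becomes the single byte 10 (line feed)

section .text
global _start
_start:
    mov rax, 1         ; sys_write
    mov rdi, 1         ; standard output
    mov rsi, message   ; address of the message
    mov rdx, 12        ; 11 characters plus the line feed
    syscall

    mov rax, 60        ; sys_exit
    xor edi, edi       ; exit code 0
    syscall
```

Both spellings produce the same 12 bytes. Appending 0xa is the form you will see most often in NASM code.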
And can I write hello world using only ASCII codes? Yes, you can, but your code becomes unreadable, so I don't recommend it. It would look like this:
```nasm
section .data
message db 72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 0xa
```
In the ASCII table, 72 is "H", 101 is "e", 108 is "l" (so it appears twice in a row), 111 is "o" and 32 is the space. Then 119, 111, 114, 108 and 100 spell "world". The output is the same as before: Hello world followed by a line break.
Step 4: Bonus: A Read and Print Example 📖✏️
Let’s read user input and print it back!
Code: echo.asm
```nasm
section .bss
input resb 32        ; Reserve 32 bytes for input

section .text
global _start

_start:
    ; Read input
    mov rax, 0       ; Syscall number for read
    mov rdi, 0       ; File descriptor (0 = stdin)
    mov rsi, input   ; Buffer for input
    mov rdx, 32      ; Max number of bytes to read
    syscall          ; rax now holds the number of bytes actually read

    ; Write input back to stdout
    mov rdx, rax     ; Write only the bytes that were read
    mov rax, 1       ; Syscall number for write
    mov rdi, 1       ; File descriptor (1 = stdout)
    mov rsi, input   ; Address of the input buffer
    syscall

    ; Exit
    mov rax, 60      ; Syscall number for exit
    xor rdi, rdi     ; Exit code 0
    syscall
```
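The example above handles a single read of at most 32 bytes. As a possible next step, here is a sketch that keeps reading and echoing until the input ends, using a jump to form a loop. The local labels .read_more and .done are names picked for this example, and for brevity the sketch does not handle a write that outputs fewer bytes than requested.

```nasm
; echo_loop.asm: echo stdin to stdout until end of input.
section .bss
input resb 32            ; reusable 32-byte buffer

section .text
global _start
_start:
.read_more:
    mov rax, 0           ; sys_read
    mov rdi, 0           ; stdin
    mov rsi, input       ; buffer address
    mov rdx, 32          ; read at most 32 bytes
    syscall              ; rax = bytes read, 0 at end of input, negative on error
    cmp rax, 0
    jle .done            ; stop on end of input or on error

    mov rdx, rax         ; echo exactly the bytes that were read
    mov rax, 1           ; sys_write
    mov rdi, 1           ; stdout
    mov rsi, input       ; same buffer
    syscall
    jmp .read_more       ; go back for the next chunk

.done:
    mov rax, 60          ; sys_exit
    xor edi, edi         ; exit code 0
    syscall
```

Try it with a pipe, for example echo "a long line of text that exceeds thirty-two bytes" | ./echo_loop. In a terminal, press Ctrl+D to signal the end of input.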
And that’s it! 🎉 You’ve just learned the basics of x86-64 Assembly! Keep experimenting, and soon you’ll feel like a wizard casting CPU spells. 🧙‍♂️✨