Let's build a WebAssembly compiler and runtime - WebAssembly Text Format
Thomas Albertini
Posted on November 15, 2022
Today I want to start a series of articles on how I managed to build a WebAssembly compiler.
Luna is a really tiny compiler written mainly as a quest to conquer the WebAssembly dungeon.
If you do not know what I'm talking about, I suggest reading the introductory article.
Long story short, a few weeks ago I decided to build my own WebAssembly compiler.
This was quite the challenge.
First of all, what the heck is WAT?
Second problem: I didn't know how to write a compiler.
So here I am, trying to dissect my journey and the information I've acquired while writing Luna.
Today I'll give you an overview of the WebAssembly Text Format (the thing we are going to compile).
WAT (WebAssembly Text Format)
The WebAssembly text format is a textual representation of the WASM binary format.
Let's say we have the following .wat file:
(module
  (func (export "add") (param i32) (param i32) (result i32)
    local.get 0
    local.get 1
    i32.add)
)
What the code above does is simple (and quite intuitive): it exports a function under the name "add" that takes two arguments and returns their sum.
Easy.
So let's analyze each word.
Module
A module is the fundamental unit of code in WebAssembly and it is loaded by a WASM runtime. In textual format a module is represented as a big S-Expression which Wikipedia defines as:
an expression in a like-named notation for nested list (tree-structured) data.
NOTE: the shortest WASM program you can write is (module), which does absolutely nothing lol, but it is still a valid WASM program.
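To make the "nested list" idea concrete, here is a minimal sketch in Python of how an S-expression can be read into nested lists. It is not how Luna does it; the names tokenize, parse and ADD_WAT are made up for this illustration, and the tokenizer assumes that string literals contain no spaces or parentheses.

```python
ADD_WAT = """
(module
  (func (export "add") (param i32) (param i32) (result i32)
    local.get 0
    local.get 1
    i32.add)
)
"""


def tokenize(source: str) -> list[str]:
    # Pad every parenthesis with spaces so that split() isolates it.
    # Assumption: no spaces or parentheses inside string literals.
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str]):
    # Consume tokens from the front and build nested Python lists.
    # Unbalanced input raises IndexError; a real parser would report it properly.
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return token


empty = parse(tokenize("(module)"))   # ['module']
tree = parse(tokenize(ADD_WAT))       # ['module', ['func', ['export', '"add"'], ...]]
```

The whole module becomes one list whose first element is the keyword module, and every nested pair of parentheses becomes a nested list.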
Inside our module above there are two structures:
Func: the first structure is a function, declared by the func keyword.
Export: the second structure is an export, declared by the export keyword.
The function has:
- An optional identifier, such as $add, that is the name of the function. The snippet above leaves it out, so the function is referred to by its index.
- Two parameters of type i32. They can be named too (for example $num1 and $num2); when they are unnamed, as above, they are referred to by index (0 and 1). WebAssembly provides only four basic number types: i32, i64, f32 and f64.
- A result of type i32.
- Three instructions: local.get 0, local.get 1 and i32.add (to understand them we first need to learn about Stack Machines).
Stack Machines
WASM execution is defined in terms of a stack machine. The idea behind a stack machine is that instructions are executed in order, and most of them either push values onto a stack or pop values from it. The values are numbers (i32/i64/f32/f64).
There are basically two types of instructions:
- Simple instructions (e.g. i32.add, f32.sub etc...): generally pop arguments from the stack and push the result back on it.
- Control instructions: alter the control flow (we won't be seeing them in this series).
(func (param $n i32) (result i32)
  local.get $n
  local.get $n
  i32.add)
For example, local.get takes the param $n and pushes it onto the stack. The i32.add instruction then pops the two values on top of the stack, adds them and pushes the result back onto the stack. In the function above $n is pushed twice, so the result is $n + $n.
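If you want to see the stack move, here is a rough sketch of a toy stack machine in Python. It only knows local.get and i32.add, the run function and the tuple encoding of instructions are invented for this example, and it has nothing to do with how a real runtime is implemented.

```python
def run(instructions, params):
    """Execute a list of (opcode, *immediates) tuples and return the final stack."""
    stack = []
    for op, *args in instructions:
        if op == "local.get":
            # Push the value of the parameter with the given index.
            stack.append(params[args[0]])
        elif op == "i32.add":
            # Pop exactly two operands and push their sum, wrapped to 32 bits.
            b = stack.pop()
            a = stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


# The function above: push $n twice, then add.
double = [("local.get", 0), ("local.get", 0), ("i32.add",)]
# The exported "add" function from the first example.
add = [("local.get", 0), ("local.get", 1), ("i32.add",)]

doubled = run(double, [21])   # [42]
summed = run(add, [2, 3])     # [5]
```

Note that i32.add only ever touches the two topmost values. Whatever is left on the stack at the end of the function is its result.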
So, I hope I've given you an idea of how to start writing your own WAT modules. They ain't that scary, are they?
But there's one last thing I want to show before we get to the code.
At the beginning of the article I said that the WebAssembly Text Format is a textual representation of the WASM binary format, but what does the WASM binary format look like?
WASM Binary Format
Our .wat example
(module
  (func (export "add") (param i32) (param i32) (result i32)
    local.get 0
    local.get 1
    i32.add)
)
would look like this (compiled with wat2wasm):
0000000: 0061 736d ; WASM_BINARY_MAGIC
0000004: 0100 0000 ; WASM_BINARY_VERSION
; section "Type" (1)
0000008: 01 ; section code
0000009: 00 ; section size (guess)
000000a: 01 ; num types
; func type 0
000000b: 60 ; func
000000c: 02 ; num params
000000d: 7f ; i32
000000e: 7f ; i32
000000f: 01 ; num results
0000010: 7f ; i32
0000009: 07 ; FIXUP section size
; section "Function" (3)
0000011: 03 ; section code
0000012: 00 ; section size (guess)
0000013: 01 ; num functions
0000014: 00 ; function 0 signature index
0000012: 02 ; FIXUP section size
; section "Export" (7)
0000015: 07 ; section code
0000016: 00 ; section size (guess)
0000017: 01 ; num exports
0000018: 03 ; string length
0000019: 6164 64 add ; export name
000001c: 00 ; export kind
000001d: 00 ; export func index
0000016: 07 ; FIXUP section size
; section "Code" (10)
000001e: 0a ; section code
000001f: 00 ; section size (guess)
0000020: 01 ; num functions
; function body 0
0000021: 00 ; func body size (guess)
0000022: 00 ; local decl count
0000023: 20 ; local.get
0000024: 00 ; local index
0000025: 20 ; local.get
0000026: 01 ; local index
0000027: 6a ; i32.add
0000028: 0b ; end
0000021: 07 ; FIXUP func body size
000001f: 09 ; FIXUP section size
As you can see, a module starts with the MAGIC WORD and the version, and is then divided into sections, each with its own rules: a section for the function types, a section that maps each function to its type, a section for the exports, a section for the code (the function bodies) and whatnot...
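To get a feel for that layout, here is a small sketch in Python that glues the same bytes together by hand, section by section. It is only an illustration of the dump above, not Luna's code: the section helper is a made-up name, and it assumes every payload is shorter than 128 bytes so that each size fits in a single byte.

```python
I32 = 0x7F  # byte used for the i32 type in the dump above


def section(section_id: int, payload: bytes) -> bytes:
    # A section is: its code, the size of its payload, then the payload.
    # Assumption: the payload is under 128 bytes, so the size is one byte.
    assert len(payload) < 0x80
    return bytes([section_id, len(payload)]) + payload


# Type section: 1 type, func (0x60), 2 params (i32, i32), 1 result (i32).
type_sec = section(1, bytes([0x01, 0x60, 0x02, I32, I32, 0x01, I32]))
# Function section: 1 function, using signature index 0.
func_sec = section(3, bytes([0x01, 0x00]))
# Export section: 1 export, name "add" (length 3), kind 0 (func), func index 0.
export_sec = section(7, bytes([0x01, 0x03]) + b"add" + bytes([0x00, 0x00]))
# Function body: 0 local declarations, local.get 0, local.get 1, i32.add, end.
body = bytes([0x00, 0x20, 0x00, 0x20, 0x01, 0x6A, 0x0B])
# Code section: 1 function body, prefixed by its size.
code_sec = section(10, bytes([0x01, len(body)]) + body)

magic = b"\x00asm"                  # 00 61 73 6d
version = (1).to_bytes(4, "little")  # 01 00 00 00

module = magic + version + type_sec + func_sec + export_sec + code_sec
```

The "section size (guess)" and "FIXUP section size" lines in the dump are this same size byte: it is written as a placeholder first and patched once the length of the section is known.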
Do not worry, we will be tackling them all and we will conquer this WebAssembly dungeon.
Conclusion
Thank you for reading. I hope you've enjoyed this, and if you want to go deeper into the explanation I will leave some useful resources below. See ya in the next article!!