Playing with low-level memory allocation in WebAssembly
Igor Proskurin
Posted on September 6, 2023
I recently wrote a blog post about my first experience with WebAssembly (WASM). In that post, I touched a little bit on how to set up an SDK for writing code in C/C++, and how to compile a simple C++ function that takes a couple of numeric values from JavaScript, runs as a WASM binary in a browser, and returns a value.
For those who just jumped in, WebAssembly is a cool cross-platform binary format, an assembly language, and a virtual machine to run this binary in a browser. What can it do? Well, it can mine cryptocurrency silently in the background while you go through your favorite webpages. And guess who pays for the electricity?
Well, besides cryptocurrency abuse, it is an interesting technology for running heavy stuff client-side with reasonable performance.
Where to start
In this post, I am playing around with Emscripten. It is a WASM compiler that wraps around clang to compile C/C++ source code into the binary .wasm format. It also provides some glue-code API to embed this WASM binary into JavaScript. Just look into MDN Docs and Emscripten SDK to get started.
Managing memory with Emscripten
Before diving into high-level Emscripten stuff such as Embind, I decided to look into its low-level memory model.
Here is a toy problem. We have a C-function that takes an array of double precision values, does something with them, and returns a number. It may look as simple as this.
// malloc_testing.c
#include <assert.h>
#include <math.h>
#include <stdio.h>

double vector_norm(double* p, int n) {
    int i;
    double norm2 = 0;
    assert(n > 0 && "number of elements must be positive");
    assert(p && "must be a valid pointer");
    printf("received: n = %d\n", n);
    for (i = 0; i < n; i++) {
        printf("processed: p[%d] = %.3f\n", i, p[i]);
        norm2 += p[i] * p[i];
    }
    return sqrt(norm2);
}
In the code, I sprinkled some asserts and old-fashioned print-outs for convenience. Don't forget to wrap it in extern "C" {} if you are going to treat it as C++ code.
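To have something to compare the WASM result against, here is a minimal plain-JavaScript sketch of the same computation. The name vectorNormReference is my own invention for illustration and is not part of the compiled code. It throws instead of aborting on an empty input, which is an assumption about how you would want a JavaScript version to behave.

```javascript
// Hypothetical reference implementation (not from the C file above):
// computes the Euclidean norm of a numeric array in plain JavaScript.
function vectorNormReference(values) {
  if (values.length === 0) {
    throw new RangeError("number of elements must be positive");
  }
  let norm2 = 0;
  for (const x of values) {
    norm2 += x * x; // accumulate the sum of squares, like the C loop
  }
  return Math.sqrt(norm2);
}

const reference = vectorNormReference(new Float64Array([0, 1, 2, 3, 5]));
console.log(`reference = ${reference}`); // sqrt(39), roughly 6.245
```

If the value returned from the C-function differs noticeably from this one, the array most likely did not arrive in WASM memory the way we intended.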
We already know that this function can be called from JavaScript using the ccall() or cwrap() methods, but how can we pass an array from JavaScript to our C-code?
Let us compile this function into a binary using the Emscripten emcc compiler.
$ emcc malloc_testing.c -o malloc_testing.js -O0 -sASSERTIONS=2 -sEXPORTED_FUNCTIONS=_vector_norm,_malloc,_free,setValue -sEXPORTED_RUNTIME_METHODS=cwrap,ccall
Here, I keep the build debug-friendly with the low optimization level -O0 and Emscripten's runtime checks (-sASSERTIONS=2), and export some useful stuff like _malloc, _free, and setValue, and of course our C-function _vector_norm (note the leading underscore).
Now we have a couple of files: malloc_testing.wasm, which contains the binary, and malloc_testing.js, which is the JavaScript glue code that allows us to use it from a web page. You can also run it in Node.js, but in this case it should be compiled with -sMODULARIZE.
Allocating memory from JavaScript
What does the memory model of the WASM VM look like? Well, for C/C++ code it looks pretty normal: code, heap, stack. We can allocate stuff on the heap and pass pointers around. Luckily for us, we also asked emcc to export the memory allocator, _malloc, into the JavaScript glue code, so now we can allocate memory on the WASM heap from JavaScript.
In theory, the whole process looks easy: allocate memory on the heap and get the pointer back in JavaScript code, write something into this memory, and pass this pointer to the C-function.
Let's try it. I will use a simple web page set up to run our C-function inside the browser by pressing a button.
<!DOCTYPE html>
<html lang="en">
<body>
  <button id="mybutton">Run</button>
  <script>
    document.getElementById("mybutton").addEventListener("click", () => {
      const vectorNorm = Module.cwrap(
        'vector_norm',         // no underscore
        'number',              // return type
        ['number', 'number']); // param types
      const myTypedArray = new Float64Array([0, 1, 2, 3, 5]);
      // allocate empty buffer on the WASM heap
      let buf = Module._malloc(myTypedArray.length * myTypedArray.BYTES_PER_ELEMENT);
      // fill this buffer with our stuff (the offset counts 8-byte elements, not bytes)
      Module.HEAPF64.set(myTypedArray, buf / myTypedArray.BYTES_PER_ELEMENT);
      // call our function and pass pointer to buffer
      const result = vectorNorm(buf, myTypedArray.length);
      console.log(`result = ${result}`);
      Module._free(buf); // no leaks!
    });
  </script>
  <script src="malloc_testing.js"></script>
</body>
</html>
Here, I first create a JavaScript typed array, a float64 view of a contiguous chunk of memory.
After that, I create an empty buffer on the heap inside WASM memory by calling _malloc, which we exported when we compiled our C-file. It returns a pointer buf to the allocated segment of memory, which in JavaScript code is treated simply as a number (very "safe", eh?).
The next step is to fill the allocated memory with something. I use Module.HEAPF64.set(myTypedArray, buf / myTypedArray.BYTES_PER_ELEMENT), which takes two arguments: my array and an offset into the heap. Note the alignment! HEAPF64 is a view of the heap as 8-byte doubles, so the offset counts 8-byte elements, not bytes, and the byte pointer buf must be divided by BYTES_PER_ELEMENT. It actually took me more than an hour to figure it out since the Emscripten API docs are quite, hm, emscryptic on this point. Thanks to ChatGPT and this post.
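To convince myself about the byte address versus element index business, here is a small sketch that runs without any WASM at all. It uses a plain ArrayBuffer as a stand-in for the WASM heap, which is an assumption made purely for illustration. The names heapF64 and bytePtr are hypothetical, and the address 64 is a made-up value pretending to come from _malloc.

```javascript
// Stand-in for WASM linear memory (an assumption for illustration; in the
// real page the buffer belongs to the Emscripten Module).
const memory = new ArrayBuffer(256);
const heapF64 = new Float64Array(memory); // index i covers bytes 8*i .. 8*i+7

const data = new Float64Array([0, 1, 2, 3, 5]);
const bytePtr = 64; // pretend _malloc returned this 8-byte-aligned address

// Convert the byte address to an element index before calling set().
const firstIndex = bytePtr / Float64Array.BYTES_PER_ELEMENT; // 64 / 8 = 8
heapF64.set(data, firstIndex);

// Read the doubles back through an independent view starting at byte 64.
const readBack = new Float64Array(memory, bytePtr, data.length);
console.log(Array.from(readBack)); // [ 0, 1, 2, 3, 5 ]

// Passing the raw byte address instead would target index 64, i.e. byte 512.
// In this small buffer that is out of bounds, so set() throws a RangeError.
let misuseError = null;
try {
  heapF64.set(data, bytePtr);
} catch (err) {
  misuseError = err;
}
console.log(misuseError instanceof RangeError); // true
```

In a real WASM heap, which is much larger than 256 bytes, the wrong offset would probably not throw. It would quietly write the values somewhere else, and the C-function would then read whatever happened to be at buf.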
To see how it works, we can replace the call to HEAPF64.set with manual writes, one element at a time. I came up with something like this (don't do it anywhere near production!):
function setMemoryManually(myArray, ptr) {
    for (const x of myArray) {
        Module.setValue(ptr, x, 'double');
        ptr += myArray.BYTES_PER_ELEMENT;
    }
}
It looks ugly, but it works. The low-level function Module.setValue(ptr, value, 'double') can be used to manually set a value at the address pointed to by ptr. In this case, no tricks: the pointer is a plain byte address, incremented by BYTES_PER_ELEMENT = 8 for double. So now I can write something like setMemoryManually(myTypedArray, buf) in my JavaScript code, and it will fill the buffer with the contents of myTypedArray.
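The same byte-by-byte idea can be tried outside the browser. In the sketch below, a DataView over a plain ArrayBuffer plays the role of WASM memory, and the hypothetical helper setDouble stands in for Module.setValue(ptr, value, 'double'). Both are assumptions for illustration, not the Emscripten implementation. The writes are little-endian because that is the byte order WebAssembly uses.

```javascript
// Stand-in for WASM linear memory (assumption; the real one lives in Module).
const wasmBytes = new ArrayBuffer(128);
const view = new DataView(wasmBytes);

// Hypothetical analogue of Module.setValue(ptr, value, 'double'):
// writes one little-endian float64 at a byte address.
function setDouble(ptr, value) {
  view.setFloat64(ptr, value, true); // true = little-endian, as in WebAssembly
}

function setMemoryManually(myArray, ptr) {
  for (const x of myArray) {
    setDouble(ptr, x);
    ptr += myArray.BYTES_PER_ELEMENT; // advance by 8 bytes per double
  }
  return ptr; // address just past the last written element
}

const source = new Float64Array([0, 1, 2, 3, 5]);
const start = 16; // made-up byte address, pretending it came from _malloc
const end = setMemoryManually(source, start);
console.log(end - start); // 40: five doubles, 8 bytes each

// Read the values back from the same byte addresses.
const written = [];
for (let p = start; p < end; p += 8) {
  written.push(view.getFloat64(p, true));
}
console.log(written); // [ 0, 1, 2, 3, 5 ]
```

Here the pointer really is a byte address at every step, which is why no division by 8 shows up anywhere.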
When all memory is set, we can call our C-function from JavaScript. I prefer to wrap it up first.
const vectorNorm = Module.cwrap(
    'vector_norm',         // no underscore
    'number',              // return type
    ['number', 'number']); // param types
We tell cwrap that we return a number, and that we pass a couple of number values. Yes, the pointer to the allocated buffer is passed as a number (looks very "safe" and "portable", eh?). So we can just call vectorNorm from our script.
const result = vectorNorm(buf, myTypedArray.length);
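Since the pointer is just a number, nothing stops us from forgetting _free when the call throws. One way to reduce that risk is a small wrapper with try/finally. This is only a sketch: withF64Buffer and makeFakeModule are hypothetical names of mine, not part of Emscripten, and the fake module exists only so the example runs without a compiled .wasm file.

```javascript
// Hypothetical helper (not part of Emscripten): copies `values` into the
// module's heap, calls fn(ptr, length), and always frees the buffer.
function withF64Buffer(mod, values, fn) {
  const bytes = values.length * Float64Array.BYTES_PER_ELEMENT;
  const ptr = mod._malloc(bytes);
  if (ptr === 0) {
    throw new Error("malloc failed");
  }
  try {
    mod.HEAPF64.set(values, ptr / Float64Array.BYTES_PER_ELEMENT);
    return fn(ptr, values.length);
  } finally {
    mod._free(ptr); // runs even if fn throws
  }
}

// Tiny fake module with a bump allocator, for demonstration only.
function makeFakeModule(size = 1024) {
  const heap = new ArrayBuffer(size);
  let next = 8; // keep address 0 reserved so it can mean "allocation failed"
  const live = new Set();
  return {
    HEAPF64: new Float64Array(heap),
    _malloc(bytes) {
      const aligned = Math.ceil(bytes / 8) * 8;
      if (next + aligned > size) return 0;
      const ptr = next;
      next += aligned;
      live.add(ptr);
      return ptr;
    },
    _free(ptr) { live.delete(ptr); },
    liveCount: () => live.size,
  };
}

const fake = makeFakeModule();
const norm = withF64Buffer(fake, new Float64Array([0, 1, 2, 3, 5]), (ptr, n) => {
  // Stand-in for vectorNorm(ptr, n): read the doubles back and take the norm.
  const xs = fake.HEAPF64.subarray(ptr / 8, ptr / 8 + n);
  return Math.hypot(...xs);
});
console.log(norm, fake.liveCount()); // norm is about 6.245, no live allocations
```

On the real page, the call would presumably look like withF64Buffer(Module, myTypedArray, vectorNorm), with Module being the object created by the Emscripten glue code.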
Last step. Serve the web page from localhost (I just run python -m http.server), open
http://localhost:8000/wasm_testing/malloc_testing.html
in a browser, hard reload, press Run, and here we go:
received: n = 5
p[0] = 0.000
p[1] = 1.000
p[2] = 2.000
p[3] = 3.000
p[4] = 5.000
result = 6.244997998398398
In summary...
Playing with low-level stuff is fun, but I won't use it anywhere in production code. Well, at least not without considerable experience and understanding of the Emscripten code base.