Kushal Joshi
Posted on May 15, 2022
We're entering into a kind of niche subject here but I want to make it more accessible if I can. At the time of writing this post, at my work we've been using GRPC for a few years, because at the time when we made the decision, it was simple, fast, had some Rust support, and corporate sponsorship (it used to be that G stood for Google...). It's not that we've reached the end of using GRPC but I keep wondering what else is possible, as our work context requires us to go faster and faster.
Title photo by Emre Karataş on Unsplash
GRPC is a binary RPC protocol that serialises pretty damn quick and transfers data more than fast enough across our global system. We could use something faster but, to date, we've had other fish to fry in the area of optimisation.
However, today I have some personal projects that need a data protocol and my leaning is to go with RPC. I need speed and I want it to be well supported and simple to integrate.
While it would be easy to use GRPC again, I want to try out another RPC protocol - Cap'n Proto.
Why? Because the author of Cap'n Proto was one of the original authors of ProtoBuf 2 (Protocol Buffers 2), which is the open source serialisation format used in GRPC. Also because Cap'n Proto claims to have no serialisation/deserialisation at all once a message is created, which means it should be very fast for transferring data around a distributed system. The same goes for saving that data structure to disk: it can be written as-is, with no serialisation step. It's apparently all handled by the protocol definition, all the way down to the endian consideration for saved data.
The Cap'n Proto site is both funny and full of information as to why it was created - but I can't immediately find a way to get an example up and running or to understand how to convert my app's types to Cap'n Proto types. I think everything I need should be there as I can see there is a section on Encoding which should explain this.
The only hurdle I have is that while the documentation is extensive it is a little confusing in places and mainly focuses on C++ and the C++ RPC system which is a little different to the Rust code. There are Rust examples in the official repo which I will try and leverage here.
Installing Cap'n Proto
There is a note on their site that Homebrew can be used to install on a Mac. But at the time of writing I couldn't figure out what to install.
After some hunting I found that we need the relevant tool to process the Cap'n Proto (capnp) Schema files: https://capnproto.org/capnp-tool.html
I found this can be installed on a Mac with:
brew install capnp
If you don't have Homebrew for your Mac, go here: https://brew.sh/
If you don't have a Mac, there are installation instructions here: https://capnproto.org/install.html
Once installed we can make sure it runs and look at the help:
capnp --help
Output
Usage: capnp [<option>...] <command> [<arg>...]
Command-line tool for Cap'n Proto development and debugging.
Commands:
compile Generate source code from schema files.
convert Convert messages between binary, text, JSON, etc.
decode DEPRECATED (use `convert`)
encode DEPRECATED (use `convert`)
eval Evaluate a const from a schema file.
id Generate a new unique ID.
See 'capnp help <command>' for more information on a specific command.
Options:
-I<dir>, --import-path=<dir>
Add <dir> to the list of directories searched for non-relative imports
(ones that start with a '/').
--no-standard-import
Do not add any default import paths; use only those specified by -I.
Otherwise, typically /usr/include and /usr/local/include are added by
default.
--verbose
Log informational messages to stderr; useful for debugging.
--version
Print version information and exit.
--help
Display this help text and exit.
Ok it's working. What do I do now?
I guess we can start with a message example.
The docs say:
Cap’n Proto messages are strongly-typed and not self-describing. You must define your message structure in a special language, then invoke the Cap’n Proto compiler.
Ok let's have a look at the compiler tool docs.
It says I can do this:
capnp compile -oc++ myschema.capnp
This is fine but I want Rust, not the C++ code which this command seems to generate. Looking around, there are a bunch of Rust crates that I think will help, plus an examples folder, all in this repo:
https://github.com/capnproto/capnproto-rust
But the example contains an ID in the schema file, so I'm not sure if I need to generate this or it is generated by the tool and... inserted into the schema?
Some more hunting around and text searching for "generate" brought me to the language page where I found this:
So it looks like I need to generate at least 1 id and put it in my schema.
❯ capnp id
@0xb068ff5fb1c4f77e;
And let's use the example from the capnproto-rust repo, but with our ID:
I will call this file src/schema/point.capnp
@0xb068ff5fb1c4f77e;
struct Point {
x @0 :Float32;
y @1 :Float32;
}
interface PointTracker {
addPoint @0 (p :Point) -> (totalPoints :UInt64);
}
What does this describe? It looks like an RPC call to add a Point (with x & y coords defined as f32's) to something like a list of points, and it returns the totalPoints, which is a u64. As this type is not a collection I will assume it means the total-number-of-points.
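To make that reading concrete, here is a rough plain-Rust picture of what the schema describes. This is only a sketch of the shape, not the code capnp generates: the `PointTracker` trait, the `VecTracker` type and its `points` field are hypothetical names made up for illustration.

```rust
/// Plain-Rust stand-in for the schema's `Point` struct (Float32 maps to f32).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f32,
    y: f32,
}

/// Stand-in for the `PointTracker` interface: one method taking a Point
/// and returning the total number of points as a u64 (UInt64).
trait PointTracker {
    fn add_point(&mut self, p: Point) -> u64;
}

/// Hypothetical in-memory tracker, only here to show the call shape.
#[derive(Default)]
struct VecTracker {
    points: Vec<Point>,
}

impl PointTracker for VecTracker {
    fn add_point(&mut self, p: Point) -> u64 {
        self.points.push(p);
        // The schema returns a count, not a collection.
        self.points.len() as u64
    }
}
```

Calling `add_point` twice on a fresh `VecTracker` returns 1 and then 2, which matches the total-number-of-points reading of `totalPoints`.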
Quick review of the schema basics:
- Capnp comments use a "#"
- The capnp types are:
  - Void: Void
  - Boolean: Bool
  - Integers: Int8, Int16, Int32, Int64
  - Unsigned integers: UInt8, UInt16, UInt32, UInt64
  - Floating-point: Float32, Float64
  - Blobs: Text (UTF8 NUL terminated), Data
  - Lists: List(T) - the T is a Capnp built-in or defined capnp Schema Struct
- Struct fields are consecutively numbered (like protobuf) - but with an "@"
- There are Enums but also Unions.
- Interfaces wrap methods (the PointTracker interface above contains the addPoint method)
- ".capnp" files can import other ".capnp" files
- Types for a field are declared with a :colon
The plan
As a rough plan, I want to be able to serve this interface and use or save the file in some way as a demo of the capnp capabilities. The challenge will be to make it as simple as possible so it facilitates what is an exploratory reference (for me at least) and hopefully some info/learning for anyone else looking at this protocol or learning/exploring Rust.
I've now made a cargo new project folder and added a src/schema folder for the file above.
In case generating a capnp ID sounds like a pain - the vscode-capnp extension for vs-code can generate a capnp ID anytime you need it.
(In fact I accidentally found out later that if you forget, the compiler throws an error and generates the ID for you so you can just copy and paste it in)
Generating a Cap'n Proto Schema
Let's see what the cli tool says about compiling now:
❯ capnp help compile
Usage: capnp compile [<option>...] <source>...
Compiles Cap'n Proto schema files and generates corresponding source code in one
or more languages.
Options:
-I<dir>, --import-path=<dir>
Add <dir> to the list of directories searched for non-relative imports
(ones that start with a '/').
--no-standard-import
Do not add any default import paths; use only those specified by -I.
Otherwise, typically /usr/include and /usr/local/include are added by
default.
-o<lang>[:<dir>], --output=<lang>[:<dir>]
Generate source code for language <lang> in directory <dir> (default:
current directory). <lang> actually specifies a plugin to use. If
<lang> is a simple word, the compiler searches for a plugin called
'capnpc-<lang>' in $PATH. If <lang> is a file path containing slashes,
it is interpreted as the exact plugin executable file name, and $PATH is
not searched. If <lang> is '-', the compiler dumps the request to
standard output.
--src-prefix=<prefix>
If a file specified for compilation starts with <prefix>, remove the
prefix for the purpose of deciding the names of output files. For
example, the following command:
capnp compile --src-prefix=foo/bar -oc++:corge foo/bar/baz/qux.capnp
would generate the files corge/baz/qux.capnp.{h,c++}.
--verbose
Log informational messages to stderr; useful for debugging.
--version
Print version information and exit.
--help
Display this help text and exit.
Aha:
the compiler searches for a plugin called 'capnpc-<lang>' in $PATH...
Not sure if I have that. Let's see what the autocomplete finds:
❯ capnpc
capnpc capnpc-c++ capnpc-capnp
Nope. Ok let's install capnpc-rust:
?
I couldn't find anything about needing to install this. Maybe it's magical and I can just select Rust as the language:
❯ capnp compile -orust src/schema/point-schema.capnp
rust: no such plugin (executable should be 'capnpc-rust')
rust: plugin failed: exit code 1
Yup, it's not magical.
Hmm... maybe it's a Cargo crate?
❯ cargo install capnpnc-rust
Updating crates.io index
error: could not find `capnpnc-rust` in registry `crates-io` with version `*`
Nope.
Ok maybe I'm going about this the wrong way. I guess I could compile the capnpc-rust to a binary by cloning the repo but that may be unnecessary as what I really want is to compile it from within my own code. Isn't it? 🤷 - This is just a guess from reading the capnproto-rust repo:
It's also strongly hinted at in the capnproto-rust docs:
We can try...
crate::Cargo.toml:
[package]
name = "capnproto-demo"
version = "0.1.0"
edition = "2021"
build = "build.rs"
[dependencies]
[build-dependencies]
capnpc = "0.14"
crate::build.rs:
fn main() {
capnpc::CompilerCommand::new()
.src_prefix("src/schema")
.file("src/schema/point.capnp")
.run()
.expect("schema compiler command failed");
}
And cargo build compiles and runs the build script! But it doesn't seem to do anything. 😞 Or maybe it did and there's a schema somewhere on my drive?
It's probably this missing Env-var from the examples:
...but I think I want to specify the output folder myself:
fn main() {
capnpc::CompilerCommand::new()
.src_prefix("src/schema")
.file("src/schema/point.capnp")
.output_path("src/schema")
.run()
.expect("schema compiler command failed");
}
Ok! Now we have a generated schema file that is around 500 lines of code:
I'm going to cargo build again to see what happens when the schema already exists:
❯ ll src/schema
total 56
-rw-r--r-- 1 kushaljoshi staff 159B 30 Apr 15:58 point.capnp
-rw-r--r-- 1 kushaljoshi staff 20K 30 Apr 17:54 point_capnp.rs
❯ cargo build
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
❯ ll src/schema
total 56
-rw-r--r-- 1 kushaljoshi staff 159B 30 Apr 15:58 point.capnp
-rw-r--r-- 1 kushaljoshi staff 20K 30 Apr 17:54 point_capnp.rs
Nothing (I ran the second cargo build at 18:00)! This looks good so far. I don't want to be pointlessly regenerating the schema on every build.
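Cargo decides for itself when build.rs needs to run again. If you would rather make that explicit, a build script can print `cargo:rerun-if-changed=<path>` lines, which tell Cargo to rerun the script only when those paths change. A minimal sketch follows. The helper name `rerun_directives` is made up, and it is an assumption here that capnpc does not already print these lines itself.

```rust
use std::path::Path;

/// Builds one `cargo:rerun-if-changed=<path>` line per schema file.
/// In build.rs, print each returned line to stdout before running the
/// capnp compiler command, for example:
///
///     for line in rerun_directives(&["src/schema/point.capnp"]) {
///         println!("{}", line);
///     }
fn rerun_directives<P: AsRef<Path>>(schemas: &[P]) -> Vec<String> {
    schemas
        .iter()
        .map(|p| format!("cargo:rerun-if-changed={}", p.as_ref().display()))
        .collect()
}
```

With that in place, editing point.capnp triggers regeneration, while unrelated source changes leave the generated file alone.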
Right, now we have a schema and automatically generated code in our build. That's quite nice. Now how do we use it?
Using the generated code
In the generated code there's a pub mod point module wrapper, so this seems like a good place to start. Let's use that module in our project:
We'll keep it nice and simple. First we can make a server module that will be the capnp server.
Cargo.toml:
...
[dependencies]
capnp = "0.14"
...
main.rs:
mod server;
fn main() {
println!("Hello, world!");
}
I've left the default new project code for now as a sign-post for very new people to see what is happening and how we are building up the project.
server.rs:
#[path = "./schema/point_capnp.rs"]
mod point_capnp;
use point_capnp::{point, point_tracker};
I'm guessing we need to tell the compiler where the code is.
There's a small issue when we try to build this. The generated code expects the point_capnp mod to be at the top level and doesn't like it being declared inside server:
That's a little annoying. The generated code is hard coded to crate::point_capnp.
I had a read of the issues for a few hours and found this has been addressed, albeit in what feels like a hacky way, and was raised/found as an issue in an old blog article from Hoverbear, which helped immensely here (thanks Ana!).
The simple answer for us right now (if there is a better/simpler solution, please comment) is to add this file - rust.capnp to the schema folder and include it in each schema like this:
point.capnp:
@0xb068ff5fb1c4f77e;
using Rust = import "rust.capnp";
$Rust.parentModule("server");
struct Point {
x @0 :Float32;
y @1 :Float32;
}
interface PointTracker {
addPoint @0 (p :Point) -> (totalPoints :UInt64);
}
This is an irritant as it's a manual change to every schema file, but it works great and compiled fine, with tons of "associated function is not used" warnings for the generated code. Adding #![allow(dead_code)] at the top of the server.rs file fixed this for now. This is a pattern that works for now but probably won't scale - I'll let my server module "own" the capnp generated code for each schema that server is a host for.
I'm making a first commit to the repo at this point as I have a compiling capnp schema 🎉.
Getting to the Point
At this stage we are almost at the end of most of the available documentation regarding Rust but the capnproto-rust repo contains both serialisation and RPC examples. Deconstructing those, I'm hoping to make the simplest implementation I can here.
Let's make a point from our Point. The docs say:
In Rust, the generated code for the example above includes a point::Reader<'a> struct with get_x() and get_y() methods, and a point::Builder<'a> struct with set_x() and set_y() methods.
To understand how to use these, we have to jump back to the beginning of the documentation to understand how capnp works:
Cap’n Proto generates classes with accessor methods that you use to traverse the message.
Ok so we need to make a message that will contain our Point. I think.
In the address book example capnp::serialize_packed is used to read and write this message to a stream. Docs for this are here.
We can copy this address book code structure to make our Point.
server.rs:
#![allow(dead_code)]
#[path = "./schema/point_capnp.rs"]
mod point_capnp;
pub mod point_demo {
use crate::server::point_capnp::point;
use capnp::serialize_packed;
pub fn write_to_stream() -> ::capnp::Result<()> {
let mut message = ::capnp::message::Builder::new_default();
let mut demo_point = message.init_root::<point::Builder>();
demo_point.set_x(5_f32);
demo_point.set_y(10_f32);
serialize_packed::write_message(&mut ::std::io::stdout(), &message)
}
}
main.rs:
mod server;
fn main() {
let _ = server::point_demo::write_to_stream();
}
Output:
❯ cargo run
Compiling capnproto-demo v0.1.0 (/Users/kushaljoshi/code/rust/capnproto/capnproto-demo)
Finished dev [unoptimized + debuginfo] target(s) in 1.13s
Running `target/debug/capnproto-demo`
̠@ A%
Fantastic! We "serialized" our Point and packed it into a capnp message. The message is not human-readable (all that shows up is an underscore-like mark, an at-symbol, a space and a capital-A; the trailing percent-symbol is most likely just the shell flagging that the output had no final newline) because it is capnp's binary format, which does not need further serialization/deserialization over a stream to be used with an application. Can we check it?
Yes! The capnp tool provides a decode feature that needs the schema and the data structure:
❯ cargo run | capnp decode ./src/schema/point.capnp Point
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/capnproto-demo`
capnp decode: The input is not in "binary" format. It looks like it is in "packed" format. Try that instead.
Try 'capnp decode --help' for more information.
Ok so this didn't work because we need to either tell capnp that it's a packed (compressed) message, or we need to print the raw message to STDOUT. Let's do both to increase our intuition of what is happening here. First we just need to add --packed to the CLI command:
❯ cargo run | capnp decode ./src/schema/point.capnp Point --packed
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/capnproto-demo`
(x = 5, y = 10)
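For some intuition about what "packed" means: the capnp encoding page describes it as a simple zero-squeezing scheme. Each 8-byte word becomes a tag byte (one bit per non-zero byte) followed by only the non-zero bytes. The sketch below is a simplified, std-only packer written for illustration, not the capnp crate's implementation. The `pack` function name and the hard-coded `POINT_MESSAGE` bytes are assumptions, with the bytes being what a single-segment message for our Point should look like.

```rust
/// Assumed unpacked bytes for Point(x = 5, y = 10) in a single segment.
const POINT_MESSAGE: [u8; 24] = [
    0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, // segment table: 1 segment, 2 words
    0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, // root pointer: 1 data word, 0 pointers
    0x00, 0x00, 0xA0, 0x40, 0x00, 0x00, 0x20, 0x41, // x = 5.0, y = 10.0 (little-endian f32)
];

/// Simplified sketch of the "packed" encoding for word-aligned input.
fn pack(unpacked: &[u8]) -> Vec<u8> {
    assert!(unpacked.len() % 8 == 0, "input must be whole 8-byte words");
    let words: Vec<&[u8]> = unpacked.chunks(8).collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < words.len() {
        let word = words[i];
        i += 1;
        // Tag byte: bit n is set when byte n of the word is non-zero.
        let mut tag = 0u8;
        for (n, &b) in word.iter().enumerate() {
            if b != 0 {
                tag |= 1 << n;
            }
        }
        out.push(tag);
        // Only the non-zero bytes follow the tag.
        out.extend(word.iter().copied().filter(|&b| b != 0));
        if tag == 0x00 {
            // An all-zero word is followed by a count of further zero words.
            let mut run = 0u8;
            while i < words.len() && run < u8::MAX && words[i].iter().all(|&b| b == 0) {
                run += 1;
                i += 1;
            }
            out.push(run);
        } else if tag == 0xff {
            // A fully non-zero word is followed by a count of words copied
            // verbatim; this sketch never uses that and always writes 0.
            out.push(0);
        }
    }
    out
}
```

Packing `POINT_MESSAGE` squeezes 24 bytes down to 9, and the last few of those bytes are where the at-symbol, space and capital-A in the terminal output come from.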
Now we can see that capnp can unpack (decompress) the message and print out the Point coords that we set. But we may not always have packed data so let's send the Point in its raw message format and make sure we can decode it as we would expect. We need to make a change to server for that:
server.rs
...
pub mod point_demo {
use crate::server::point_capnp::point;
use capnp::serialize;
pub fn write_to_stream() -> ::capnp::Result<()> {
let mut message = ::capnp::message::Builder::new_default();
let mut demo_point = message.init_root::<point::Builder>();
demo_point.set_x(5_f32);
demo_point.set_y(10_f32);
serialize::write_message(&mut ::std::io::stdout(), &message)
}
}
Output:
❯ cargo run | capnp decode ./src/schema/point.capnp Point --packed
Compiling capnproto-demo v0.1.0 (/Users/kushaljoshi/code/rust/capnproto/capnproto-demo)
Finished dev [unoptimized + debuginfo] target(s) in 0.65s
Running `target/debug/capnproto-demo`
capnp decode: The input is not in "packed" format. It looks like it is in "binary" format. Try that instead.
Try 'capnp decode --help' for more information.
A very helpful message that confirms what we know we did. We can remove the --packed flag now.
Output:
❯ cargo run | capnp decode ./src/schema/point.capnp Point
Finished dev [unoptimized + debuginfo] target(s) in 0.05s
Running `target/debug/capnproto-demo`
(x = 5, y = 10)
Fabulous.
If you have followed along and got this working, you may want to see the benefit more clearly so for that we can save the data and load it back in without any further serialization/deserialization.
server.rs:
pub mod point_demo {
use crate::server::point_capnp::point;
use capnp::serialize;
use std::fs::File;
pub fn write_to_stream() -> std::io::Result<()> {
let mut message = ::capnp::message::Builder::new_default();
let mut demo_point = message.init_root::<point::Builder>();
demo_point.set_x(5_f32);
demo_point.set_y(10_f32);
// This Result should be consumed properly in an actual app
let _ = serialize::write_message(&mut ::std::io::stdout(), &message);
// Save the point
{
let file = File::create("point.txt")?;
let _ = serialize::write_message(file, &message);
}
// Read the point from file
{
let point_file = File::open("point.txt")?;
// We want this to panic in our demo in case there is an issue
let point_reader =
serialize::read_message(point_file, ::capnp::message::ReaderOptions::new())
.unwrap();
let demo_point: point::Reader = point_reader.get_root().unwrap();
println!("\n(x = {}, y = {})", demo_point.get_x(), demo_point.get_y());
}
Ok(())
}
}
Output:
❯ cargo run
Compiling capnproto-demo v0.1.0 (/Users/kushaljoshi/code/rust/capnproto/capnproto-demo)
Finished dev [unoptimized + debuginfo] target(s) in 0.63s
Running `target/debug/capnproto-demo`
@ A
(x = 5, y = 10)
So... it doesn't look like much happened there and the output, by design, looks the same.
However, you may have missed what just happened and how awesome this is 😄 !!
Let's go through it:
- We created a serialized Point from our Point Schema
- We set data inside the serialized Point (no need to deserialize Point or serialize x & y float 32 values)
- We saved the serialized data to disk using the standard file tools
- We read in the serialized data using the standard file tooling (endianness is fixed by the capnp encoding, which is little-endian, so the file reads the same on any machine)
- We used accessor methods on the serialized data and printed the value without deserializing the data.
It's ok - if you are thinking "so what?", then your project use cases may not have been performance critical so far. If they have though, then this should be a wondrous thing to behold!
For the uber skeptical: The Reader above is not a deserializer. It is literally a Reader. It needs a schema and some data, and it knows how to follow the pointers in the data (which is made up of ordered segments) so that the accessor methods point at the correct parts of the data. For more information have a read of the capnp encoding page.
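To show how little "reading" involves, here is a hand-rolled sketch that pulls x and y straight out of the raw bytes using only the standard library. It is not how the capnp crate is implemented: `read_point`, `le_u32` and the hard-coded `POINT_MESSAGE` are hypothetical, the bytes assume a single-segment message as described on the encoding page, and the real Reader also handles multiple segments, far pointers, defaults and safety limits that this ignores.

```rust
/// Assumed unpacked bytes for Point(x = 5, y = 10) in a single segment.
const POINT_MESSAGE: [u8; 24] = [
    0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, // segment table: 1 segment, 2 words
    0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, // root pointer: 1 data word, 0 pointers
    0x00, 0x00, 0xA0, 0x40, 0x00, 0x00, 0x20, 0x41, // x = 5.0, y = 10.0 (little-endian f32)
];

/// Reads a little-endian u32 at a byte offset, or None if out of bounds.
fn le_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let s = bytes.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Follows the root pointer and reads x and y in place (single segment only).
fn read_point(message: &[u8]) -> Option<(f32, f32)> {
    // Segment table: (segment count - 1), then segment 0's length in words.
    if le_u32(message, 0)? != 0 {
        return None; // more than one segment: not handled in this sketch
    }
    let segment_len = le_u32(message, 4)? as usize * 8;
    let segment = message.get(8..8 + segment_len)?;

    // Root pointer: low 2 bits are 0 for a struct, the next 30 bits are a
    // signed offset in words from the end of the pointer to the struct data.
    let lower = le_u32(segment, 0)?;
    if lower & 0b11 != 0 {
        return None; // not a struct pointer
    }
    let offset_words = ((lower as i32) >> 2) as i64;
    let data_words = le_u32(segment, 4)? & 0xffff;
    if data_words < 1 {
        return None; // Point needs one data word for its two f32 fields
    }
    let start = (1 + offset_words) * 8;
    if start < 0 {
        return None;
    }
    let start = start as usize;

    // x lives in the first 4 bytes of the data word, y in the next 4.
    let x = f32::from_bits(le_u32(segment, start)?);
    let y = f32::from_bits(le_u32(segment, start + 4)?);
    Some((x, y))
}
```

`read_point(&POINT_MESSAGE)` yields `Some((5.0, 10.0))` with nothing copied into an intermediate structure: the accessors are just offset arithmetic over the bytes that came off the disk.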
You can decode the file data the same way as the STDOUT output above:
cat point.txt | capnp decode ./src/schema/point.capnp Point
(x = 5, y = 10)
Now this is really quite interesting; if it can be saved, it can be thrown over a network and used by any client that has the appropriate map (schema) to read the received data, without any interim deserialization steps.
That's what we will try next in Part 2.