Example of Rust attribute macros: data serialization (part 1 - structures)
ProgramCrafter
Posted on February 24, 2024
Recently, I decided to create a Rust library that serializes objects into "cells" (generally, bytes) for use in the TON blockchain. There are three obvious ways to do that:
- For each structure, write out buffer.store_value(self.a); buffer.store_value(self.b); ... by hand. For parsing, repeat the work once again.

  Example (quoted from ton-labs-block / messages.rs):

      impl Serializable for InternalMessageHeader {
          fn write_to(&self, cell: &mut BuilderData) -> Result<()> {
              cell
                  .append_bit_zero()? // tag
                  .append_bit_bool(self.ihr_disabled)?
                  .append_bit_bool(self.bounce)?
                  .append_bit_bool(self.bounced)?;

              self.src.write_to(cell)?;
              self.dst.write_to(cell)?;
              self.value.write_to(cell)?; // value: CurrencyCollection
              self.ihr_fee.write_to(cell)?; // ihr_fee
              self.fwd_fee.write_to(cell)?; // fwd_fee
              self.created_lt.write_to(cell)?; // created_lt
              self.created_at.write_to(cell)?; // created_at

              Ok(())
          }
      }

  In particular, notice the ? after every call. It's most definitely not normal if data serialization can fail in the middle.
- Create macros that take the order in which to store fields and generate the serialization code automatically. When deserialization is added, it will certainly match the order of writes, so fields don't get mixed up.
- Use Serde and just write a custom serializer. Unfortunately, it doesn't let you specify the order in which fields are serialized, nor does it offer compile-time checks of whether serialization is fallible or infallible.
Rust has two kinds of macros: by-example (defined with macro_rules!) and procedural (they can be attached as attributes to structs, enums, etc., just like #[derive(...)]). I thought that procedural macros would look cleaner at the place of usage.
An attribute macro accepts two sequences of tokens: whatever is within its invocation (for derive, that would be the list of traits), and the item it is applied to.
Required libraries
We need three external crates: quote + proc-macro2, to avoid forming token sequences by hand and instead substitute variables into a template, plus syn to parse the code we receive (for instance, to iterate over enum variants).
[package]
name = "tlb_macro"
version = "0.1.0"
edition = "2021"
[lib]
proc-macro = true
# See more keys and their definitions at
# https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
syn = {version = "^2.0.50", features = ["full"]}
quote = "^1.0.8"
proc-macro2 = "^1.0.78"
extern crate proc_macro;
use syn::{parse_macro_input, DeriveInput, Data, Expr, Fields, Ident,
          ItemEnum, ItemStruct, Meta, MetaList, Lit, spanned::Spanned};
use quote::{quote_spanned, quote, ToTokens};
use proc_macro2::Span;
use std::collections::HashMap;
type OldTokenStream = proc_macro::TokenStream;
type V2TokenStream = proc_macro2::TokenStream;
Entry point
Let's call our attribute tlb_serializable (after the TL-B language), and decide it should be used as follows:
#[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
pub struct Address {
    workchain: u8,
    hash_high: u128,
    hash_low: u128
}
Then the main function looks like this:
#[proc_macro_attribute]
pub fn tlb_serializable(attr: OldTokenStream, mut item: OldTokenStream) -> OldTokenStream {
    let struct_item = item.clone();
    let input: ItemStruct = parse_macro_input!(struct_item);
    let name = input.ident;

    let serializers = create_serialization_code(
        &attr.to_string(), &input.fields);

    item.extend(OldTokenStream::from(quote! {
        impl crate::ton::CellSerialize for #name {
            fn serialize(&self) -> ::std::vec::Vec<::std::string::String> {
                let mut result : ::std::vec::Vec<::std::string::String> = ::std::vec![];
                #serializers
                result
            }
        }
    }));
    item
}
We parse the incoming item as a struct; if it is not one, parse_macro_input! makes the macro return a compile error noting what went wrong. Then we take its name to substitute into the impl template (it is the type the serialization trait is implemented on) and pass its fields into the next function. With the resulting sequence of store statements, we extend the struct definition (an attribute macro replaces the code it is applied to with whatever it returns).
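To make the result concrete, here is a hand-written sketch of roughly what the attribute is expected to append for the Address struct from above. The CellSerialize trait below is only a stand-in with the signature used in the template; the u8 and u128 implementations and their output format are assumptions made so the example runs, and the exact text of the literal chunk depends on how the compiler prints the attribute tokens.

```rust
// Stand-in for `crate::ton::CellSerialize`; the real trait may differ.
pub trait CellSerialize {
    fn serialize(&self) -> Vec<String>;
}

// Assumed primitive encodings, written in the "u <value> <width>bit" style
// of the literal used in the attribute. These are illustrative only.
impl CellSerialize for u8 {
    fn serialize(&self) -> Vec<String> {
        vec![format!("u {} 8bit", self)]
    }
}

impl CellSerialize for u128 {
    fn serialize(&self) -> Vec<String> {
        vec![format!("u {} 128bit", self)]
    }
}

pub struct Address {
    workchain: u8,
    hash_high: u128,
    hash_low: u128,
}

// Approximately what
// #[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
// would generate next to the struct definition.
impl CellSerialize for Address {
    fn serialize(&self) -> Vec<String> {
        let mut result: Vec<String> = vec![];
        {
            // Chunk starting with "u ": pushed as-is.
            result.push("u 4 3bit".to_owned());
            // Every field chunk lives in its own block.
            {
                let mut s_field = CellSerialize::serialize(&self.workchain);
                result.append(&mut s_field);
            }
            {
                let mut s_field = CellSerialize::serialize(&self.hash_high);
                result.append(&mut s_field);
            }
            {
                let mut s_field = CellSerialize::serialize(&self.hash_low);
                result.append(&mut s_field);
            }
        }
        result
    }
}
```

With these assumed primitive encodings, an Address with workchain 0, hash_high 1 and hash_low 2 serializes to the four strings "u 4 3bit", "u 0 8bit", "u 1 128bit" and "u 2 128bit", in the order given in the attribute.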
The code generator itself
fn create_serialization_code(attr: &str, struct_fields: &Fields)
        -> V2TokenStream {
    let Fields::Named(ref fields) = struct_fields else {
        panic!("For unambiguous parsing, normal structs must consist of named fields");
    };

    let mut field_spans: HashMap<String, (Ident, Span)> = HashMap::new();
    for field in fields.named.iter() {
        let id = field.ident.clone().expect("unnamed field");
        field_spans.insert(id.to_string(), (id, field.span()));
    }

    // Mapping each part of serialization TL-B to block of code that stores value into cell
    let serializations = attr.split(",").map(|part_whitespaced| {
        let part = part_whitespaced.trim();
        if part.is_empty() {
            quote!{}
        } else if part.starts_with("u ") {
            quote! { result.push(#part.to_owned()); }
        } else {
            let (name, span) = &field_spans[part];
            quote_spanned! {span.clone()=>{
                let mut s_field = crate::ton::CellSerialize::serialize(&self.#name);
                result.append(&mut s_field);
            }}
        }
    });

    // Constructing function of all those code chunks
    quote!{{
        #(#serializations)*
    }}
}
The code is quite straightforward! First, we take the list of fields in the struct and store the identifier and span of each one. The identifier is kept simply so we don't have to create it a second time, while the span lets us direct any errors at the field definition instead of at the macro:
error[E0277]: the trait bound `i128: CellSerialize` is not satisfied
--> src\main.rs:17:9
|
17 | hash_high: i128,
| ^^^^^^^^^ the trait `CellSerialize` is not implemented for `i128`
If we replace quote_spanned! with quote!, we won't even see the name of the field that caused the error:
error[E0277]: the trait bound `i128: CellSerialize` is not satisfied
--> src\main.rs:14:5
|
14 | #[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `CellSerialize` is not implemented for `i128`
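Spans aside, the chunk-splitting step of create_serialization_code can be tried outside of a macro. The following is a minimal sketch that mirrors the same split, trim and starts_with logic on plain strings; the Chunk enum and the classify function are hypothetical names introduced only for this illustration. Unlike the generator above, which panics when indexing the map with an unknown name, this version returns an error instead.

```rust
use std::collections::HashSet;

/// Hypothetical classification of one comma-separated attribute chunk.
#[derive(Debug, PartialEq)]
enum Chunk<'a> {
    /// Nothing between two commas (or a trailing comma).
    Empty,
    /// A chunk starting with "u ", stored verbatim.
    Literal(&'a str),
    /// The name of a struct field to serialize.
    Field(&'a str),
}

/// Splits the attribute text on commas and classifies every chunk,
/// reporting names that are not fields of the struct.
fn classify<'a>(
    attr: &'a str,
    known_fields: &HashSet<&str>,
) -> Result<Vec<Chunk<'a>>, String> {
    attr.split(',')
        .map(|part_whitespaced| {
            let part = part_whitespaced.trim();
            if part.is_empty() {
                Ok(Chunk::Empty)
            } else if part.starts_with("u ") {
                Ok(Chunk::Literal(part))
            } else if known_fields.contains(part) {
                Ok(Chunk::Field(part))
            } else {
                Err(format!("unknown field `{part}` in attribute"))
            }
        })
        .collect()
}
```

Called with the attribute text "u 4 3bit, workchain, hash_high, hash_low" and the three field names of Address, it yields one Literal followed by three Field chunks. Inside a real procedural macro, the error branch could be turned into a compile error instead of a panic.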
One more detail I feel I should mention is why some places use quote!{ code } and others use quote!{{ code }}. The answer is simple: the second form creates a separate block of code that isolates any local variables inside it, so they don't clash with the ones defined in the serialization of the next field.
A disputable question
Would I be better off using macros-by-example? On one hand, they can parse the text describing the order of fields more easily. On the other hand, it would be hard to use field names as expressions (since those macros are hygienic). And finally, I wouldn't have learned how procedural macros work and wouldn't have written this article!
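To make the comparison a bit more concrete, here is a rough sketch of what a by-example macro could look like for the simplest case only: one literal prefix followed by a plain list of fields. The macro name impl_cell_serialize, its invocation syntax, the stand-in CellSerialize trait and the primitive encodings are all assumptions made for this illustration. The sketch keeps self and result entirely inside the macro body, which is how it stays clear of hygiene in this narrow case; anything that mixes caller-written expressions with macro-local variables would likely run into it again.

```rust
// Stand-in for `crate::ton::CellSerialize`; the real trait may differ.
trait CellSerialize {
    fn serialize(&self) -> Vec<String>;
}

// Assumed primitive encodings, for illustration only.
impl CellSerialize for u8 {
    fn serialize(&self) -> Vec<String> {
        vec![format!("u {} 8bit", self)]
    }
}

impl CellSerialize for u128 {
    fn serialize(&self) -> Vec<String> {
        vec![format!("u {} 128bit", self)]
    }
}

// Hypothetical by-example counterpart of the attribute: the whole impl is
// produced by the macro, so `self` and `result` share one hygiene context.
macro_rules! impl_cell_serialize {
    ($ty:ty, prefix = $prefix:literal, fields = [$($field:ident),* $(,)?]) => {
        impl CellSerialize for $ty {
            fn serialize(&self) -> Vec<String> {
                let mut result: Vec<String> = vec![$prefix.to_owned()];
                $(
                    {
                        let mut s_field = CellSerialize::serialize(&self.$field);
                        result.append(&mut s_field);
                    }
                )*
                result
            }
        }
    };
}

struct Address {
    workchain: u8,
    hash_high: u128,
    hash_low: u128,
}

impl_cell_serialize!(
    Address,
    prefix = "u 4 3bit",
    fields = [workchain, hash_high, hash_low]
);
```

The trade-off is visible at the call site: the field order sits in a separate macro invocation rather than right above the struct, and a misspelled field is reported at that invocation rather than at the field definition.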