Example of Rust attribute macros: data serialization (part 1 - structures)

programcrafter

ProgramCrafter

Posted on February 24, 2024

Recently, I decided to create a Rust library that serializes objects into "cells" (generally, bytes) for use in the TON blockchain. There are three obvious ways to do that:

  1. For each structure, write out buffer.store_value(self.a); buffer.store_value(self.b); ... by hand. For parsing, repeat the same work once again.

    Example
    impl Serializable for InternalMessageHeader {
      fn write_to(&self, cell: &mut BuilderData) -> Result<()> {
        cell
            .append_bit_zero()?              //tag
            .append_bit_bool(self.ihr_disabled)?
            .append_bit_bool(self.bounce)?
            .append_bit_bool(self.bounced)?;
    
        self.src.write_to(cell)?;
        self.dst.write_to(cell)?;
    
        self.value.write_to(cell)?;         //value: CurrencyCollection
    
        self.ihr_fee.write_to(cell)?;       //ihr_fee
        self.fwd_fee.write_to(cell)?;       //fwd_fee
    
        self.created_lt.write_to(cell)?;    //created_lt
        self.created_at.write_to(cell)?;    //created_at
    
        Ok(())
      }
    }
    

    (quoted from ton-labs-block / messages.rs). In particular, notice the ? after each call. It is most definitely not normal for data serialization to be able to fail halfway through.

  2. Create a macro that takes the order in which to store fields and generates the serialization code automatically. When deserialization is added, it will be generated from the same order as the writes, so fields can't get mixed up.

  3. Use Serde and just write a custom serializer. Unfortunately, it doesn't let me specify the order in which fields are serialized, nor does it provide a compile-time check of whether serialization is fallible or infallible.
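To make the duplication in option 1 concrete, here is a minimal sketch in plain Rust. The `Header` struct, its two fields, and the byte layout are hypothetical and chosen only for illustration; they are not the TON cell format. Note that the field order is spelled out twice, once for writing and once for parsing, and that writing into a growable buffer has no reason to return a `Result`.

```rust
use std::convert::TryInto;

/// Hypothetical two-field header, used only for this illustration.
pub struct Header {
    pub bounce: bool,
    pub created_at: u32,
}

impl Header {
    /// Appending to a `Vec<u8>` cannot fail, so no `Result` is returned.
    pub fn write_to(&self, buf: &mut Vec<u8>) {
        buf.push(self.bounce as u8);                            // 1st: bounce
        buf.extend_from_slice(&self.created_at.to_be_bytes());  // 2nd: created_at
    }

    /// Parsing repeats the same field order by hand; it must be kept
    /// in sync with `write_to` manually.
    pub fn read_from(buf: &[u8]) -> Option<Header> {
        let (&flag, rest) = buf.split_first()?;                 // 1st: bounce
        let raw: [u8; 4] = rest.get(..4)?.try_into().ok()?;     // 2nd: created_at
        Some(Header {
            bounce: flag != 0,
            created_at: u32::from_be_bytes(raw),
        })
    }
}
```

If someone later swaps two lines in only one of these functions, the code still compiles and silently produces wrong data, which is the problem a generated implementation avoids.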

Rust has two kinds of macros: by-example (defined with macro_rules!) and procedural (these can be attached as attributes to structs, enums, etc., just like #[derive(...)]). I decided that procedural macros would look cleaner at the point of use.

An attribute macro accepts two sequences of tokens: whatever is within its invocation (for derive, that would be the list of traits), and the item it is applied to.
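For comparison, this is roughly what the by-example kind looks like for the simplest case, where the caller passes the type name and the field names as identifiers. Everything here is an assumption made for the sketch: the `WriteBytes` trait, the `impl_write_bytes!` macro, and the `Stamp` struct are hypothetical names, and bytes stand in for cells.

```rust
/// Hypothetical trait standing in for "can be stored into a buffer".
pub trait WriteBytes {
    fn write_to(&self, buf: &mut Vec<u8>);
}

impl WriteBytes for u8 {
    fn write_to(&self, buf: &mut Vec<u8>) {
        buf.push(*self);
    }
}

impl WriteBytes for u32 {
    fn write_to(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.to_be_bytes());
    }
}

/// Generates a `WriteBytes` impl that stores fields in the listed order.
macro_rules! impl_write_bytes {
    ($ty:ident => $($field:ident),+) => {
        impl WriteBytes for $ty {
            fn write_to(&self, buf: &mut Vec<u8>) {
                // One store statement per listed field, in the given order.
                $( WriteBytes::write_to(&self.$field, buf); )+
            }
        }
    };
}

pub struct Stamp {
    pub kind: u8,
    pub created_at: u32,
}

// Serialization order is chosen here, independently of declaration order.
impl_write_bytes!(Stamp => created_at, kind);
```

The invocation sits next to the struct rather than on it, which is the "place of usage" difference mentioned above.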

Required libraries

We need three external crates: quote and proc-macro2, to avoid forming token sequences by hand and instead substitute variables into a template, plus syn to parse the code we receive (for instance, to iterate over enum variants).

[package]
name = "tlb_macro"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

# See more keys and their definitions at
#     https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
syn = {version = "^2.0.50", features = ["full"]}
quote = "^1.0.8"
proc-macro2 = "^1.0.78"

The macro crate's source then begins with these imports and type aliases:

extern crate proc_macro;
use syn::{parse_macro_input, DeriveInput, Data, Expr,
    Fields, Ident, ItemEnum, ItemStruct, Meta, MetaList, Lit, spanned::Spanned};
use quote::{quote_spanned, quote, ToTokens};
use proc_macro2::Span;

use std::collections::HashMap;

type OldTokenStream = proc_macro::TokenStream;
type V2TokenStream = proc_macro2::TokenStream;

Entry point

Let's call our attribute tlb_serializable (after the TL-B language), and decide that it should be used as follows:

#[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
pub struct Address {
    workchain: u8,
    hash_high: u128,
    hash_low: u128
}

Then the main function looks like this:

#[proc_macro_attribute]
pub fn tlb_serializable(attr: OldTokenStream, mut item: OldTokenStream) -> OldTokenStream {
    let struct_item = item.clone();
    let input: ItemStruct = parse_macro_input!(struct_item);
    let name = input.ident;

    let serializers = create_serialization_code(
        &attr.to_string(), &input.fields);
    item.extend(OldTokenStream::from(quote! {
        impl crate::ton::CellSerialize for #name {
            fn serialize(&self) -> ::std::vec::Vec<::std::string::String> {
                let mut result : ::std::vec::Vec<::std::string::String> = ::std::vec![];
                #serializers
                result
            }
        }
    }));

    item
}

We parse the incoming item as a struct; if it is not one, parse_macro_input! makes the macro return a compilation error noting what went wrong. Then we take the struct's name to substitute into the impl template (it is the type the serialization trait is implemented on) and pass the fields to the next function. With the resulting sequence of store statements, we append the generated impl after the struct definition (an attribute macro replaces the code it is applied to with whatever it returns, so the original item has to be returned as well).
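As a rough picture of the result, the following hand-written sketch shows approximately what the attribute would append for the Address struct. It is not actual macro output: the local `CellSerialize` trait only copies the signature from the impl template (the real one lives in `crate::ton`), and the `u8`/`u128` impls with their string format are assumptions made so the sketch compiles on its own.

```rust
/// Stand-in for crate::ton::CellSerialize; signature taken from the template.
pub trait CellSerialize {
    fn serialize(&self) -> Vec<String>;
}

// Hypothetical primitive impls; the "u <value> <n>bit" format is a guess.
impl CellSerialize for u8 {
    fn serialize(&self) -> Vec<String> {
        vec![format!("u {} 8bit", self)]
    }
}

impl CellSerialize for u128 {
    fn serialize(&self) -> Vec<String> {
        vec![format!("u {} 128bit", self)]
    }
}

pub struct Address {
    workchain: u8,
    hash_high: u128,
    hash_low: u128,
}

// Approximately what
// #[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
// adds after the struct.
impl CellSerialize for Address {
    fn serialize(&self) -> Vec<String> {
        let mut result: Vec<String> = vec![];
        {
            // A part starting with "u " is pushed verbatim.
            result.push("u 4 3bit".to_owned());
            // Every other part is a field name and delegates to the field's impl.
            {
                let mut s_field = CellSerialize::serialize(&self.workchain);
                result.append(&mut s_field);
            }
            {
                let mut s_field = CellSerialize::serialize(&self.hash_high);
                result.append(&mut s_field);
            }
            {
                let mut s_field = CellSerialize::serialize(&self.hash_low);
                result.append(&mut s_field);
            }
        }
        result
    }
}
```

The order of the blocks follows the order in the attribute, not the order of the field declarations.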

The code generator itself

fn create_serialization_code(attr: &str, struct_fields: &Fields)
        -> V2TokenStream {
    let Fields::Named(ref fields) = struct_fields else {
        panic!("For unambiguous parsing, normal structs must consist of named fields");
    };
    let mut field_spans: HashMap<String, (Ident, Span)> = HashMap::new();
    for field in fields.named.iter() {
        let id = field.ident.clone().expect("unnamed field");
        field_spans.insert(id.to_string(), (id, field.span()));
    }

    // Mapping each part of serialization TL-B to block of code that stores value into cell
    let serializations = attr.split(",").map(|part_whitespaced| {
        let part = part_whitespaced.trim();
        if part.is_empty() {
            quote!{}
        } else if part.starts_with("u ") {
            quote! { result.push(#part.to_owned()); }
        } else {
            let (name, span) = &field_spans[part];
            quote_spanned! {span.clone()=>{
                let mut s_field = crate::ton::CellSerialize::serialize(&self.#name);
                result.append(&mut s_field);
            }}
        }
    });

    // Constructing function of all those code chunks
    quote!{{
        #(#serializations)*
    }}
}
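
The attribute-splitting step of the generator can be tried out without any macro machinery. The sketch below is a plain-Rust approximation under stated assumptions: `Part` and `parse_layout` are hypothetical names, field names are passed in as strings instead of coming from syn, and an unknown name yields an `Err` here, whereas the generator above would panic on the failed HashMap lookup.

```rust
/// One comma-separated part of the attribute (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub enum Part {
    /// A part starting with "u ", stored verbatim.
    Literal(String),
    /// The name of a struct field to serialize.
    Field(String),
}

/// Splits the attribute text the same way the generator does:
/// by commas, trimming whitespace and skipping empty parts.
pub fn parse_layout(attr: &str, fields: &[&str]) -> Result<Vec<Part>, String> {
    let mut parts = Vec::new();
    for raw in attr.split(',') {
        let part = raw.trim();
        if part.is_empty() {
            continue;
        }
        if part.starts_with("u ") {
            parts.push(Part::Literal(part.to_owned()));
        } else if fields.contains(&part) {
            parts.push(Part::Field(part.to_owned()));
        } else {
            return Err(format!("unknown field `{}` in layout", part));
        }
    }
    Ok(parts)
}
```

Trimming matters because the stringified token stream may carry extra spaces around the commas.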

The generator is quite straightforward! First, we take the list of fields in the struct and store the identifier and span of each one. Keeping the identifier is just handy so we don't have to create it a second time, while the span lets us direct any errors at the field definition instead of the macro invocation:

error[E0277]: the trait bound `i128: CellSerialize` is not satisfied
  --> src\main.rs:17:9
   |
17 |         hash_high: i128,
   |         ^^^^^^^^^ the trait `CellSerialize` is not implemented for `i128`

If we replace quote_spanned! with quote!, we won't even see the name of the field that caused the error:

error[E0277]: the trait bound `i128: CellSerialize` is not satisfied
  --> src\main.rs:14:5
   |
14 |     #[tlb_serializable(u 4 3bit, workchain, hash_high, hash_low)]
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `CellSerialize` is not implemented for `i128`

One more detail I feel I should mention is why some places use quote!{ code } and others quote!{{ code }}. The answer is simple: the second form creates a separate block of code that isolates any local variables inside it, so they don't clash with the ones defined while serializing the next field.

A disputable question

Would I be better off using macros-by-example? On one hand, they can parse the text describing the order of fields more easily. On the other hand, it would be hard to use field names as expressions (since those macros are hygienic). And finally, I wouldn't have learned how procedural macros work and wouldn't have written this article!
