Protoc Plugins with Go

homayoonalimohammadi

Homayoon Alimohammadi

Posted on August 18, 2023

In this article we take a look at Protocol Buffers (protobufs) and explore how protoc plugins work in the context of Go programming. With a focus on demystifying protoc plugins and their role in code generation, we walk through a humble overview of the inner workings of plugins with practical snippets. Along the way we will look at key concepts such as the Protobuf Compiler Plugin API, Go’s protogen package, and the integration of custom plugins into the protoc compilation pipeline. So, if you’ve ever found yourself grappling with complex situations while working with protocol buffers (as I did), or are simply interested in harnessing the power of custom protoc plugins in Go, join me on this journey as we try to scratch the surface of Protocol Buffers together.

Introduction

Protocol Buffers (protobuf) have become a popular choice for data serialization and communication between services in the world of modern software development. With their language-agnostic nature, efficiency, and ease of use, protobufs have emerged as a go-to solution for structuring data interchange formats.

While protobufs offer an extensive set of features to define and handle messages, there are times when developers encounter complex scenarios that require additional customization. One such scenario that I personally faced was dealing with logical and domain-specific constraints on the data: things that cannot be enforced, or even expressed, in a proto file outside of raw comments or loose conventions.

By developing a custom protoc plugin, we can extend the capabilities of the protobuf compiler (protoc) to automatically generate code that makes working with proto messages more intuitive and efficient. We will discuss the various components involved in the creation of such a plugin, including the Protobuf Compiler Plugin API, Go’s protogen package, and the necessary steps to integrate the plugin into the protobuf compilation process.

DISCLAIMER: I tend to make lots of mistakes on a daily basis, so if you’re one of the many that encountered one, I’d be grateful if you would correct me in the comments.

Picturing the problem

In my humble opinion, type safety, while it sometimes forces you to respect certain boundaries, can be of great value, especially if you’re trying to build scalable, maintainable, understandable and easy-to-develop software. Of course this can be the topic of another detailed (and maybe controversial) discussion some time later, but for now, let’s just assume that it holds true and that we are on the same page.

Consider the message below:

    syntax = "proto3";

    import "google/protobuf/struct.proto";

    message Game {
      google.protobuf.Struct data = 1;
    }

The data field above is obviously not type-safe and might contain anything from a simple integer to multiple complex nested structures. Serialized into JSON, it might turn into something like this:

    {
      "data": {
        "title": "Elden Ring",
        "description": "Third-person, action, RPG with a focus on combat and exploration",
        "developer": {
          "name": "FromSoftware Inc."
        },
        "category": "role-playing",
        "number_of_bosses": 18
      }
    }
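
To get a feel for why this hurts in Go, here is a minimal sketch. It uses a plain map[string]any as a stand-in for the decoded Struct (an assumption, made only to keep the example free of dependencies), and bossCount is a hypothetical helper. Every read needs a runtime type check, and nothing tells us ahead of time which keys exist:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bossCount digs number_of_bosses out of untyped game data.
// Hypothetical helper; map[string]any stands in for a decoded Struct.
func bossCount(raw []byte) (int, error) {
	var game map[string]any
	if err := json.Unmarshal(raw, &game); err != nil {
		return 0, err
	}
	data, ok := game["data"].(map[string]any)
	if !ok {
		return 0, fmt.Errorf("data is missing or not an object")
	}
	// encoding/json decodes JSON numbers into float64, so asserting
	// to int here would fail at runtime.
	n, ok := data["number_of_bosses"].(float64)
	if !ok {
		return 0, fmt.Errorf("number_of_bosses is missing or not a number")
	}
	return int(n), nil
}

func main() {
	raw := []byte(`{"data": {"title": "Elden Ring", "number_of_bosses": 18}}`)
	n, err := bossCount(raw)
	fmt.Println(n, err)
}
```

A typo in a key or a string where a number was expected only shows up when this code runs, which is exactly what the typed message below is meant to prevent.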

Let’s say we examined and analyzed every saved Game object in our database and gathered enough information to organize this data structure to something like this:

    message Game {
      string Title = 1;
      string Description = 2;
      message Developer {
        string Name = 1;
      }
      Developer developer = 3;
      enum Category {
        UNKNOWN = 0;
        ROLE_PLAYING = 1;
        SHOOTING = 2;
      }
      Category category = 4;

      message RolePlayingExtra {
        int32 number_of_bosses = 1;
      }
      message ShootingExtra {
        int32 number_of_guns = 2;
      }

      oneof extra {
        RolePlayingExtra role_playing_extra = 5;
        ShootingExtra shooting_extra = 6;
      }
    }

And as a piece of domain-specific logic, let’s say we need to make sure that for every category (e.g. ROLE_PLAYING), only its respective extra (e.g. RolePlayingExtra) is filled. Notice that the extra field will not be of a concrete type (at least in Go), since it can hold messages of different types (e.g. RolePlayingExtra and ShootingExtra). Let’s take a look at its translation into Go code:

    type Game struct {
     ...
     // Types that are assignable to Extra:
     //
     // *Game_RolePlayingExtra_
     // *Game_ShootingExtra_
     Extra isGame_Extra `protobuf_oneof:"extra"`
    }

    type isGame_Extra interface {
     isGame_Extra()
    }

    type Game_RolePlayingExtra_ struct {
     RolePlayingExtra *Game_RolePlayingExtra `protobuf:"bytes,5,opt,name=role_playing_extra,json=rolePlayingExtra,proto3,oneof"`
    }

    type Game_ShootingExtra_ struct {
     ShootingExtra *Game_ShootingExtra `protobuf:"bytes,6,opt,name=shooting_extra,json=shootingExtra,proto3,oneof"`
    }

    func (*Game_RolePlayingExtra_) isGame_Extra() {}

    func (*Game_ShootingExtra_) isGame_Extra() {}

The Extra field is of type isGame_Extra, an interface implemented by the Game_RolePlayingExtra_ and Game_ShootingExtra_ structs, so it has to be type-asserted to the desired concrete type whenever we read it. protoc-gen-go actually does a great job of handling this type assertion:

    func (x *Game) GetRolePlayingExtra() *Game_RolePlayingExtra {
     if x, ok := x.GetExtra().(*Game_RolePlayingExtra_); ok {
      return x.RolePlayingExtra
     }
     return nil
    }

    func (x *Game) GetShootingExtra() *Game_ShootingExtra {
     if x, ok := x.GetExtra().(*Game_ShootingExtra_); ok {
      return x.ShootingExtra
     }
     return nil
    }

Yet, we have two problems here:

  1. For any given Game struct, in order to set one of the extra fields (e.g. NumberOfGuns) we have to wrap the value in the right oneof type ourselves (and type-assert it when reading it back). This is not a big deal, but it becomes cumbersome and adds clutter to our code.

  2. Protobuf on its own does not guarantee that the Category and Extra fields are set consistently with each other. This means that we can technically have a game whose Category is SHOOTING while its Extra is a RolePlayingExtra. If not handled properly, this can cause inconsistent and incorrect data.
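
Both problems can be reproduced in a few lines. The sketch below uses simplified stand-ins for the generated types (an assumption, so that it runs without any protobuf dependency); validate and setNumberOfGuns are hypothetical helpers of the kind we would like to have generated for us:

```go
package main

import "fmt"

// Simplified stand-ins for the generated types above (an assumption that
// keeps this sketch free of protobuf dependencies).
type Category int

const (
	Unknown Category = iota
	RolePlaying
	Shooting
)

type isGameExtra interface{ isGameExtra() }

type RolePlayingExtra struct{ NumberOfBosses int32 }

type ShootingExtra struct{ NumberOfGuns int32 }

func (*RolePlayingExtra) isGameExtra() {}

func (*ShootingExtra) isGameExtra() {}

type Game struct {
	Category Category
	Extra    isGameExtra
}

// validate is a hypothetical check that the extra matches the category.
func validate(g *Game) error {
	var want Category
	switch g.Extra.(type) {
	case nil:
		return nil // no extra set, nothing to cross-check
	case *RolePlayingExtra:
		want = RolePlaying
	case *ShootingExtra:
		want = Shooting
	}
	if g.Category != want {
		return fmt.Errorf("category %d does not match extra %T", g.Category, g.Extra)
	}
	return nil
}

// setNumberOfGuns is a hypothetical helper that keeps both fields in sync.
func setNumberOfGuns(g *Game, n int32) {
	g.Category = Shooting
	g.Extra = &ShootingExtra{NumberOfGuns: n}
}

func main() {
	// Nothing stops us from building this contradictory value.
	bad := &Game{Category: Shooting, Extra: &RolePlayingExtra{NumberOfBosses: 18}}
	fmt.Println(validate(bad) != nil) // the mismatch is only caught at runtime

	good := &Game{}
	setNumberOfGuns(good, 7)
	fmt.Println(validate(good))
}
```

Writing such helpers by hand for every category does not scale, which is why generating them from the proto definition is attractive.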

Considering the problems above, let’s see how a custom protoc plugin can help us overcome them.

Protoc Plugins

Let’s take a look at how protoc-gen-go generates a .pb.go file. The first step is to request the plugin through an argument to protoc (the protobuf compiler):

    protoc --go_out=. --go_opt=paths=source_relative game.proto

The --go_out=. argument tells protoc to invoke the protoc-gen-go plugin and to write its output to the current directory. --go_opt=... passes optional parameters to protoc-gen-go that can affect its generation behavior. For protoc to find any plugin (official ones such as protoc-gen-go as well as private or custom ones), the plugin needs to meet two requirements:

  • The executable file should be placed somewhere in your $PATH.

  • It should be named following the pattern protoc-gen-<MY_PLUGIN>.

It can then be invoked like so:

    protoc --<MY_PLUGIN>_out=. --<MY_PLUGIN>_opt=... some.proto
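
The naming rule can be sketched in Go. pluginBinary below is a hypothetical helper, not part of protoc; it only mirrors the convention described above by turning an output flag into the executable name that is expected on the $PATH:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// pluginBinary maps an output flag such as "--gamedata_out=." to the
// executable name expected for it ("protoc-gen-gamedata").
// Hypothetical helper for illustration; it is not part of protoc.
func pluginBinary(arg string) (string, bool) {
	flagName, _, _ := strings.Cut(arg, "=")
	if !strings.HasPrefix(flagName, "--") || !strings.HasSuffix(flagName, "_out") {
		return "", false
	}
	name := strings.TrimSuffix(strings.TrimPrefix(flagName, "--"), "_out")
	if name == "" {
		return "", false
	}
	return "protoc-gen-" + name, true
}

func main() {
	bin, ok := pluginBinary("--gamedata_out=.")
	fmt.Println(bin, ok)

	// LookPath searches $PATH the same way a shell would.
	if _, err := exec.LookPath(bin); err != nil {
		fmt.Println("not installed on this machine:", err)
	}
}
```

If the lookup fails on your machine, protoc would most likely fail to find the plugin as well.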

Now let’s take a look at the source code of the protoc-gen-go plugin:

    package main 

    import (
      gengo "google.golang.org/protobuf/cmd/protoc-gen-go/internal_gengo"
      "google.golang.org/protobuf/compiler/protogen"
    )

    func main() {
     ...
     protogen.Options{...}.Run(func(gen *protogen.Plugin) error {
      ...
      for _, f := range gen.Files {
       ...
       g := gengo.GenerateFile(gen, f)
       ...
      }
      ...
      return nil
     })
    }

Lines containing an ellipsis stand for additional code that is not really important for us at the moment. Let’s take a look at what we have:

  • protogen.Options{}: a struct that provides configuration options for the protoc plugin. It allows you to customize the behavior of the plugin and specify various settings related to code generation. Here is a sample snippet if you want to use flags as your options:
    var (
     flags flag.FlagSet
     myOpt = flags.String("myopt", "", "my random option")
    )
    protogen.Options{ParamFunc: flags.Set}.Run(func(gen *protogen.Plugin) error {...})

These flags can be passed into our plugin like so:

    protoc --<MY_PLUGIN>_out=. --<MY_PLUGIN>_opt=myopt=something some.proto
  • Run(func(gen *protogen.Plugin) error): executes a function as a protoc plugin. It reads a CodeGeneratorRequest message from os.Stdin, invokes the plugin function, and writes a CodeGeneratorResponse message to os.Stdout. If a failure occurs while reading or writing, Run prints an error to os.Stderr and calls os.Exit(1).

  • Lastly, we iterate over gen.Files (the files named on the command line plus everything they import) and pass them to gengo.GenerateFile, which takes care of the rest of the process (i.e. creating the actual output file and putting the autogenerated content into it).

Let’s take a look at the gengo.GenerateFile function:

    // GenerateFile generates the contents of a .pb.go file.
    func GenerateFile(gen *protogen.Plugin, file *protogen.File) *protogen.GeneratedFile {
     filename := file.GeneratedFilenamePrefix + ".pb.go"
     g := gen.NewGeneratedFile(filename, file.GoImportPath)
     f := newFileInfo(file) // wraps file with the extra metadata used below

     ...

     g.P(packageDoc, "package ", f.GoPackageName)
     g.P()

     ...

     genImport(gen, g, f, imps.Get(i))
     genEnum(g, f, enum)
     genMessage(g, f, message)
     genExtensions(g, f)
     genReflectFileDescriptor(gen, g, f)

     ...

     return g
    }

We can create as many new files as we want by calling the gen.NewGeneratedFile method with our desired filename and the GoImportPath of the package the file belongs to. This gives us an instance of *protogen.GeneratedFile with its super handy method P(). Let’s take a closer look at it:

    // P prints a line to the generated output. It converts each parameter to a
    // string following the same rules as fmt.Print. It never inserts spaces
    // between parameters.
    func (g *GeneratedFile) P(v ...interface{}) {
     for _, x := range v {
      switch x := x.(type) {
      case GoIdent:
       fmt.Fprint(&g.buf, g.QualifiedGoIdent(x))
      default:
       fmt.Fprint(&g.buf, x)
      }
     }
     fmt.Fprintln(&g.buf)
    }

To put it simply, P() writes its arguments to the generated file’s buffer back to back and then ends the line. It does not deal with indentation itself; protogen formats the Go source when it produces the final file content, so we can write any valid Go statement or declaration without worrying about leading tabs or spaces.

Other lines presented in the GenerateFile() function are just taking care of different components of our proto file (e.g. messages, enums, etc…)

Now that we have a fundamental understanding of how protoc plugins work, it’s time to see how we can leverage these tools to write our own custom plugin.

Protoc Custom Plugin

You can find the complete code in my GitHub repo: https://github.com/HomayoonAlimohammadi/protoc-gen-gamedata

Let’s make it as simple as possible, here’s a main.go to start with:

    package main

    import (
     "github.com/HomayoonAlimohammadi/protoc-gen-gamedata/gamedata"
     "google.golang.org/protobuf/compiler/protogen"
    )

    func main() {
     protogen.Options{}.Run(func(p *protogen.Plugin) error {
      return gamedata.Generate(p)
     })
    }

It invokes the Generate function from gamedata package, which, as you can see, is really simple and minimal:

    func Generate(p *protogen.Plugin) error {
     g := p.NewGeneratedFile("autogen/gamedata.autogen.go", protogen.GoImportPath("gamedata"))
     g.P("// Code generated by gamedata. DO NOT EDIT")
     g.P("package gamedata")

     game, err := extract(p, g)
     if err != nil {
      return err
     }

     genHelpers(g, game)

     return nil
    }

After creating a new file in the relative autogen directory, we write a comment indicating that the file was automatically generated, followed by the package clause.

There is this extract function in the middle which we will discuss in a moment, but before that, note that genHelpers is what actually generates Go code in our gamedata.autogen.go file. It is responsible for generating the functions that help us overcome our initial challenges (setting the extra fields according to the category field, with the wrapping and type assertions handled for us).

As you might have inferred from the code above, the extract function tries to summarize and return the information about our Game message, like below:

    type Game struct {
     Fields     []Field
     CatToExtra map[string]Extra
    }

    type Extra struct {
     Name string
     Fields []Field
    }

    type Field struct {
     Name string
     Type string
    }

    func extract(p *protogen.Plugin, g *protogen.GeneratedFile) (*Game, error) {
     for _, f := range p.Files {
      for _, m := range f.Messages {
       if m.Desc.Name() == "Game" {
        return extractGameData(m, g)
       }
      }
     }

     return nil, errors.New("failed to find `Game` message")
    }

After finding the Game message among all the available files (p.Files, here only one) and messages (f.Messages, again only one), it translates the components into a Game struct. Note that you can access the FieldDescriptor, MessageDescriptor, EnumDescriptor, etc. through the Desc field on the respective *protogen.Field, *protogen.Message, *protogen.Enum, etc.:

    func extractGameData(m *protogen.Message, g *protogen.GeneratedFile) (*Game, error) {
     game := &Game{CatToExtra: make(map[string]Extra)}

     prefillCategories(m, game)

     for _, f := range m.Fields {
      if f.Desc.ContainingOneof() != nil && f.Desc.ContainingOneof().Name() == "extra" {
       err := fillExtras(g, f, game)
       if err != nil {
        return nil, fmt.Errorf("failed to fill extra: %w", err)
       }
      }

      game.Fields = append(
       game.Fields,
       Field{
        Name: string(f.Desc.Name()),
        Type: fieldGoType(g, f),
       },
      )
     }

     return game, nil
    }

Let’s break down this function so we can better understand how it works:

  • First we prefill the categories; this will come in handy later.
    func prefillCategories(m *protogen.Message, game *Game) {
     for _, e := range m.Enums {
      if string(e.Desc.Name()) == "Category" {
       for _, v := range e.Values {
        if string(v.Desc.Name()) == "UNKNOWN" {
         continue
        }

        game.CatToExtra[string(v.Desc.Name())] = Extra{}
       }
      }
     }
    }
  • We iterate over Fields of our Game message

  • If we encounter a field that is contained in the extra oneof, we try to find its respective category and fill in the CatToExtra:

    func fillExtras(g *protogen.GeneratedFile, f *protogen.Field, game *Game) error {
     if !strings.HasSuffix(strings.ToLower(string(f.Desc.Name())), "_extra") {
      return fmt.Errorf("extra message %s does not end with _extra", f.Desc.Name())
     }

     name := strings.TrimSuffix(strings.ToLower(string(f.Desc.Name())), "_extra")

     var fields []Field
     for _, innerf := range f.Message.Fields {
      fields = append(fields, Field{Name: string(innerf.Desc.Name()), Type: fieldGoType(g, innerf)})
     }

     _, ok := game.CatToExtra[strings.ToUpper(name)]
     if !ok {
      return fmt.Errorf("no category available for `%s`", strings.ToUpper(name))
     }

     game.CatToExtra[strings.ToUpper(name)] = Extra{
      Name:   string(f.Desc.Name()),
      Fields: fields,
     }

     return nil
    }

As required by our domain logic, we also make sure that every extra field ends with the _extra suffix and that there are no dangling extra messages (dangling meaning that no Category value corresponds to that extra).
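
The naming convention itself is easy to isolate. The sketch below pulls it out of fillExtras into a hypothetical categoryFor function that depends only on the standard library, so the rule can be tried without running protoc:

```go
package main

import (
	"fmt"
	"strings"
)

// categoryFor derives the enum value name expected for a oneof field,
// e.g. "role_playing_extra" -> "ROLE_PLAYING", and rejects names that
// have no matching category. Hypothetical helper mirroring fillExtras.
func categoryFor(field string, categories map[string]bool) (string, error) {
	lower := strings.ToLower(field)
	if !strings.HasSuffix(lower, "_extra") {
		return "", fmt.Errorf("extra message %s does not end with _extra", field)
	}
	cat := strings.ToUpper(strings.TrimSuffix(lower, "_extra"))
	if !categories[cat] {
		return "", fmt.Errorf("no category available for `%s`", cat)
	}
	return cat, nil
}

func main() {
	categories := map[string]bool{"ROLE_PLAYING": true, "SHOOTING": true}

	fmt.Println(categoryFor("role_playing_extra", categories))
	fmt.Println(categoryFor("racing_extra", categories)) // dangling extra
	fmt.Println(categoryFor("shooting_info", categories)) // wrong suffix
}
```

Keeping the rule in one small function also makes it easy to change the convention later without touching the descriptor-walking code.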

The only thing left is to implement a logic to create the helper functions we needed to maintain consistency and correctness in our data. Since it won’t really help with understanding protobuf any more, I’m not going to write the code for that here, but I’m sure you can easily figure out how it’s done.

HINT: It’s basically a bunch of iterations, if/else statements and g.P()

In order to see what the code above gives us, let’s quickly serialize the extracted Game to JSON and look at the result:

    b, _ := json.MarshalIndent(game, "", "\t")
    fmt.Println(string(b))

This prints:

    {
     "Fields": [
      {
       "Name": "Title",
       "Type": "string"
      },
      {
       "Name": "Description",
       "Type": "string"
      },
      {
       "Name": "developer",
       "Type": "*game.Game_Developer"
      },
      {
       "Name": "category",
       "Type": "game.Game_Category"
      },
      {
       "Name": "role_playing_extra",
       "Type": "*game.Game_RolePlayingExtra"
      },
      {
       "Name": "shooting_extra",
       "Type": "*game.Game_ShootingExtra"
      }
     ],
     "CatToExtra": {
      "ROLE_PLAYING": {
       "Name": "role_playing_extra",
       "Fields": [
        {
         "Name": "number_of_bosses",
         "Type": "int32"
        }
       ]
      },
      "SHOOTING": {
       "Name": "shooting_extra",
       "Fields": [
        {
         "Name": "number_of_guns",
         "Type": "int32"
        }
       ]
      }
     }
    }

Finally, we have to build our plugin and place it somewhere in our $PATH (here the bin directory of our GOPATH, which must itself be on the $PATH) so that it can be invoked via protoc:

    go build -o "$(go env GOPATH)/bin/protoc-gen-gamedata" main.go
    protoc --gamedata_out=. game.proto

Wrap it up

In conclusion, we explored protoc plugins in Go and saw how to write a custom plugin that extends code generation for Protocol Buffers. Hopefully the walkthrough and the practical snippets gave you a good idea of how these plugins work and how you can write your own. Needless to say, your comments, suggestions and ideas, as well as your overall impression, are of great importance to me.
