Protocol buffer deep-dive

techschoolguru

TECH SCHOOL

Posted on February 24, 2020

Welcome to the second hands-on lecture on protocol buffers. In the previous lecture, we learned some basic syntax and data types. Today we will dig deeper.

Here's the link to the full gRPC course playlist on Youtube
Github repository: pcbook-go and pcbook-java
Gitlab repository: pcbook-go and pcbook-java

Here are the things that we're going to do:

  • Define and use custom types in protocol-buffer message fields, such as enums or other messages.
  • Discuss when to use nested types and when not to.
  • Organise protobuf messages into multiple files, put them into a package, then import them into other places.
  • Explore some well-known types that were already defined by Google.
  • Learn about repeated fields and oneof fields.
  • Use option to tell protoc to generate Go code with the package name that we want.

Multiple messages in 1 file

Let's start with the processor_message.proto file. We can define multiple messages in 1 file, so I will add a GPU message here. It makes sense because GPU is also a processor.

syntax = "proto3";

message CPU {
  string brand = 1;
  string name = 2;
  uint32 number_cores = 3;
  uint32 number_threads = 4;
  double min_ghz = 5;
  double max_ghz = 6;
}

message GPU {
  string brand = 1;
  string name = 2;
  double min_ghz = 3;
  double max_ghz = 4;
  // memory ?
}

It has some fields in common with the CPU, such as brand, name, and min and max frequency. The one difference is that it has its own memory.

Custom types: message and enum

Memory is a common concept that shows up in other places too, such as the RAM or the storage (persistent drive). It can be measured in many different units, such as kilobyte, megabyte, gigabyte, or terabyte. So I will define it as a custom type, in a separate memory_message.proto file, so that we can reuse it later.

pcbook
├── proto
│   ├── processor_message.proto
│   └── memory_message.proto
├── pb
│   └── processor_message.pb.go
├── main.go
└── Makefile

First, we need to define the measurement units. To do that, we will use enum. Because this unit should only exist within the context of the memory, we should define it as a nested type inside the Memory message.

syntax = "proto3";

message Memory {
  enum Unit {
    UNKNOWN = 0;
    BIT = 1;
    BYTE = 2;
    KILOBYTE = 3;
    MEGABYTE = 4;
    GIGABYTE = 5;
    TERABYTE = 6;
  }

  uint64 value = 1;
  Unit unit = 2;
}

The convention is to always reserve a special value as the default value of your enum and assign it the tag 0. In proto3, the first enum value must be 0, and an unset enum field reads as that value. Then we add the other units, from BIT to TERABYTE.

The Memory message will have 2 fields: one for the value and the other for the unit.
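To see why the zero-tagged value matters, here is a minimal Go sketch. The types are hand-written stand-ins that only mimic the shape of the Memory message; they are not the code protoc generates. The toBits helper is hypothetical, and it assumes binary multiples (1 kilobyte = 1024 bytes), which the proto file itself does not specify.

```go
package main

import "fmt"

// MemoryUnit is a hand-written stand-in for the Unit enum (not generated code).
type MemoryUnit int32

const (
	UnitUnknown  MemoryUnit = 0 // tag 0: the value an unset field reads as
	UnitBit      MemoryUnit = 1
	UnitByte     MemoryUnit = 2
	UnitKilobyte MemoryUnit = 3
	UnitMegabyte MemoryUnit = 4
	UnitGigabyte MemoryUnit = 5
	UnitTerabyte MemoryUnit = 6
)

// Memory mirrors the two fields of the Memory message.
type Memory struct {
	Value uint64
	Unit  MemoryUnit
}

// toBits is a hypothetical helper. It converts a Memory to a bit count,
// assuming binary multiples, and reports false for an unknown unit.
func toBits(m Memory) (uint64, bool) {
	switch m.Unit {
	case UnitBit:
		return m.Value, true
	case UnitByte:
		return m.Value * 8, true
	case UnitKilobyte:
		return (m.Value * 8) << 10, true
	case UnitMegabyte:
		return (m.Value * 8) << 20, true
	case UnitGigabyte:
		return (m.Value * 8) << 30, true
	case UnitTerabyte:
		return (m.Value * 8) << 40, true
	}
	return 0, false
}

func main() {
	var unset Memory // no field assigned, so Unit is the zero value
	_, ok := toBits(unset)
	fmt.Println(unset.Unit == UnitUnknown, ok)

	bits, _ := toBits(Memory{Value: 2, Unit: UnitKilobyte})
	fmt.Println(bits)
}
```

Because UNKNOWN sits at tag 0, a message whose unit was never set can be told apart from one that really means BIT or BYTE.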

Import proto files

Now let's go back to the processor_message.proto file. We have to import the memory_message.proto file in order to use the Memory type. And in the GPU message, we add a new memory field of type Memory.

syntax = "proto3";

import "memory_message.proto";

message CPU {
  string brand = 1;
  string name = 2;
  uint32 number_cores = 3;
  uint32 number_threads = 4;
  double min_ghz = 5;
  double max_ghz = 6;
}

message GPU {
  string brand = 1;
  string name = 2;
  double min_ghz = 3;
  double max_ghz = 4;
  Memory memory = 5;
}

Now if we try to generate Go code, there will be an error saying "inconsistent package names".

Inconsistent package name

Because we haven't specified a package name in the proto files, protoc by default uses the file name as the Go package.

The reason protoc throws an error here is that the 2 generated Go files would belong to 2 different packages, but in Go we cannot put files from different packages in the same folder, in this case the pb folder.
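The conflict can be sketched in a few lines of Go. This is a simplified model of the fallback described above, not protoc's actual implementation; defaultGoPackage and samePackage are hypothetical helper names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// defaultGoPackage models the fallback described in the text: with no
// package declared, the Go package name is taken from the proto file name.
// This is a simplifying assumption, not protoc's real code.
func defaultGoPackage(protoFile string) string {
	return strings.TrimSuffix(path.Base(protoFile), ".proto")
}

// samePackage reports whether every proto file maps to a single Go package,
// which is what Go requires of all files in one folder such as pb.
func samePackage(protoFiles []string) bool {
	for _, f := range protoFiles {
		if defaultGoPackage(f) != defaultGoPackage(protoFiles[0]) {
			return false
		}
	}
	return true
}

func main() {
	files := []string{"proto/processor_message.proto", "proto/memory_message.proto"}
	for _, f := range files {
		fmt.Println(f, "->", defaultGoPackage(f))
	}
	fmt.Println("same package:", samePackage(files))
}
```

Under this model the two files map to processor_message and memory_message, so they cannot share the pb folder.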

Set package name

To fix it, we must tell protoc to put the generated code in the same package by declaring it in our proto files with this statement: package techschool.pcbook;

The memory_message.proto file:

syntax = "proto3";

package techschool.pcbook;

message Memory {
  enum Unit {
    UNKNOWN = 0;
    BIT = 1;
    BYTE = 2;
    KILOBYTE = 3;
    MEGABYTE = 4;
    GIGABYTE = 5;
    TERABYTE = 6;
  }

  uint64 value = 1;
  Unit unit = 2;
}

The processor_message.proto file:

syntax = "proto3";

package techschool.pcbook;

import "memory_message.proto";

message CPU {
  string brand = 1;
  string name = 2;
  uint32 number_cores = 3;
  uint32 number_threads = 4;
  double min_ghz = 5;
  double max_ghz = 6;
}

message GPU {
  string brand = 1;
  string name = 2;
  double min_ghz = 3;
  double max_ghz = 4;
  Memory memory = 5;
}

Now if we run make gen again, it will work, and the 2 generated Go files will belong to the same package techschool_pcbook. Protoc uses an underscore here because a Go package name cannot contain a dot.
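The name mapping can be approximated with a one-line Go function. This is only a sketch of the dot-to-underscore rule; goPackageName is a hypothetical name, and the real generator may handle other characters as well.

```go
package main

import (
	"fmt"
	"strings"
)

// goPackageName approximates how a proto package name becomes a Go package
// name when no go_package option is set: each dot becomes an underscore.
// It is a simplification, not the generator's actual implementation.
func goPackageName(protoPackage string) string {
	return strings.ReplaceAll(protoPackage, ".", "_")
}

func main() {
	fmt.Println(goPackageName("techschool.pcbook"))
}
```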

File memory_message.pb.go:

package techschool_pcbook

import (
    fmt "fmt"
    proto "github.com/golang/protobuf/proto"
    math "math"
)

// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = fmt.Errorf
var _ = math.Inf

// This is a compile-time assertion to ensure that this generated file
// is compatible with the proto package it is being compiled against.
// A compilation error at this line likely means your copy of the
// proto package needs to be updated.
const _ = proto.ProtoPackageIsVersion3 // please upgrade the proto package

type Memory_Unit int32

const (
    Memory_UNKNOWN  Memory_Unit = 0
    Memory_BIT      Memory_Unit = 1
    Memory_BYTE     Memory_Unit = 2
    Memory_KILOBYTE Memory_Unit = 3
    Memory_MEGABYTE Memory_Unit = 4
    Memory_GIGABYTE Memory_Unit = 5
    Memory_TERABYTE Memory_Unit = 6
)
...

File processor_message.pb.go:

package techschool_pcbook

import (
    fmt "fmt"
    proto "github.com/golang/protobuf/proto"
    math "math"
)

// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = fmt.Errorf
var _ = math.Inf

// This is a compile-time assertion to ensure that this generated file
// is compatible with the proto package it is being compiled against.
// A compilation error at this line likely means your copy of the
// proto package needs to be updated.
const _ = proto.ProtoPackageIsVersion3 // please upgrade the proto package

type CPU struct {
    Brand                string   `protobuf:"bytes,1,opt,name=brand,proto3" json:"brand,omitempty"`
    Name                 string   `protobuf:"bytes,2,opt,name=name,proto3" json:"name,omitempty"`
    NumberCores          uint32   `protobuf:"varint,3,opt,name=number_cores,json=numberCores,proto3" json:"number_cores,omitempty"`
    NumberThreads        uint32   `protobuf:"varint,4,opt,name=number_threads,json=numberThreads,proto3" json:"number_threads,omitempty"`
    MinGhz               float64  `protobuf:"fixed64,5,opt,name=min_ghz,json=minGhz,proto3" json:"min_ghz,omitempty"`
    MaxGhz               float64  `protobuf:"fixed64,6,opt,name=max_ghz,json=maxGhz,proto3" json:"max_ghz,omitempty"`
    XXX_NoUnkeyedLiteral struct{} `json:"-"`
    XXX_unrecognized     []byte   `json:"-"`
    XXX_sizecache        int32    `json:"-"`
}
...

Update proto_path setting for vscode

There's one thing I want to show you here. Let's get back to our processor_message.proto file. Although we have successfully generated the Go code, vscode still shows red underlines on the Memory type and the import statement.

Import error

The problem is that, by default, the vscode-proto3 extension uses our current working folder as the proto_path when it runs protoc for code analysis. So it looks for memory_message.proto directly in the pcbook folder and cannot find it, because the file lives in the proto subfolder.

If we change the path to proto/memory_message.proto then it won't complain anymore. However, I don't want to do that because later we will use these proto files in our Java project with a different directory structure.

So I'm gonna show you how to fix this by changing the proto_path settings of the vscode-proto3 extension. Let's open the extension tab and look for vscode-proto3.

vscode-proto3 settings

We copy these settings and paste them to the settings.json file of vscode.

{
    "workbench.colorTheme": "Material Theme Palenight",
    "workbench.iconTheme": "material-icon-theme",
    "editor.minimap.enabled": false,
    "editor.formatOnSave": true,
    "explorer.openEditors.visible": 0,
    "protoc": {
        "path": "/usr/local/bin/protoc",
        "options": [
            "--proto_path=proto"
        ]
    }
}

We can get the protoc path by running: which protoc in the terminal. Normally it is /usr/local/bin/protoc. Then the --proto_path option should be set to proto. Now after we save the settings.json file and restart vscode, the error will be gone.
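To make the effect of --proto_path concrete, here is a small Go sketch of the lookup. It is a simplified model under the assumption that an import is resolved by joining each proto_path directory with the import string; candidatePaths is a hypothetical helper, not part of protoc or the extension.

```go
package main

import (
	"fmt"
	"path"
)

// candidatePaths lists where an import would be looked up, given the
// --proto_path directories in order. This is a simplified model of the
// lookup, written for illustration only.
func candidatePaths(protoPaths []string, importName string) []string {
	out := make([]string, 0, len(protoPaths))
	for _, dir := range protoPaths {
		out = append(out, path.Join(dir, importName))
	}
	return out
}

func main() {
	// Default: the working folder (pcbook) is the only search root.
	fmt.Println(candidatePaths([]string{"."}, "memory_message.proto"))
	// With --proto_path=proto the same import resolves inside proto/.
	fmt.Println(candidatePaths([]string{"proto"}, "memory_message.proto"))
}
```

With the default root, the import points at a file that does not exist in the project root; with proto as the root, it points at proto/memory_message.proto, which does.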

Install clang-format to automatically format code

By the way, in the last lecture, we installed the extension that calls the clang-format tool. However, the code is not automatically formatted on save.

The reason is that we haven't installed clang-format itself yet. So let's install it with Homebrew.

brew install clang-format

Then restart Visual Studio Code. Now the code will be automatically formatted when we save the file.

Define Storage message

Let's continue with our project. I'm gonna create a new message for the storage in storage_message.proto file.

pcbook
├── proto
│   ├── processor_message.proto
│   ├── memory_message.proto
│   └── storage_message.proto
├── pb
│   ├── processor_message.pb.go
│   └── memory_message.pb.go
├── main.go
└── Makefile

A storage device could be a hard disk drive or a solid-state drive. So we should define a Driver enum with these 2 values.

syntax = "proto3";

package techschool.pcbook;

import "memory_message.proto";

message Storage {
  enum Driver {
    UNKNOWN = 0;
    HDD = 1;
    SSD = 2;
  }

  Driver driver = 1;
  Memory memory = 2;
}

Then add 2 fields to the storage message: the driver type, and the memory size.

Use option to generate custom package name for Go

The Go package name techschool_pcbook that protoc generates for us is a bit too long, and it doesn't match the name of the pb folder that contains the Go files.

So I want to tell it to use pb as the package name, but just for Go, because Java or other languages will use a different package naming convention.

We can do that by setting option go_package = "pb" in our proto files.

File storage_message.proto:

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

import "memory_message.proto";

message Storage {
  enum Driver {
    UNKNOWN = 0;
    HDD = 1;
    SSD = 2;
  }

  Driver driver = 1;
  Memory memory = 2;
}

File memory_message.proto:

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

message Memory {
  enum Unit {
    UNKNOWN = 0;
    BIT = 1;
    BYTE = 2;
    KILOBYTE = 3;
    MEGABYTE = 4;
    GIGABYTE = 5;
    TERABYTE = 6;
  }

  uint64 value = 1;
  Unit unit = 2;
}


File processor_message.proto:

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

import "memory_message.proto";

message CPU {
  string brand = 1;
  string name = 2;
  uint32 number_cores = 3;
  uint32 number_threads = 4;
  double min_ghz = 5;
  double max_ghz = 6;
}

message GPU {
  string brand = 1;
  string name = 2;
  double min_ghz = 3;
  double max_ghz = 4;
  Memory memory = 5;
}

Now if we run make gen to generate code, all the generated Go files will use the same pb package.

pcbook
├── proto
│   ├── processor_message.proto
│   ├── memory_message.proto
│   └── storage_message.proto
├── pb
│   ├── processor_message.pb.go
│   ├── memory_message.pb.go
│   └── storage_message.pb.go
├── main.go
└── Makefile

File storage_message.pb.go:

package pb

import (
    fmt "fmt"
    proto "github.com/golang/protobuf/proto"
    math "math"
)

// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = fmt.Errorf
var _ = math.Inf

// This is a compile-time assertion to ensure that this generated file
// is compatible with the proto package it is being compiled against.
// A compilation error at this line likely means your copy of the
// proto package needs to be updated.
const _ = proto.ProtoPackageIsVersion3 // please upgrade the proto package

type Storage_Driver int32

const (
    Storage_UNKNOWN Storage_Driver = 0
    Storage_HDD     Storage_Driver = 1
    Storage_SSD     Storage_Driver = 2
)

var Storage_Driver_name = map[int32]string{
    0: "UNKNOWN",
    1: "HDD",
    2: "SSD",
}

var Storage_Driver_value = map[string]int32{
    "UNKNOWN": 0,
    "HDD":     1,
    "SSD":     2,
}

func (x Storage_Driver) String() string {
    return proto.EnumName(Storage_Driver_name, int32(x))
}

func (Storage_Driver) EnumDescriptor() ([]byte, []int) {
    return fileDescriptor_170f09d838bd8a04, []int{0, 0}
}

type Storage struct {
    Driver               Storage_Driver `protobuf:"varint,1,opt,name=driver,proto3,enum=techschool.pcbook.Storage_Driver" json:"driver,omitempty"`
    Memory               *Memory        `protobuf:"bytes,2,opt,name=memory,proto3" json:"memory,omitempty"`
    XXX_NoUnkeyedLiteral struct{}       `json:"-"`
    XXX_unrecognized     []byte         `json:"-"`
    XXX_sizecache        int32          `json:"-"`
}
...

File memory_message.pb.go:

package pb

import (
    fmt "fmt"
    proto "github.com/golang/protobuf/proto"
    math "math"
)

// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = fmt.Errorf
var _ = math.Inf

// This is a compile-time assertion to ensure that this generated file
// is compatible with the proto package it is being compiled against.
// A compilation error at this line likely means your copy of the
// proto package needs to be updated.
const _ = proto.ProtoPackageIsVersion3 // please upgrade the proto package

type Memory_Unit int32

const (
    Memory_UNKNOWN  Memory_Unit = 0
    Memory_BIT      Memory_Unit = 1
    Memory_BYTE     Memory_Unit = 2
    Memory_KILOBYTE Memory_Unit = 3
    Memory_MEGABYTE Memory_Unit = 4
    Memory_GIGABYTE Memory_Unit = 5
    Memory_TERABYTE Memory_Unit = 6
)

var Memory_Unit_name = map[int32]string{
    0: "UNKNOWN",
    1: "BIT",
    2: "BYTE",
    3: "KILOBYTE",
    4: "MEGABYTE",
    5: "GIGABYTE",
    6: "TERABYTE",
}

var Memory_Unit_value = map[string]int32{
    "UNKNOWN":  0,
    "BIT":      1,
    "BYTE":     2,
    "KILOBYTE": 3,
    "MEGABYTE": 4,
    "GIGABYTE": 5,
    "TERABYTE": 6,
}

func (x Memory_Unit) String() string {
    return proto.EnumName(Memory_Unit_name, int32(x))
}

func (Memory_Unit) EnumDescriptor() ([]byte, []int) {
    return fileDescriptor_c0c7f919ccc765da, []int{0, 0}
}

type Memory struct {
    Value                uint64      `protobuf:"varint,1,opt,name=value,proto3" json:"value,omitempty"`
    Unit                 Memory_Unit `protobuf:"varint,2,opt,name=unit,proto3,enum=techschool.pcbook.Memory_Unit" json:"unit,omitempty"`
    XXX_NoUnkeyedLiteral struct{}    `json:"-"`
    XXX_unrecognized     []byte      `json:"-"`
    XXX_sizecache        int32       `json:"-"`
}
...

File processor_message.pb.go:

package pb

import (
    fmt "fmt"
    proto "github.com/golang/protobuf/proto"
    math "math"
)

// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = fmt.Errorf
var _ = math.Inf

// This is a compile-time assertion to ensure that this generated file
// is compatible with the proto package it is being compiled against.
// A compilation error at this line likely means your copy of the
// proto package needs to be updated.
const _ = proto.ProtoPackageIsVersion3 // please upgrade the proto package

type CPU struct {
    Brand                string   `protobuf:"bytes,1,opt,name=brand,proto3" json:"brand,omitempty"`
    Name                 string   `protobuf:"bytes,2,opt,name=name,proto3" json:"name,omitempty"`
    NumberCores          uint32   `protobuf:"varint,3,opt,name=number_cores,json=numberCores,proto3" json:"number_cores,omitempty"`
    NumberThreads        uint32   `protobuf:"varint,4,opt,name=number_threads,json=numberThreads,proto3" json:"number_threads,omitempty"`
    MinGhz               float64  `protobuf:"fixed64,5,opt,name=min_ghz,json=minGhz,proto3" json:"min_ghz,omitempty"`
    MaxGhz               float64  `protobuf:"fixed64,6,opt,name=max_ghz,json=maxGhz,proto3" json:"max_ghz,omitempty"`
    XXX_NoUnkeyedLiteral struct{} `json:"-"`
    XXX_unrecognized     []byte   `json:"-"`
    XXX_sizecache        int32    `json:"-"`
}
...

Define Keyboard message

Next, we will define the keyboard message. It can have a QWERTY, QWERTZ, or AZERTY layout. For your information, QWERTZ is widely used in Germany, while AZERTY is more popular in France.

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

message Keyboard {
  enum Layout {
    UNKNOWN = 0;
    QWERTY = 1;
    QWERTZ = 2;
    AZERTY = 3;
  }

  Layout layout = 1;
  bool backlit = 2;
}

The keyboard can be backlit or not, so we use a boolean field for it. Very simple, right?

Define Screen message

Now let's write a more complex message: the screen. It has a nested message type: Resolution. The reason we use a nested type here is that a resolution is closely tied to the screen; it doesn’t have any meaning when standing alone.

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

message Screen {
  message Resolution {
    uint32 width = 1;
    uint32 height = 2;
  }

  enum Panel {
    UNKNOWN = 0;
    IPS = 1;
    OLED = 2;
  }

  float size_inch = 1;
  Resolution resolution = 2;
  Panel panel = 3;
  bool multitouch = 4;
}

Similarly, we have an enum for the screen panel, which can be IPS or OLED. Then comes the screen size in inches, and finally a bool field to tell whether it's a multitouch screen or not.
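On the Go side, nested types get flattened names of the form Outer_Inner, as the generated Memory_Unit and Storage_Driver types earlier in this post show. The sketch below uses hand-written stand-ins that follow the same naming for Screen; they are not the real generated code, and pixelCount is a hypothetical helper.

```go
package main

import "fmt"

// Hand-written stand-ins that follow the Outer_Inner naming used for
// nested types in the generated Go code. They are not generated code.
type Screen_Panel int32

const (
	Screen_UNKNOWN Screen_Panel = 0
	Screen_IPS     Screen_Panel = 1
	Screen_OLED    Screen_Panel = 2
)

type Screen_Resolution struct {
	Width  uint32
	Height uint32
}

type Screen struct {
	SizeInch   float32
	Resolution *Screen_Resolution // message fields are pointers and may be nil
	Panel      Screen_Panel
	Multitouch bool
}

// pixelCount is a hypothetical helper. It treats a missing screen or a
// missing resolution as zero pixels instead of dereferencing nil.
func pixelCount(s *Screen) uint64 {
	if s == nil || s.Resolution == nil {
		return 0
	}
	return uint64(s.Resolution.Width) * uint64(s.Resolution.Height)
}

func main() {
	s := &Screen{
		SizeInch:   15.6,
		Resolution: &Screen_Resolution{Width: 1920, Height: 1080},
		Panel:      Screen_IPS,
	}
	fmt.Println(pixelCount(s))
	fmt.Println(pixelCount(&Screen{}))
}
```

The nil check matters because an unset message field such as resolution is a nil pointer in Go, while an unset enum field such as panel is simply the zero value.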

Define Laptop message

Alright, I think basically we've defined all necessary components of a laptop. So let's define the laptop message now.

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

import "processor_message.proto";
import "memory_message.proto";

message Laptop {
    string id = 1;
    string brand = 2;
    string name = 3;
    CPU cpu = 4;
    Memory ram = 5;
}

It has a unique identifier of type string. This ID will be automatically generated by the server. It has a brand and a name. Then a CPU and RAM. We need to import other proto files to use these types.

Repeated field

A laptop can have more than 1 GPU, so we use the repeated keyword to tell protoc that this is a list of GPUs.

Similarly, it's normal for a laptop to have multiple storage devices, so this field should be repeated as well.

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

import "processor_message.proto";
import "memory_message.proto";
import "storage_message.proto";
import "screen_message.proto";
import "keyboard_message.proto";

message Laptop {
    string id = 1;
    string brand = 2;
    string name = 3;
    CPU cpu = 4;
    Memory ram = 5;
    repeated GPU gpus = 6;
    repeated Storage storages = 7;
    Screen screen = 8;
    Keyboard keyboard = 9;
}

Then come 2 normal fields: screen and keyboard. It's pretty straightforward.
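In Go, a repeated field is represented as a slice. The sketch below uses hand-written stand-in structs rather than the generated pb package, and the brand and name values are made-up placeholders; gpuNames is a hypothetical helper.

```go
package main

import "fmt"

// GPU and Laptop are hand-written stand-ins, trimmed to a few fields.
type GPU struct {
	Brand string
	Name  string
}

type Laptop struct {
	Name string
	Gpus []*GPU // corresponds to: repeated GPU gpus = 6;
}

// gpuNames is a hypothetical helper that collects the name of every GPU.
func gpuNames(l *Laptop) []string {
	names := make([]string, 0, len(l.Gpus))
	for _, g := range l.Gpus {
		names = append(names, g.Name)
	}
	return names
}

func main() {
	laptop := &Laptop{Name: "example-laptop"}
	fmt.Println(len(laptop.Gpus)) // an unset repeated field is an empty list

	// Placeholder values, used only for illustration.
	laptop.Gpus = append(laptop.Gpus,
		&GPU{Brand: "brand-a", Name: "gpu-1"},
		&GPU{Brand: "brand-b", Name: "gpu-2"},
	)
	fmt.Println(gpuNames(laptop))
}
```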

Oneof field

How about the weight of the laptop? Let's say, we allow it to be specified in either kilograms or pounds. In order to do that, we can use a new keyword: oneof.

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

import "processor_message.proto";
import "memory_message.proto";
import "storage_message.proto";
import "screen_message.proto";
import "keyboard_message.proto";

message Laptop {
    string id = 1;
    string brand = 2;
    string name = 3;
    CPU cpu = 4;
    Memory ram = 5;
    repeated GPU gpus = 6;
    repeated Storage storages = 7;
    Screen screen = 8;
    Keyboard keyboard = 9;
    oneof weight {
        double weight_kg = 10;
        double weight_lb = 11;
    }
}

In this block, we define 2 fields, one for kilograms and the other for pounds. Remember that in a oneof group, only the field that gets assigned last will keep its value.
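The "last assignment wins" behaviour is easy to see in a small Go sketch. The types below are hand-written and only modeled on the wrapper-struct pattern that the Go generator uses for oneof fields; the exact generated names and methods may differ, and the weights are placeholder values.

```go
package main

import "fmt"

// isLaptop_Weight is implemented by each case of the oneof group.
type isLaptop_Weight interface{ isLaptop_Weight() }

// One wrapper struct per oneof case (hand-written stand-ins).
type Laptop_WeightKg struct{ WeightKg float64 }
type Laptop_WeightLb struct{ WeightLb float64 }

func (*Laptop_WeightKg) isLaptop_Weight() {}
func (*Laptop_WeightLb) isLaptop_Weight() {}

// Laptop holds a single Weight value, so only one case can be set at a time.
type Laptop struct {
	Weight isLaptop_Weight
}

// GetWeightKg returns the kilogram value, or 0 if another case (or none) is set.
func (l *Laptop) GetWeightKg() float64 {
	if w, ok := l.Weight.(*Laptop_WeightKg); ok {
		return w.WeightKg
	}
	return 0
}

// GetWeightLb returns the pound value, or 0 if another case (or none) is set.
func (l *Laptop) GetWeightLb() float64 {
	if w, ok := l.Weight.(*Laptop_WeightLb); ok {
		return w.WeightLb
	}
	return 0
}

func main() {
	l := &Laptop{}
	l.Weight = &Laptop_WeightKg{WeightKg: 2.2}
	l.Weight = &Laptop_WeightLb{WeightLb: 4.85} // replaces the kilogram value
	fmt.Println(l.GetWeightKg(), l.GetWeightLb())
}
```

Since both cases share one Go field, assigning the pound value necessarily discards the kilogram value, which matches the oneof rule.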

Well-known types

Then we add 2 more fields: the price in USD and the release year of the laptop. And finally, we need a timestamp field to store the last update time of the record in our system.

Timestamp is one of the well-known types that have already been defined by Google, so we just need to import the package and use it.

syntax = "proto3";

package techschool.pcbook;

option go_package = "pb";

import "processor_message.proto";
import "memory_message.proto";
import "storage_message.proto";
import "screen_message.proto";
import "keyboard_message.proto";
import "google/protobuf/timestamp.proto";

message Laptop {
    string id = 1;
    string brand = 2;
    string name = 3;
    CPU cpu = 4;
    Memory ram = 5;
    repeated GPU gpus = 6;
    repeated Storage storages = 7;
    Screen screen = 8;
    Keyboard keyboard = 9;
    oneof weight {
        double weight_kg = 10;
        double weight_lb = 11;
    }
    double price_usd = 12;
    uint32 release_year = 13;
    google.protobuf.Timestamp updated_at = 14;
}

There are many other well-known types. Please check out this link to learn more about them.
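A Timestamp carries two numbers: whole seconds since the Unix epoch and a nanosecond remainder. The Go sketch below shows that idea with a hand-written stand-in struct; it is not the real Timestamp type from the protobuf library, and fromTime and toTime are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

// Timestamp is a hand-written stand-in with the same two fields as
// google.protobuf.Timestamp: seconds since the Unix epoch plus nanoseconds.
type Timestamp struct {
	Seconds int64
	Nanos   int32
}

// fromTime splits a time.Time into whole seconds and the nanosecond remainder.
func fromTime(t time.Time) Timestamp {
	return Timestamp{Seconds: t.Unix(), Nanos: int32(t.Nanosecond())}
}

// toTime rebuilds a UTC time.Time from the two fields.
func (ts Timestamp) toTime() time.Time {
	return time.Unix(ts.Seconds, int64(ts.Nanos)).UTC()
}

func main() {
	updatedAt := time.Date(2020, time.February, 24, 0, 0, 0, 500, time.UTC)
	ts := fromTime(updatedAt)
	fmt.Println(ts.Seconds, ts.Nanos)
	fmt.Println(ts.toTime().Equal(updatedAt)) // the round trip keeps the instant
}
```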

Now we can run make gen to generate Go code for all of the messages.

pcbook
├── proto
│   ├── processor_message.proto
│   ├── memory_message.proto
│   ├── storage_message.proto
│   ├── keyboard_message.proto
│   ├── screen_message.proto
│   └── laptop_message.proto
├── pb
│   ├── processor_message.pb.go
│   ├── memory_message.pb.go
│   ├── storage_message.pb.go
│   ├── keyboard_message.pb.go
│   ├── screen_message.pb.go
│   └── laptop_message.pb.go
├── main.go
└── Makefile

Hooray! We've learned a lot about protocol buffers and how to generate Go code from them. In the next hands-on lecture, I will show you how to set up a Gradle project to automatically generate Java code from our proto files.

Thanks a lot for reading, and see you later!


If you like the article, please subscribe to our Youtube channel and follow us on Twitter for more tutorials in the future.


If you want to join me on my current amazing team at Voodoo, check out our job openings here. Remote or onsite in Paris/Amsterdam/London/Berlin/Barcelona with visa sponsorship.
