Is Protobuf.js Faster Than JSON?
Amir Blum
Posted on April 21, 2021
When structured data in a JavaScript program needs to be sent over the network (to another microservice, for example) or saved to a storage system, it first needs to be serialized.
Serialization converts a data object in the JavaScript program's memory into a sequence of bytes, which can later be deserialized back into a JavaScript object.
Two popular serialization methods are JSON and Google Protocol Buffers (Protobuf).
JSON
Serializing data to JSON is as easy as:
const data = { name: 'foo', age: 30 };
const serialized = JSON.stringify(data); // produces: '{"name":"foo","age":30}'
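To make the "sequence of bytes" concrete, here is a short sketch that round-trips the same object through JSON and measures the encoded size. It assumes Node.js, where `Buffer` is a global; in a browser, `TextEncoder` and `TextDecoder` would play the same role.

```javascript
// Round-trip the example payload through JSON (Node.js).
const data = { name: 'foo', age: 30 };

// Serialize: object -> JSON string -> UTF-8 bytes (what actually goes over the wire).
const serialized = JSON.stringify(data);
const bytes = Buffer.from(serialized, 'utf8');

// Deserialize: bytes -> string -> object.
const restored = JSON.parse(bytes.toString('utf8'));

console.log(bytes.length); // 23
console.log(restored);     // { name: 'foo', age: 30 }
```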
Protobuf.js
Google Protocol Buffers is a method of serializing structured data based on a schema (written in a .proto file).
Given the following schema, saved as message.proto, here is how to serialize the previous payload to Protobuf with the protobufjs package:
syntax = "proto3";
message Message {
  string name = 1;
  uint32 age = 2;
}
const protobuf = require("protobufjs");

// Load the schema from message.proto, then encode the payload with it.
protobuf.load("message.proto", (err, root) => {
  if (err) {
    throw err;
  }
  const Message = root.lookupType("Message");
  const data = { name: 'foo', age: 30 };

  // verify() returns null if the payload matches the schema, otherwise an error string.
  const errMsg = Message.verify(data);
  if (errMsg) {
    throw new Error(errMsg);
  }
  const serialized = Message.encode(data).finish(); // produces: <Buffer 0a 03 66 6f 6f 10 1e>
});
The generated output is only 7 bytes long, much less than the 23 bytes produced by JSON serialization.
Protobuf can serialize data so compactly mainly because it does not embed the field names as text in the data, where they may repeat many times. In this example, “name” and “age” are replaced by one-byte field keys (0x0a and 0x10) that encode each field's number and wire type.
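To see where the 7 bytes come from, the sketch below hand-encodes the same message following the protobuf wire format: each field starts with a key, `(field_number << 3) | wire_type`, followed by a varint for `age` or a length prefix plus UTF-8 bytes for `name`. The helper names are hypothetical, and this is not how protobufjs is structured internally; it only reproduces the byte layout for this two-field schema.

```javascript
// Hand-rolled encoder for the two-field Message schema, for illustration only.
// Wire types from the protobuf encoding rules: 0 = varint, 2 = length-delimited.
const WIRE_VARINT = 0;
const WIRE_LEN = 2;

// Encode a non-negative 32-bit integer as a base-128 varint (7 bits per byte, low bits first).
function encodeVarint(value) {
  const out = [];
  while (value > 0x7f) {
    out.push((value & 0x7f) | 0x80); // set the continuation bit
    value >>>= 7;
  }
  out.push(value);
  return out;
}

// A field key packs the field number and the wire type into one varint.
function fieldKey(fieldNumber, wireType) {
  return encodeVarint((fieldNumber << 3) | wireType);
}

// Unlike a real proto3 encoder, this sketch always writes both fields,
// even when they hold default values ('' or 0).
function encodeMessage({ name, age }) {
  const nameBytes = [...Buffer.from(name, 'utf8')];
  return Buffer.from([
    ...fieldKey(1, WIRE_LEN), ...encodeVarint(nameBytes.length), ...nameBytes, // string name = 1
    ...fieldKey(2, WIRE_VARINT), ...encodeVarint(age),                         // uint32 age = 2
  ]);
}

const encoded = encodeMessage({ name: 'foo', age: 30 });
console.log(encoded); // <Buffer 0a 03 66 6f 6f 10 1e>
```

Here the key for `name` is `(1 << 3) | 2 = 0x0a` and the key for `age` is `(2 << 3) | 0 = 0x10`, so each field name costs one byte regardless of its length in the schema.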
Picking the Right Format
Choosing the serialization format that works best for you involves multiple factors.
JSON is usually easier to debug (the serialized format is human-readable) and easier to work with (no need to define message types, compile them, install additional libraries, etc.).
Protobuf, on the other hand, usually produces smaller payloads and has built-in protocol documentation via the schema.
Another major factor is CPU performance: the time it takes the library to serialize and deserialize a message. In this post, we compare only the performance in JavaScript.
You might eventually choose a format that is less performant but delivers more value in other respects. But if performance is a major concern for you, keep reading.
Encode Performance
At Aspecto, we wrote an SDK that collects trace events and exports them to an OpenTelemetry collector.
The data is formatted as JSON and sent over HTTP.
The exporter and collector can also communicate in protobuf using the protobufjs library.
Since the protobuf format is so compact, one might expect encoding to protobuf to be cheaper in CPU terms, measured as the number of encode/decode operations per second.
A quick Google search on the topic strengthens this thesis.
The Performance section of the protobufjs documentation led us to switch our SDK exporter from a JSON payload to a protobuf payload, expecting better performance.
Actual Performance
After changing from JSON serialization to protobuf serialization, we ran our SDK benchmark.
To our surprise, the performance decreased.
That observation, which we at first believed to be a mistake, led us to investigate the issue further.
Benchmarking — baseline
We first ran the protobufjs library's own benchmark to establish a solid baseline. Indeed, we got results similar to those in the library README:
benchmarking encoding performance ...
protobuf.js (reflect) x 724,119 ops/sec ±0.69% (89 runs sampled)
protobuf.js (static) x 755,818 ops/sec ±0.63% (90 runs sampled)
JSON (string) x 499,217 ops/sec ±4.02% (89 runs sampled)
JSON (buffer) x 394,685 ops/sec ±1.75% (88 runs sampled)
google-protobuf x 376,625 ops/sec ±1.05% (89 runs sampled)
protobuf.js (static) was fastest
protobuf.js (reflect) was 4.2% ops/sec slower (factor 1.0)
JSON (string) was 36.1% ops/sec slower (factor 1.6)
JSON (buffer) was 48.4% ops/sec slower (factor 1.9)
google-protobuf was 50.4% ops/sec slower (factor 2.0)
These results show protobuf.js outperforming JSON, contrary to what we had observed in our SDK.
Benchmark — telemetry data
We then modified the benchmark to encode our own example data: OpenTelemetry trace data.
We copied the proto files and data into the benchmark and got the following results:
benchmarking encoding performance ...
protobuf.js (reflect) x 37,357 ops/sec ±0.83% (93 runs sampled)
JSON (string) x 52,952 ops/sec ±2.63% (89 runs sampled)
JSON (buffer) x 45,817 ops/sec ±1.80% (89 runs sampled)
JSON (string) was fastest
JSON (buffer) was 12.8% ops/sec slower (factor 1.1)
protobuf.js (reflect) was 28.2% ops/sec slower (factor 1.4)
These results matched what we had seen in our SDK: for this data, protobuf was actually slower than JSON.
Benchmark — strings
The two data schemas gave opposite results: with one, protobufjs was faster; with the other, JSON was faster.
Looking at the schemas, the immediate suspect was the number of strings.
Our schemas were composed almost entirely of strings, so we created a third test that populates a simple schema with a large number of strings:
syntax = "proto3";
message TestStringArray {
  repeated string stringArray = 1;
}
We ran the benchmark with this payload (10,000 strings, each 10 characters long):
const payload = {
  stringArray: Array(10000).fill('0123456789')
};
The results confirmed our suspicion:
benchmarking encoding performance ...
protobuf.js (reflect) x 866 ops/sec ±0.68% (92 runs sampled)
JSON (string) x 2,411 ops/sec ±0.91% (94 runs sampled)
JSON (buffer) x 1,928 ops/sec ±0.85% (94 runs sampled)
JSON (string) was fastest
JSON (buffer) was 20.0% ops/sec slower (factor 1.2)
protobuf.js (reflect) was 64.0% ops/sec slower (factor 2.8)
When your data is composed of many strings, protobuf performance in JavaScript drops below that of JSON.
This might be because JSON.stringify is implemented in C++ inside the V8 engine and is highly optimized, whereas protobufjs is implemented in JavaScript.
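A quick size calculation for the string payload suggests why protobuf's usual advantage shrinks here: the field name appears only once in the JSON, so in both formats most of the bytes are the strings themselves. The sketch below computes both sizes in Node.js. The protobuf figure is worked out by hand from the wire format rather than produced by protobufjs, and it says nothing directly about encoding speed.

```javascript
// Rough size comparison for the string-heavy benchmark payload.
const payload = { stringArray: Array(10000).fill('0123456789') };

// JSON: the field name and brackets appear once; every element
// adds two quote characters, and all but the last add a comma.
const jsonBytes = Buffer.byteLength(JSON.stringify(payload), 'utf8');

// Protobuf, computed by hand for `repeated string stringArray = 1;`:
// repeated strings are not packed, so each element is written as
// 1 key byte (0x0a) + a 1-byte varint length (valid for strings
// shorter than 128 bytes) + the string's UTF-8 bytes.
const protoBytes = payload.stringArray.reduce(
  (total, s) => total + 1 + 1 + Buffer.byteLength(s, 'utf8'),
  0,
);

console.log(jsonBytes);  // 130017
console.log(protoBytes); // 120000
```

The protobuf encoding comes out only about 8% smaller here, compared with roughly 70% smaller (7 bytes versus 23) for the two-field example at the top of the post.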
Decoding
The benchmarks above are for encoding (serializing). The results for decoding (deserializing) were similar.
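Decoding reverses the encoding steps: read a key, split it into a field number and a wire type, then read either a varint or a length-prefixed byte range. The hypothetical decoder below handles only the two-field `Message` schema from the protobufjs example and rejects unknown fields, whereas real protobuf decoders skip them. It is meant to show the per-field work involved, not to mirror protobufjs.

```javascript
// Read a base-128 varint starting at `pos`; sufficient for values below 2^32.
function readVarint(buf, pos) {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = buf[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80); // continuation bit set: more bytes follow
  return { value: result >>> 0, pos };
}

// Decode a Message { string name = 1; uint32 age = 2; } from its wire bytes.
function decodeMessage(buf) {
  const message = { name: '', age: 0 }; // proto3 defaults for absent fields
  let pos = 0;
  while (pos < buf.length) {
    const key = readVarint(buf, pos);
    pos = key.pos;
    const fieldNumber = key.value >>> 3;
    const wireType = key.value & 0x7;
    if (fieldNumber === 1 && wireType === 2) {
      // Length-delimited: varint length, then that many UTF-8 bytes.
      const len = readVarint(buf, pos);
      message.name = buf.toString('utf8', len.pos, len.pos + len.value);
      pos = len.pos + len.value;
    } else if (fieldNumber === 2 && wireType === 0) {
      const age = readVarint(buf, pos);
      message.age = age.value;
      pos = age.pos;
    } else {
      throw new Error(`unsupported field ${fieldNumber} (wire type ${wireType})`);
    }
  }
  return message;
}

const decoded = decodeMessage(Buffer.from([0x0a, 0x03, 0x66, 0x6f, 0x6f, 0x10, 0x1e]));
console.log(decoded); // { name: 'foo', age: 30 }
```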
Conclusion
If you have the time, we recommend profiling your typical data, understanding the expected performance of each option, and choosing the format that best fits your needs.
It is essential to be aware that protobuf is not necessarily the fastest option.
If your data consists mainly of strings, JSON might be a good choice.
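One low-effort way to profile your own data is a small throughput harness like the sketch below. It is much cruder than a dedicated benchmarking library (no warm-up phase, no statistical error margins), and the `opsPerSec` and `compareEncoders` helpers and the `sample` payload are hypothetical. To include protobuf in the comparison, add an entry such as `(p) => Message.encode(p).finish()` using a type loaded from your own schema, as in the protobufjs example near the top of this post.

```javascript
// Rough throughput measurement: call `fn` in batches until `durationMs`
// has elapsed, then report operations per second.
function opsPerSec(fn, durationMs = 500, batch = 100) {
  const start = process.hrtime.bigint();
  const deadline = start + BigInt(durationMs) * 1_000_000n;
  let ops = 0;
  let now;
  do {
    // Read the clock once per batch so timer overhead stays small.
    for (let i = 0; i < batch; i++) fn();
    ops += batch;
    now = process.hrtime.bigint();
  } while (now < deadline);
  return ops / (Number(now - start) / 1e9);
}

// Measure each candidate encoder on the same payload, fastest first,
// and print how many times slower each one is than the fastest.
function compareEncoders(payload, encoders, durationMs) {
  const results = Object.entries(encoders).map(([name, encode]) => ({
    name,
    hz: opsPerSec(() => encode(payload), durationMs),
  }));
  results.sort((a, b) => b.hz - a.hz);
  const fastest = results[0].hz;
  for (const { name, hz } of results) {
    console.log(`${name}: ${Math.round(hz).toLocaleString()} ops/sec (factor ${(fastest / hz).toFixed(1)})`);
  }
  return results;
}

// Hypothetical sample payload; replace it with data typical of your service.
const sample = { name: 'foo', age: 30, tags: Array(100).fill('0123456789') };
const ranking = compareEncoders(sample, {
  'JSON.stringify': (p) => JSON.stringify(p),
  'JSON.stringify + Buffer.from': (p) => Buffer.from(JSON.stringify(p), 'utf8'),
}, 200);
```

Timings vary between runs and machines, so compare factors from several runs rather than trusting a single number.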