NaN Boxing in C# to realize a fast JSON parser
Kazuhiro Fujieda
Posted on December 9, 2020
This article shows how to use NaN Boxing to improve the performance of a JSON parser for C# named DynaJson.
About DynaJson
DynaJson is a fast JSON parser for C# compatible with DynamicJson. I feel DynamicJson is very useful but lack sufficient performance, so I developed a faster replacement.
DynaJson acts the same as DynamicJson, as shown below. It returns a dynamic type value representing a JSON object resulting from parsing a JSON string. We can access each member of the JSON object with the member access operator like a C# object. This manner is useful to process a large JSON with duck typing.
dynamic json = JsonObject.Parse(@"{
""foo"": ""json"",
""nest"": {""foobar"": true}
}");
string a1 = json.foo; // "json"
bool a2 = json.nest.foobar; // true
A benchmark result of DynaJson is shown below. The benchmark measured the time to process a large JSON file named citm_catalog.json, whose size is about 1.7MB. The benchmark compared DynaJson with Utf8Json, Jil, Newtonsoft.Json, DynamicJson. The measurement ran on .NET Core 3.1 on Ubuntu 18.04 on Azure D2Ds_v4.
The result shows DynaJson is quite faster than others. DynaJson is optimized for this use case, so the result is natural.
citm_catalog.json is one of the files used in Native JSON Benchmark. Many benchmarks for JSON parsers use this file.
An optimization by NaN Boxing
JSON parsers generate a large number of objects representing JSON values on parsing a large JSON file. So DynaJson represents a JSON value as a 16 bytes structure type. Copying a 16 bytes structure is much faster than larger ones because it needs only one load and store instruction for 128bit SIMD operations.
vmovdqu xmm0,xmmword ptr [rdi]
vmovdqu xmmword ptr [rsp+8],xmm0
The definition of the structure is shown below. It folds members into 16 bytes with an Explicit layout. The parser uses each of String
, Array
, and Dictionary
alternately based on the value of Type
, so the members can be laid out at the same offset. How about Number
and Type
? Type
of an enum type overlaps the upper 4 bytes of Number
. It can seemingly break Number.
[StructLayout(LayoutKind.Explicit)]
internal struct InternalObject
{
[FieldOffset(4)] public JsonType Type;
[FieldOffset(0)] public double Number;
[FieldOffset(8)] public string String;
[FieldOffset(8)] public JsonArray Array;
[FieldOffset(8)] public JsonDictionary Dictionary;
}
NaN Boxing can realize the overlap safely. NaN, standing for Not a Number, represents zero divided by zero and the square root of a negative number in floating-point arithmetics. See the Wikipedia entry for more details. One of the binary representations of NaN defined in IEEE 754 is 0xfff8000000000000. The lower 51 bits can have an arbitrary number, so NaN Boxing boxes data in the bits.
The definition of an enum type JsonType
is shown below. JsonType
has not only types of JSON values but also the values themselves for null, true, and false. The definition assigns a legitimate value as the upper 4 bytes of NaN to each member. The values never collide with the upper part of ordinary doubles. Any conjunctions with any lower 4 bytes become a legitimate double.
enum JsonType : uint
{
Null = 0xfff80001,
True = 0xfff80002,
False = 0xfff80003,
String = 0xfff80004,
Array = 0xfff80005,
Object = 0xfff80006
}
A usage of JsonType
is shown below. ToValue
is a method to extract a value JsonValue
represents. If the value of obj.Type
is not anything in the defined members, it must be an upper 4 bytes of an ordinary double, so ToValue
returns obj.Number
. This manner realizes the clear separation of Number
and Type
.
object ToValue(InternalObject obj)
{
switch (obj.Type)
{
case JsonType.Null:
return null;
case JsonType.True:
return true;
case JsonType.False:
return false;
case JsonType.String:
return obj.String;
case JsonType.Array:
case JsonType.Object:
return new JsonObject(obj);
default:
return obj.Number;
}
}
The following benchmark result shows the effect of NaN Boxing. The benchmark measured parsing time in each definition of InternalObject
as a class, 20 bytes (8 + 8 + 4) structure without NaN Boxing, and 16 bytes one with NaN Boxing. The 20 bytes structure made the parser 20% faster than the class, and NaN Boxing improved 6.8% more.
Other usages of NaN Boxing
JavaScript engines such as Mozilla's SpyderMonkey and WebKit's JavaScriptCore use NaN Boxing, but Google's V8 doesn't use it. LuaJIT and mruby also use NaN Boxing. These implementations overlap a pointer with a double and compact the size of all values into 8 bytes. C# doesn't allow a managed pointer to overlap a double, so it is impossible to compact InternalObject
into 8 bytes.
Posted on December 9, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.