Kazuhiro Fujieda
Posted on September 26, 2020
I developed a JSON parser for C# named DynaJson. It adheres strictly to RFC 8259: it accepts all conformant JSON texts and rejects all non-conformant ones, with two exceptions.
One exception is trailing commas. The other is a leading 0 in numbers, for example, 02 and -02. The former is for practicality. The latter is for compatibility with DynamicJson.
JSON's grammar is simple, but carelessly implemented parsers neither accept all conformant JSON nor reject all non-conformant JSON. The excellent article "Parsing JSON is a Minefield" shows where the mines lie when implementing a JSON parser.
For example, some parsers reject bare scalars such as null. Some parsers reject arrays of empty arrays such as [[],[]], or numbers whose exponent is 0, such as 1e0. Many parsers reject objects whose keys are not unique, such as {"a":1,"a":2}.
Most parsers accept various kinds of non-conformant JSON. They allow numbers such as .2, -.1, and +1, strings containing unescaped control characters, or trailing garbage such as [0]].
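The number cases follow directly from the RFC 8259 grammar, which defines a number as an optional minus sign, an integer part, an optional fraction, and an optional exponent. Here is a minimal sketch of that rule as a regular expression in Python. The name is_rfc8259_number is made up for illustration, and this is not how DynaJson or any of the parsers discussed here is implemented.

```python
import re

# RFC 8259: number = [ minus ] int [ frac ] [ exp ]
# int is either a single 0 or a non-zero digit followed by more digits,
# so a leading 0 (as in 02) and a leading + or . are not allowed.
RFC8259_NUMBER = re.compile(
    r"-?(?:0|[1-9][0-9]*)"   # optional minus, then the integer part
    r"(?:\.[0-9]+)?"         # optional fraction: a dot and at least one digit
    r"(?:[eE][+-]?[0-9]+)?"  # optional exponent: e or E, sign, digits
)


def is_rfc8259_number(text: str) -> bool:
    """Return True if text is exactly one RFC 8259 number (hypothetical helper)."""
    return RFC8259_NUMBER.fullmatch(text) is not None
```

Under this rule 1e0 is a valid number, while .2, -.1, +1, 02, and -02 are not. A parser that accepts 02 and -02, as DynaJson does on purpose, has to relax the integer part.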
DynaJson struck many of these mines at first because of my preconceptions and carelessness. I eventually made it strict, as mentioned above. It rejects nearly all non-conformant JSON so that broken input is detected.
The JSON Parsing Test Suite, developed by the article's author, shows how parsers handle these corner cases. I created a fork of it to check six parsers for .NET, and the results are shown below.
The left part lists the test files. The center shows the results. The right shows the JSON content. Parsers must accept files whose names start with y_, should reject those starting with n_, and may either accept or reject those starting with i_. The legend in the upper-left corner explains what the color of each result means.
System.Text.Json, introduced in .NET Core 3.0, is the strictest. It accepts all conformant JSON and rejects all non-conformant JSON, although it can accept trailing commas with an option. Among the six parsers, only Utf8Json cannot accept trailing commas.
Each parser has its own pattern of non-conformant JSON that it accepts. Json.NET accepts objects whose keys are not strings. Utf8Json allows trailing garbage. DynamicJson accepts objects that are not closed with }. Jil allows incorrect exponent parts in numbers.
The results colored red at the bottom come from parsing JSON that is nested 100,000 levels deep and never closed. Parsers should reject such input. However, DynamicJson timed out after 5 seconds of processing, and Jil and Utf8Json crashed with a stack overflow.
By default, Json.NET allows nesting limited only by available memory, so it can reject these inputs with syntax errors. System.Text.Json and DynaJson have configurable maximum nesting levels. The former is 64, and the latter is 512. Both therefore reject these inputs because the nesting is too deep.
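A nesting limit avoids the stack overflow because the parser gives up as soon as the depth passes the maximum, long before recursion can exhaust the stack. The Python sketch below shows the idea with a plain loop that counts open brackets outside strings. The function exceeds_max_depth is hypothetical, does not validate the JSON, and is not how System.Text.Json or DynaJson counts depth.

```python
def exceeds_max_depth(text: str, max_depth: int) -> bool:
    """Return True if arrays/objects in text nest deeper than max_depth.

    Illustrative only: it tracks brackets and string boundaries and does
    not check that the input is otherwise valid JSON.
    """
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # the character after a backslash is skipped
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # closing quote ends the string
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > max_depth:
                return True          # stop early instead of descending further
        elif ch in "]}":
            depth -= 1
    return False
```

With a maximum of 64, an input of 100,000 unclosed [ characters is flagged after reading only 65 of them.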