Franz Wong
Posted on November 23, 2020
Sometimes we need to process a big JSON file or stream without storing its entire contents in memory.
For example, to count the number of items in a big array, we only need to load one item, increment the count, throw the item away, and repeat until the whole array has been counted.
I found a big JSON file (~190MB) in this Git repository: https://github.com/zemirco/sf-city-lots-json
The file looks like this, and I want to count the number of features.
{
"type": "FeatureCollection",
"features": [ /* lots of feature objects */ ]
}
This is what a feature object looks like, if you are interested.
{
"type": "Feature",
"properties": {
"MAPBLKLOT": "0001001",
"BLKLOT": "0001001",
"BLOCK_NUM": "0001",
"LOT_NUM": "001",
"FROM_ST": "0",
"TO_ST": "0",
"STREET": "UNKNOWN",
"ST_TYPE": null,
"ODD_EVEN": "E"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.422003528252475,
37.808480096967251,
0.0
],
[
-122.422076013325281,
37.808835019815085,
0.0
],
[
-122.421102174348633,
37.808803534992904,
0.0
],
[
-122.421062569067274,
37.808601056818148,
0.0
],
[
-122.422003528252475,
37.808480096967251,
0.0
]
]
]
}
}
Let's say my application can only allocate 50MB and I try to load the whole file into memory.
Path filePath = Path.of("/src/sf-city-lots-json/citylots.json");
String content = Files.readString(filePath);
Obviously, we can't load it into memory.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
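A quick way to see why this fails is to compare the file size with the maximum heap the JVM is allowed to use. The sketch below only uses the JDK; the class name HeapCheck and the helper mightFitInHeap are made up for illustration, and the comparison is a rough lower bound, since reading the file into a String needs at least roughly as many bytes as the file itself, plus whatever the rest of the application uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapCheck {

    // Hypothetical helper: true when the file is smaller than the maximum heap.
    // This is optimistic on purpose; a "true" result does not guarantee success.
    static boolean mightFitInHeap(long fileSizeBytes, long maxHeapBytes) {
        return fileSizeBytes < maxHeapBytes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeapCheck <path-to-json-file>");
            return;
        }
        Path filePath = Path.of(args[0]);
        long fileSize = Files.size(filePath);            // size on disk, in bytes
        long maxHeap = Runtime.getRuntime().maxMemory(); // upper limit of the heap, in bytes
        System.out.printf("file: %d bytes, max heap: %d bytes, might fit: %b%n",
                fileSize, maxHeap, mightFitInHeap(fileSize, maxHeap));
    }
}
```

With a file of about 190MB and a heap limited to 50MB, the check returns false, which matches the OutOfMemoryError above.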
Gson provides JsonReader, which allows reading the data as a stream of tokens.
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public int getFeatureCount(Path filePath) throws IOException {
    int count = 0;
    try (JsonReader reader = new JsonReader(Files.newBufferedReader(filePath))) {
        // Enter the top-level object and walk through its properties.
        reader.beginObject();
        while (reader.hasNext()) {
            String name = reader.nextName();
            if ("features".equals(name)) {
                count = getFeatureCountFromArray(reader);
            } else {
                // Not interested in this property, so skip its value.
                reader.skipValue();
            }
        }
        reader.endObject();
    }
    return count;
}

private int getFeatureCountFromArray(JsonReader reader) throws IOException {
    int count = 0;
    reader.beginArray();
    while (reader.hasNext()) {
        count++;
        // Consume one feature object without keeping any of its content.
        reader.beginObject();
        while (reader.hasNext()) {
            reader.skipValue();
        }
        reader.endObject();
    }
    reader.endArray();
    return count;
}
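As a variation, skipValue() skips a nested object or array recursively, so each feature can be skipped with a single call instead of an inner loop. The sketch below assumes Gson is on the classpath and runs against a tiny in-memory document instead of the real file; the class name FeatureCounter and the method countFeatures are only illustrative names.

```java
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class FeatureCounter {

    // Counts the elements of the top-level "features" array without keeping them.
    static int countFeatures(Reader source) {
        int count = 0;
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();
            while (reader.hasNext()) {
                if ("features".equals(reader.nextName())) {
                    reader.beginArray();
                    while (reader.hasNext()) {
                        reader.skipValue(); // skips one whole feature, nested content included
                        count++;
                    }
                    reader.endArray();
                } else {
                    reader.skipValue(); // some other top-level property
                }
            }
            reader.endObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Small stand-in for citylots.json with two features.
        String json = "{\"type\":\"FeatureCollection\","
                + "\"features\":[{\"type\":\"Feature\"},{\"type\":\"Feature\"}]}";
        System.out.println(countFeatures(new StringReader(json)));
    }
}
```

For the real file, the same method could be given Files.newBufferedReader(filePath) instead of a StringReader.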
With great power comes great responsibility. Unlike Gson.fromJson, we need to call begin*, end* and skipValue at the right time (according to the structure of the JSON document) so that the reader processes the data correctly; otherwise it will throw an exception. So it should be used only when you have restrictions on memory footprint or performance.
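To illustrate what "the right time" means, here is a small sketch (again assuming Gson is on the classpath, with made-up names WrongOrderDemo and tryBeginArray) that calls beginArray() on a document whose first token is an object. As far as I know, JsonReader reports this kind of mismatch with an IllegalStateException.

```java
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.StringReader;

public class WrongOrderDemo {

    // Tries to open the document as an array and reports what happened.
    static String tryBeginArray(String json) {
        try (JsonReader reader = new JsonReader(new StringReader(json))) {
            reader.beginArray(); // only valid when the next token starts an array
            return "no exception";
        } catch (IllegalStateException | IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryBeginArray("[]")); // the call matches the structure
        System.out.println(tryBeginArray("{}")); // the call does not match the structure
    }
}
```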