Parsing a JSON String recursively in a BFS style.
Daniel Marinho
Posted on February 3, 2024
This week I had to face a rather peculiar challenge, not because of the context it was embedded in, but because the insight for solving the problem ended up arising from memories of college days in data structure classes (which made me rethink some fallacies). 😁
Context 📝
Here's the deal: In a bank where message events were stored in JSONB format (Postgres), there were several messages saved as text; basically, a JSON String. The problem is that in this format, you can't take advantage of Postgres to index these messages or perform operations on them as if they were objects, for example:
--- events.data:
--- "{\"user\": { "\name\": \"daniel\", \"age\": \"30ish\" }}"
SELECT data FROM "events" WHERE data->'user'->>'name' = 'daniel';
The scenario above simply won't work because the information that should be JSONB is stored as text. To make matters worse, to make this work, I would have to iterate over this text by applying a "regex" to it, which is a terrible idea:
--- events.data:
--- "{\"user\": { "\name\": \"daniel\", \"age\": \"30ish\" }}"
SELECT data FROM "events" WHERE data::TEXT LIKE '%daniel%';
There are millions of complications and flaws in this model that I don't even know where to begin. To make matters worse, there are event records where there were several layers of string escaping, for example:
--- events.data:
--- "{\"user\": { \"name\": \"daniel\", \"age\": \"30ish\", \"address\": \"{ \\\"street\\\": \\\"some avenue\\\", \\\"number\\\": \\\"007\\\", \\\"state\\\": \\\"my precious state\\\" }\" }}"
As you descended another level within the object, more string escapes would exist. 😱
The Pseudo Solution 🐛
At first, the solution I had in mind was simple. I knew I could set up a small JavaScript routine to use JSON decoding natively and get an object in the end:
const jsonString = "{\"user\": { \"name\": \"daniel\", \"age\": \"30ish\", \"address\": \"{ \\\"street\\\": \\\"some avenue\\\", \\\"number\\\": \\\"007\\\", \\\"state\\\": \\\"my precious state\\\" }\" }}";
const jsonObject = JSON.parse(jsonString);
console.log(jsonObject);
/**
*{
* user: {
* "name": "daniel",
* "age": "30ish",
* "address": "{
* \"street\": \"some avenue\",
* \"number\": \"007\",
* \"state\": \"my precious state\"
* }"
* }
*}
**/
Using this approach, I could only solve the problem of the first layer. From the second layer onwards, the data would still not be in the object format I needed. I would have to do the process again, and again for each key, where the value persisted as a string.
Dividing the Problem 🧩
At first, I needed to break this problem into smaller things, starting with converting a string to an object, considering 2 possible scenarios:
The JSON could have undergone only one conversion to String, which would simplify our process to just reversing the method, applying JSON.parse()
.
The JSON could have undergone numerous conversion processes to String, making things a bit more complicated: I would need to convert the string N times until it became an object.
By solving these two problems, I could solve the larger problem (3rd problem) which was to rethink the process of converting a string to an object, iteratively over all the keys of the JSON.
Problem number 1 was already solved. I decided to use the method provided in JavaScript to transform a JSON String into an Object by applying JSON.parse()
. For this, I just needed to make sure that I was converting a string and not trying to apply JSON.parse()
on an object; which would throw an error at runtime.
export function isJsonParsable(json: unknown): boolean {
try {
if (typeof json === "string") {
JSON.parse(json);
return true;
} else {
return false;
}
} catch {
return false;
}
};
With the help of this utility, I could reuse it to check the value I was going to try to convert before applying the conversion:
export function cleanJsonString(
result: string | Record<string, any>,
): Record<string, any> {
if (isJsonParsable(result)) {
return cleanJsonString(JSON.parse(result as string));
} else {
return result as Record<string, any>;
}
}
These two functions would help me solve the main problem: converting the entire JSON String to a final object.
Applying BFS to a JSON ✨
Firstly; I understood that when we apply JSON.parse()
on a JSON String, it flattens it layer by layer, so to speak. If I apply JSON.stringify()
on an object 3 times, to undo it, I need to apply JSON.parse()
also 3 times to get the original value.
Based on this, I need to iterate over this structure, recursively, where for each new property that exists within the object, I redo the conversion process until all JSON properties are flattened.
The structure of an object is similar to the structure of a tree. You have several nodes that in turn can have another node or a primitive value, where it would be the end of the line, a leaf.
Applying BFS in this sense, I convert each key of the JSON to an object until it can no longer be converted. At this point, I start converting the children, always recursively, until at some point there will be no descendants left to be converted.
export function treeParsing(jsonString: string): Record<string, any> {
// We use a closure responsible for iterating
const bfs = (node: Record<string, any>): void => {
for (const key in node) {
// We will only flatten the nodes;
if (isJsonParsable(node[key])) {
node[key] = cleanJsonString(node[key]);
bfs(node[key]);
// We won't try to flatten leaves;
} else if (typeof node[key] === "object") {
bfs(node[key]);
}
}
};
// Validate that we will work with valid JSON String;
if (!isJsonParsable(jsonString)) return;
// Flatten the first level to work on the nodes;
const jsonData: Record<string, any> = cleanJsonString(jsonString);
bfs(jsonData);
return jsonData;
}
Applying BFS to each key of the object, I end up generating recursively the flattening of strings for each node. In the end, our example string where the deepest levels contain an excessive process of escapes, ends up returning to the original object:
const jsonString = "{\"user\": { \"name\": \"daniel\", \"age\": \"30ish\", \"address\": \"{ \\\"street\\\": \\\"some avenue\\\", \\\"number\\\": \\\"007\\\", \\\"state\\\": \\\"my precious state\\\" }\" }}";
const jsonObject = treeParsing(jsonString);
console.log(jsonObject);
/**
*{
* user: {
* name: "daniel",
* age: "30ish",
* address: {
* street: "some avenue",
* number: 007,
* state: "my precious state",
* }
* }
*}
**/
Now, the fun part! What's the time complexity of this beast? Well, for the validation part in the isJsonParsable
function, it mainly uses the JSON.parse()
to iterate over every character in the string to validate if the whole string sequence is a valid JSON string. Linear operation, that's an O(p). Now, the cleanJsonString
will depend on the number of times we'll need to convert the string back into an object. That differs from string to string, also linear, let's call O(s). Now for the recursive part, this will run every time for every new key-value the BFS generates; also linear time. We'll call it a classic O(n).
Time
O(n) * (O(p) + O(s))
O(n) * O(p + s)
// reducing constants
O(n)
In terms of time complexity, it's still linear, O(n), because even though the sizes of "p" and "s" might differ from "n", they are not independent of it. As "n" grows, both "p" and "s" are influenced by it. Therefore, the overall time complexity is still linear. So yes, we can say it's equivalent to O(n).
For space, we have a single variable - jsonData
- that stores all the information about the string and its value in a single space. Since its size depends on the size of the JSON itself, we can call it an O(n) in space.
Space
O(n)
There it is! A moment of epiphany that brought back good memories from college and made me rethink the phrase "I'll never use this in my life". 😂
Made myself a repo of this, for future reference. :)
Posted on February 3, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.