Optimizing JSONPath lookups in PHP to extract data up to 7 times faster
Viktor Djupsjöbacka
Posted on November 14, 2022
Supermetrics is a SaaS power tool for transferring and integrating marketing data, so much of what our development teams do is about data retrieval and processing.
For example, if our customer wants to fetch the day-level engagement data for their five Facebook Pages from the last six months, we’ll likely make dozens, if not hundreds, of Facebook API calls. We then parse the API responses, traverse the result sets, and aggregate, sort, and package everything neatly into the format in which the client application, like Google Sheets, wants to receive the data.
During all this processing, we perform many data checks and lookups to make sure that the API responses are successful and that we catch any errors.
In this blog post, I’ll explain how we optimized the performance of JSONPath lookups for faster data extraction with an open-source PHP extension.
Enter JSONPath notation to make the handling of API responses more readable and maintainable
We use the convenient JSONPath notation in our PHP code for all these checks and lookups. A simplified version of a response handler that processes responses from API calls could look like this:
public function getResultRows(JsonResponse $response): ResultRowCollection
{
    if ($response->hasArray('$.error')) {
        throw new ErrorResponseException($response->getString('$.error.message'));
    }
    if (!$response->hasArray('$.resultRows')) {
        throw new NoResultRowsException();
    }
    return new ResultRowCollection($response->getArray('$.resultRows'));
}
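To make the handler above easier to picture, here is a minimal sketch of what a wrapper like `JsonResponse` could look like. It is a hypothetical stand-in, not our production class: the `fromJson()` factory and the private `resolve()` helper are names invented for this illustration, and the resolver only understands plain child paths such as `$.error.message` (no wildcards, filters, or array slices). It assumes PHP 8.0 or newer.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the JsonResponse wrapper used in the handler above.
// Only plain child paths such as '$.error.message' are supported.
final class JsonResponse
{
    /** @param array<mixed> $data Decoded JSON body */
    public function __construct(private array $data)
    {
    }

    /** Decodes a JSON object or array; throws JsonException on invalid JSON. */
    public static function fromJson(string $json): self
    {
        return new self(json_decode($json, true, 512, JSON_THROW_ON_ERROR));
    }

    public function hasArray(string $path): bool
    {
        return is_array($this->resolve($path));
    }

    /** Returns the array at $path, or an empty array if there is none. */
    public function getArray(string $path): array
    {
        $value = $this->resolve($path);
        return is_array($value) ? $value : [];
    }

    /** Returns the scalar at $path as a string, or '' if there is none. */
    public function getString(string $path): string
    {
        $value = $this->resolve($path);
        return is_scalar($value) ? (string) $value : '';
    }

    /** Walks the decoded data one key at a time; returns null if a key is missing. */
    private function resolve(string $path): mixed
    {
        // '$.error.message' becomes ['error', 'message']; a bare '$' yields no keys.
        $keys = array_filter(
            explode('.', ltrim($path, '$')),
            static fn (string $key): bool => $key !== ''
        );

        $current = $this->data;
        foreach ($keys as $key) {
            if (!is_array($current) || !array_key_exists($key, $current)) {
                return null;
            }
            $current = $current[$key];
        }
        return $current;
    }
}

$response = JsonResponse::fromJson('{"error": {"message": "Rate limit exceeded"}}');
var_dump($response->hasArray('$.error'));            // bool(true)
echo $response->getString('$.error.message'), "\n";  // Rate limit exceeded
```

A real implementation would hand the path expression to a JSONPath engine instead of splitting it on dots, which is exactly the part this post is about speeding up.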
Similarly, when we process each result row, we might want to pick only specific properties from a nested structure:
public function getFieldsFromRow(ResultRow $resultRow, array $fieldMap): array
{
    $result = [];
    foreach ($fieldMap as $name => $jsonPathExpression) {
        $result[$name] = $resultRow->getValue($jsonPathExpression);
    }
    return $result;
}

$fieldMap = [
    'customerName' => '$.customer.name',
    'productName' => '$.product.name',
    'productPrice' => '$.product.price.total',
];
foreach ($resultRows as $i => $resultRow) {
    $resultRows[$i] = getFieldsFromRow($resultRow, $fieldMap);
}
We could achieve all this perfectly well without JSONPath, especially in simplified examples like the ones above. But with deeper structures and more advanced lookups, native PHP code often becomes much more verbose. JSONPath notation works as syntactic sugar that makes the code easier to read and maintain.
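To illustrate the verbosity point, consider a lookup that collects one property from a doubly nested list. In JSONPath this is a single expression, `$.orders[*].items[*].sku`. A native PHP equivalent could look like the sketch below. The data shape and the `collectSkus()` function are made up for this example and are not taken from our codebase.

```php
<?php
declare(strict_types=1);

/**
 * Native PHP equivalent of the JSONPath expression '$.orders[*].items[*].sku'.
 * Hypothetical helper: collects every 'sku' found under orders -> items.
 *
 * @param array<string, mixed> $data
 * @return list<mixed>
 */
function collectSkus(array $data): array
{
    $skus = [];
    // Each [*] wildcard in the path turns into one explicit loop.
    foreach ($data['orders'] ?? [] as $order) {
        foreach ($order['items'] ?? [] as $item) {
            // Skip items that have no 'sku' key, as a wildcard lookup would.
            if (isset($item['sku'])) {
                $skus[] = $item['sku'];
            }
        }
    }
    return $skus;
}

// Made-up sample data for the sketch.
$data = [
    'orders' => [
        ['items' => [['sku' => 'A-1'], ['sku' => 'B-2']]],
        ['items' => [['sku' => 'C-3'], ['name' => 'no sku here']]],
        ['note' => 'order without items'],
    ],
];

print_r(collectSkus($data)); // contains A-1, B-2, C-3
```

Every additional wildcard or filter in the path adds another loop or condition on the native side, while the JSONPath version stays a one-line string.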
Forced use of a PHP library increases processing time
But nice things often come at a cost. PHP doesn’t include native support for JSONPath, so we’ve been using a PHP library to support JSONPath lookups. Unsurprisingly, using a PHP library to perform these lookups adds some performance overhead compared to writing the equivalent lookups in native PHP code.
In the grand scheme of things, the overhead is only a small fraction of the overall processing time in our requests, but I found it to be a decent target for some performance optimization. When thousands of requests are made and responses are big enough, processing time can add up quickly. In edge cases, it could even become a bottleneck.
Alternative solution: JsonPath-PHP extension
I decided to look for other JSONPath solutions with two criteria in mind. Any alternative solution would have to:
- Be a drop-in replacement for our current solution. I didn’t want to switch to something so different from our current solution that it would require us to rewrite a lot of code.
- Be significantly faster than our current solution. I mean, performance optimization was the whole reason to consider other solutions.
One project, in particular, caught my attention. JsonPath-PHP wasn’t a PHP library like most other potential alternatives. Instead, it implemented JSONPath support as a PHP extension. JsonPath-PHP looked very promising in terms of performance, so I decided to give it a try.
And indeed, the extension met my criteria. The improvement in processing speed was significant enough to convince me to go forward with this solution. There was only one problem: the project had not received any updates in over four years.
I reached out to Mike Kaminski, the creator of the project, and asked if he’d be fine with Supermetrics taking over maintenance of the JsonPath-PHP extension. Mike didn’t have an issue with it, so he transferred the ownership of the PHP extension to Supermetrics.
While I was pruning the extension code (👋 goodbye PHP 5.6 support!), Mike suddenly started to submit pull requests. First came small bug fixes, then larger refactorings of the extension internals. Mike said he was inspired by the revival of the project and our real-world use of the extension, which spurred him to implement some improvements he had planned many years ago.
No one was happier about it than me! We continued to exchange ideas, planned the next steps, and split the work between us. There’s a lot of discussion on social media about toxicity in the open-source community, and I’ve seen some of it myself. That makes me truly appreciate the way our project is panning out.
Fast-forward a couple of months, and the JsonPath-PHP extension now has an almost entirely rewritten engine, 95% code coverage, a robust CI pipeline, and installation through PECL. That makes it the first release that is ready for production use.
The extension beats libraries in processing speed
Circling back to where we started, the main goal was to improve performance for JSONPath lookups. So, how does the extension fare against the alternatives?
In a benchmark suite that compares the JsonPath-PHP extension with various JSONPath libraries, the extension is between two and seven times faster than the fastest PHP library. The differences are biggest with small datasets, but still significant with larger ones.
Compared to writing the equivalent (but more verbose) code in native PHP without the JSONPath notation, the JsonPath-PHP extension usually still loses in terms of performance. With small datasets (just a few rows), native PHP is about two times faster than the extension. With big datasets (tens of thousands of rows), the performance of the extension is even slightly better than the native PHP equivalent.
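If you want to run a similar comparison on your own data, a micro-benchmark can be as simple as the sketch below. This is not the benchmark suite mentioned above and it does not reproduce its numbers. The `timeLookup()` and `walkPath()` functions and the synthetic rows are assumptions made for this illustration, with `walkPath()` standing in for a generic, userland path resolver.

```php
<?php
declare(strict_types=1);

/**
 * Runs $lookup against every row, $iterations times, and returns the
 * elapsed wall-clock time in milliseconds.
 */
function timeLookup(callable $lookup, array $rows, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        foreach ($rows as $row) {
            $lookup($row);
        }
    }
    return (hrtime(true) - $start) / 1e6;
}

/** Generic key-by-key walker, standing in for a userland path resolver. */
function walkPath(array $row, array $keys): mixed
{
    $current = $row;
    foreach ($keys as $key) {
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return null;
        }
        $current = $current[$key];
    }
    return $current;
}

// Synthetic rows; swap in real API responses for meaningful numbers.
$rows = [];
for ($i = 0; $i < 1000; $i++) {
    $rows[] = ['product' => ['name' => "Item $i", 'price' => ['total' => $i * 1.5]]];
}

// Hard-coded native lookup versus the generic walker.
$native = static fn (array $row) => $row['product']['price']['total'] ?? null;
$generic = static fn (array $row) => walkPath($row, ['product', 'price', 'total']);

$nativeMs = timeLookup($native, $rows, 100);
$genericMs = timeLookup($generic, $rows, 100);

printf("native: %.2f ms, generic walker: %.2f ms\n", $nativeMs, $genericMs);
```

Timings from a loop like this vary with PHP version, OPcache and JIT settings, and dataset size, so treat them as a rough indication and repeat the runs several times before drawing conclusions.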
At Supermetrics, we like to keep our code clean, readable, and maintainable. Even if we lose some processing speed with the JsonPath-PHP extension compared to native PHP, it’s the lesser evil compared to dealing with tangled API call responses in code. And compared to the PHP libraries we’ve used before to do the same job, the extension performs significantly better.
If you want to try out the JsonPath-PHP extension in your application, head over to the GitHub repository for instructions on how to install and use the extension. We’d love to hear your feedback! Contributions are obviously welcome too.
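As a quick taste, a lookup with the extension could look roughly like the sketch below. Treat the details as assumptions and check the repository README for the authoritative usage: the sketch assumes the extension exposes a `\JsonPath\JsonPath` class whose `find()` method takes the decoded data and a JSONPath expression and returns the matches, or `false` when nothing matches. The sample data is made up, and the `class_exists()` guard keeps the script runnable on machines where the extension is not installed.

```php
<?php
declare(strict_types=1);

// Made-up sample data for the sketch.
$data = [
    'resultRows' => [
        ['product' => ['name' => 'Widget', 'price' => ['total' => 19.9]]],
        ['product' => ['name' => 'Gadget', 'price' => ['total' => 5.0]]],
    ],
];

$names = null;

if (class_exists('\JsonPath\JsonPath')) {
    // Assumed API (verify against the README): find(array, string)
    // returns the matched values, or false when nothing matches.
    $jsonPath = new \JsonPath\JsonPath();
    $names = $jsonPath->find($data, '$.resultRows[*].product.name');
    var_dump($names);
} else {
    echo "JsonPath extension is not loaded; see the repository for installation steps.\n";
}
```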
And if you’d like to explore developer jobs at Supermetrics, visit our Engineering careers page.