100 Languages Speedrun: Episode 25: JQ
Tomasz Wegrzanowski
Posted on December 16, 2021
JSON actually did what XML promised to do, and became a near-universal data interchange format.
Every single programming language out there can handle JSON just fine, but sometimes you don't want to write a whole program - you'd much rather just do a shell one-liner like you can do with grep or such. jq does just that.
jq is mainly used either on the command line or in shell scripts, but for the purpose of this episode we'll also check how it works for writing short standalone programs (which you then call from the shell).
Pretty printing
jq pretty-prints its output by default - and if the output is a terminal, it also color-codes it. The program . (a single dot) refers to the whole input document.
So this one-character jq program is already doing something useful:
$ echo '{"name": "Alice", "surname": "Smith"}' | jq .
{
"name": "Alice",
"surname": "Smith"
}
A very common pattern in web development is to curl something from some web API, then | jq . to see it pretty-printed.
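For scripts, the opposite of pretty printing is often more useful. A minimal sketch, assuming the same sample document as above, of the -c (--compact-output) flag, which prints each result on a single line:

```shell
# Same data as in the pretty-printing example, stored in a shell variable
doc='{"name": "Alice", "surname": "Smith"}'

# -c turns pretty printing off: one JSON document per output line
compact=$(echo "$doc" | jq -c .)
echo "$compact"
# prints {"name":"Alice","surname":"Smith"}
```

The data is unchanged, only the whitespace differs, so the compact form can be piped back into jq . at any time.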
Hello, World!
Let's write an actual script.
Most valid JSON can be used as jq code, and that part is printed as is. .name gets the "name" field from . (the top level of the input JSON).
$ echo '{"name": "Alice", "surname": "Smith"}' | jq '{"hello": .name}'
{
"hello": "Alice"
}
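The same idea extends to several fields at once. This is only a sketch: the full_name key is made up for illustration, and it relies on + concatenating strings in jq:

```shell
doc='{"name": "Alice", "surname": "Smith"}'

# Keys may be written without quotes; a more complex value expression
# is wrapped in parentheses to keep the parser happy
greeting=$(echo "$doc" | jq -c '{hello: .name, full_name: (.name + " " + .surname)}')
echo "$greeting"
# prints {"hello":"Alice","full_name":"Alice Smith"}
```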
Do you even need JQ?
Before we do anything with JQ, let's answer a simple question - do we even need it? The two closest general purpose programming languages you could use for shell one-liners would be Ruby and Perl.
If we translate the example to Ruby, it would be:
$ echo '{"name": "Alice", "surname": "Smith"}' | ruby -rjson -e 'data=JSON.parse(STDIN.read); puts JSON.pretty_generate(hello: data["name"])'
{
"hello": "Alice"
}
Or in Perl:
$ echo '{"name": "Alice", "surname": "Smith"}' | perl -e 'use JSON; $_=decode_json(<>); print JSON->new->ascii->pretty->encode({"hello"=>$_->{"name"}})'
{
"hello" : "Alice"
}
These aren't terrible, but it's a good deal of boilerplate. They'd be somewhat more concise if we skipped pretty printing. So far jq is doing really well.
Do you even need JQ? Like really?
But wait, what if we pushed all that boilerplate code into a script? Let's make a super short wrapper for Ruby and call it rq. It just loads JSON, evals the Ruby code you passed on the command line, and pretty-prints the result:
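#!/usr/bin/env ruby
require "json"
# Parse all of standard input as a single JSON document
$_ = JSON.parse(STDIN.read)
# Evaluate the Ruby code passed as the first argument; it can refer to the document as $_
$_ = eval(ARGV[0])
# Pretty-print whatever the code returned
puts JSON.pretty_generate($_)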
Of course if we made a real script, we would add some command line option for turning pretty printing on or off, coloring the output, and so on. But we're just exploring the issue here, not writing production code.
So how about now?
$ echo '{"name": "Alice", "surname": "Smith"}' | rq '{hello: $_["name"]}'
{
"hello": "Alice"
}
Damn, that's really competitive with jq, and that's a language that predates JSON by a decade! I don't think Ruby is as good as jq for JSON processing one-liners, but it shows just how much power knowing a top tier language like Ruby (or Python most of the time - but not so much in this case) gives you.
Cat Facts
So far I was implying that jq gets a JSON document as input, runs its code on it, then generates a JSON document as output. That's not quite accurate. What it actually does is take any number of JSON documents, run the code on each one, and output all the results.
JSON documents are self-delimiting, so you can just concatenate any number of them (bare values such as numbers need whitespace between them). This kind of "JSON stream" is quite common, and usually such systems have one JSON document per line, but that's not enforced by jq - it will accept JSON documents laid out in any way.
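To see the streaming behavior without any web API, here is a small sketch with made-up input: three documents, two of them sharing a line and the third split across two lines. jq should not care about the layout:

```shell
# Three JSON documents in one stream, deliberately laid out untidily
stream=$(printf '%s\n' '{"n": 1} {"n": 2}' '{"n":' '3}' | jq -c '.n * 10')
echo "$stream"
# prints 10, 20 and 30, each on its own line
```

The program .n * 10 runs once per document, and each result is printed as its own output document.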
So let's try some cat facts.
$ curl -s 'https://cat-fact.herokuapp.com/facts' | jq '.[]' | jq '.text'
"Cats make about 100 different sounds. Dogs make only about 10."
"Domestic cats spend about 70 percent of the day sleeping and 15 percent of the day grooming."
"I don't know anything about cats."
"The technical term for a cat’s hairball is a bezoar."
"Cats are the most popular pet in the United States: There are 88 million pet cats and 74 million dogs."
Cat Facts API returns an array with 5 objects in it (you can see it here).
jq .[] takes each document, and runs .[] on it. .[] prints each top level value (of either array or object) as its own document.
jq .text takes each document, and runs .text on it. .text prints just the value associated with the "text" key.
The result is 5 strings, which are then printed out.
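The two stages can also be tried offline. This sketch uses a tiny made-up array in place of the real API response (the "verified" field is purely illustrative) and captures what each stage emits:

```shell
# Hypothetical stand-in for the API response: an array of two objects
facts='[{"text": "Cats purr.", "verified": true}, {"text": "Cats nap.", "verified": false}]'

# Stage 1: .[] splits the top-level array into one document per element
stage1=$(echo "$facts" | jq -c '.[]')

# Stage 2: a second jq runs .text on each of those documents
stage2=$(echo "$facts" | jq -c '.[]' | jq '.text')

# .[] works on objects too, emitting just the values
object_values=$(echo '{"a": 1, "b": 2}' | jq -c '.[]')

echo "$stage1"
echo "$stage2"
echo "$object_values"
```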
This kind of pipelining is extremely common, so we can do this instead:
$ curl -s 'https://cat-fact.herokuapp.com/facts' | jq '.[] | .text'
"Cats make about 100 different sounds. Dogs make only about 10."
"Domestic cats spend about 70 percent of the day sleeping and 15 percent of the day grooming."
"I don't know anything about cats."
"The technical term for a cat’s hairball is a bezoar."
"Cats are the most popular pet in the United States: There are 88 million pet cats and 74 million dogs."
Using jq as Calculator
A fun fact - a number is a valid JSON document!
So we can do this:
$ seq 1 10 | jq '(. / 10) + 2'
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3
seq generates ten valid JSON documents (1, 2, 3, ..., 10, each on its own line, but that doesn't matter to jq). jq then runs (. / 10) + 2 on each of them, where . is the current document. Then it prints each result.
Unicode
Fun fact - jq correctly handles Unicode and counts characters (code points), even though JavaScript doesn't (it counts UTF-16 code units, and answers 2 for the last one).
$ echo '["Hello", "Żółw", "🍰"]' | jq '.[] | length'
5
4
1
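If you need the size in bytes rather than in characters, jq has a separate builtin for that. A short sketch, assuming a jq version recent enough to ship utf8bytelength, which puts both counts side by side:

```shell
# For each string: [number of code points, number of UTF-8 bytes]
lengths=$(echo '["Hello", "Żółw", "🍰"]' | jq -c '.[] | [length, utf8bytelength]')
echo "$lengths"
# prints [5,5], [4,7] and [1,4], each on its own line
```

Only the ASCII string has the same length in both units.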
Null Input
JQ can be used in a few ways other than JSON input. For example null input lets it be used as a pure generator. It can also take input as strings per line, as one big string, and a few other modes.
$ jq --null-input 'range(1;11) | {number: ., (if . % 2 == 0 then "even" else "odd" end): true }'
{
"number": 1,
"odd": true
}
{
"number": 2,
"even": true
}
{
"number": 3,
"odd": true
}
{
"number": 4,
"even": true
}
{
"number": 5,
"odd": true
}
{
"number": 6,
"even": true
}
{
"number": 7,
"odd": true
}
{
"number": 8,
"even": true
}
{
"number": 9,
"odd": true
}
{
"number": 10,
"even": true
}
What's going on:
- jq --null-input ... is basically the same as echo null | jq ... - the JSON document is just a null
- range(1;11) generates a sequence of numbers from 1 to 10, which we then pipe into the next stage - I still think the default range convention should be range(start, end), but half the programming languages do range(start, end+1), so jq is nothing special here
- we pipe those ten JSON documents (1, 2, ..., 10) to the second stage
- the second stage constructs a JSON object with two keys
  - number is equal to the input document
  - the second key is evaluated as (if . % 2 == 0 then "even" else "odd" end) - you can use basically any expression as a key, but if it's something complicated you might need to parenthesize it
  - so it will be either {"even": true} or {"odd": true}
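The other input modes mentioned earlier (strings per line, one big string) can be sketched the same way. The sample text here is made up, and the flags used are --raw-input (-R) and --slurp (-s):

```shell
# -R: every input line becomes a JSON string
per_line=$(printf 'alpha\nbeta\n' | jq -R -c '{line: ., size: length}')

# -R together with -s: the whole input becomes one big string, newlines included
whole=$(printf 'alpha\nbeta\n' | jq -R -s 'length')

# -s alone: all input JSON documents are collected into a single array
total=$(seq 1 10 | jq -s 'add')

echo "$per_line"
echo "$whole"   # prints 11
echo "$total"   # prints 55
```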
FizzBuzz
JQ does not support standalone scripts with a bare #!/usr/bin/env jq shebang (jq needs the -f flag to read its program from a file), but it supports module files and functions.
So let's give it a go, creating fizzbuzz.jq:
def fizzbuzz:
if . % 15 == 0
then "FizzBuzz"
elif . % 5 == 0
then "Buzz"
elif . % 3 == 0
then "Fizz"
else "\(.)"
end
;
The trailing ; is necessary, and "\(.)" is string interpolation syntax.
Let's give it a go:
$ seq 1 20 | jq 'include "fizzbuzz"; fizzbuzz'
"1"
"2"
"Fizz"
"4"
"Buzz"
"Fizz"
"7"
"8"
"Fizz"
"Buzz"
"11"
"Fizz"
"13"
"14"
"FizzBuzz"
"16"
"17"
"Fizz"
"19"
"Buzz"
They have extra quotes compared with the standard FizzBuzz, but as this makes them valid JSON documents, I think this is more in the spirit of what we're doing. But if you don't like it, you can change the output mode to raw with -r:
$ seq 1 20 | jq -r 'include "fizzbuzz"; fizzbuzz'
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
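If you'd rather not keep a module file around, a function can also be defined inline in the program itself. A minimal sketch, assuming the standard FizzBuzz rules (Fizz for multiples of 3, Buzz for multiples of 5) and using --null-input with range in place of seq:

```shell
# Function definition and the main expression live in the same program,
# separated by the ; that ends the definition
fizzbuzz=$(jq -r --null-input '
  def fizzbuzz:
    if . % 15 == 0 then "FizzBuzz"
    elif . % 5 == 0 then "Buzz"
    elif . % 3 == 0 then "Fizz"
    else "\(.)"
    end;
  range(1; 16) | fizzbuzz
')
echo "$fizzbuzz"
```

This runs from any directory, at the price of a longer command line.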
Fibonacci
It's not much harder to do Fibonacci with jq. First let's create fib.jq:
def fib(n):
if n <= 2
then 1
else fib(n - 1) + fib(n - 2)
end;
Then we can run it, producing a JSON array with the first 20 Fibonacci numbers:
$ jq --null-input 'include "fib"; [range(1;21) | fib(.)]'
[
1,
1,
2,
3,
5,
8,
13,
21,
34,
55,
89,
144,
233,
377,
610,
987,
1597,
2584,
4181,
6765
]
As we wrap the code in [], it generates one array instead of a lot of separate JSON documents.
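The recursive definition recomputes the same values many times, which gets slow for larger inputs. As an alternative, here is a sketch (not the approach used in fib.jq) that builds the same array iteratively with reduce, where the accumulator starts as [1, 1] and each step appends the sum of the last two elements:

```shell
# $i is the index of the element being appended; . is the array built so far
fibs=$(jq -c --null-input '
  reduce range(2; 20) as $i ([1, 1]; . + [.[$i - 1] + .[$i - 2]])
')
echo "$fibs"
# prints [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]
```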
Should you use JQ?
As far as domain-specific languages go, JQ is very intuitive, very concise, and really good at what it's doing. It doesn't share any of the failures of XSLT I recently reviewed. The code is an actual, properly designed language, not some JSON with special nodes for code.
And this atrocity can definitely happen to JSON: the MongoDB query language serves a similar role to JQ, but it represents code as JSON objects, with $-nodes for code nodes, and as a consequence it's completely unreadable for anything except the simplest cases. If you don't believe me, try this converter, give it any aggregate SQL query, and weep.
Even when pushed outside its original purpose, like when we tried to do FizzBuzz or Fibonacci, JQ still handled itself extremely well.
I think its main competitor for shell one-liners is Ruby. If you're a Ruby programmer already comfortable with using Ruby for shell one-liners, JQ offers only a modest improvement: JQ is more concise, but you know Ruby already, and Ruby one-liners can grow into proper scripts with ease, while JQ one-liners would need a full rewrite in another language once they get too complicated. You might still benefit from learning JQ, but it's up to you.
If you work with a lot of JSON data in a Unix-like environment (and that's most of us these days), and you don't know Ruby, then I highly recommend learning at least the basics of JQ.
Either way, if you ever reach the point where you're writing big JQ module files, then maybe it's time to rethink it, and use a general purpose language instead. Fortunately JQ provides a lot of value by just handling the simple cases really well.
Code
All code examples for the series will be in this repository.