Why YAML is better than JSON (read before screaming)

lenra_io

Alienor

Posted on March 17, 2023

You have probably seen or used the YAML format in configuration files.
YAML (a recursive acronym for “YAML Ain’t Markup Language”) is a human-friendly data serialization language for all programming languages, like the JSON format.

YAML files are mostly written using Python-style indentation to indicate nesting, as in the following example of a Dofigen file:

# An anchor
from: &image "docker.io/bitnami/node:18"
# A string field
workdir: /app
# An object list
builders:
  # An object
  - name: module-loader
    from: *image
    workdir: /tmp/module
    adds:
      - package.json
      - package-lock.json
    script:
      - npm i --production --cache /tmp/cache
    caches:
      - /tmp/cache
  - name: builder
    from: *image
    workdir: /tmp/app
    adds:
      - .
    script:
      - npm i --cache /tmp/cache
      - npm run build
    caches:
      - /tmp/cache
artifacts:
  - builder: module-loader
    source: /tmp/module/
    destination: "."
  - builder: builder
    source: /tmp/app/dist/
    destination: dist/
  - builder: builder
    source: /tmp/app/resources/
    destination: resources/
# A string list
cmd:
  - npm
  - start
# An integer list
ports:
  - 3000
ignores:
- "**"
- "!/*.json"
- "!/src/"
- "!/resources/"

You may say:

OK, YAML is great and very readable, but how could it be better than JSON?
They are just two different formats.

That's true, but I will show you how YAML is at least as good as JSON.

YAML is at least as good as JSON

The YAML format offers several ways to write the same data.
Let's go through them for each data type:

Strings

The simplest way to write a string is to write it as is, without quotes:

myString: My super string

To keep a string from being read as another data type, such as a number or a boolean, we can surround the value with single quotes (') or double quotes ("):

myString: "My super string"
alsoString: '10'
stringNotBoolean: "true"
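To make the difference concrete, here is a small sketch of how the same characters are read with and without quotes. The field names are made up for the illustration, and the types in the comments assume a parser that follows the YAML 1.2 core schema (older YAML 1.1 parsers treat a few more words, such as yes and no, as booleans):

```yaml
# Unquoted values are typed by the parser
count: 10         # integer
enabled: true     # boolean
version: 1.10     # float 1.1, the trailing zero is lost
# Quoted values always stay strings
countText: "10"
enabledText: 'true'
versionText: "1.10"
```

A good habit is to quote anything that only looks like a number, such as a version.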

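For multiline strings we can also use two block styles: the folded style (>), which joins consecutive lines with a space, and the literal style (|), which keeps the line breaks as they are: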

myString: >
  My super
  multiline string

  Second line
otherString: |
  The other super multiline string
  Second line

Look at this website to read more about it.
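As a quick sketch of what the two block styles above produce, here is each of them next to the double-quoted string I would expect it to be equivalent to. All the field names are invented for the example, and the last pair uses the strip indicator (-) to drop the final line break:

```yaml
# Folded (>): a single line break becomes a space, a blank line becomes a line break
folded: >
  My super
  multiline string

  Second line
foldedEquivalent: "My super multiline string\nSecond line\n"

# Literal (|): line breaks are kept as written
literal: |
  The other super multiline string
  Second line
literalEquivalent: "The other super multiline string\nSecond line\n"

# Adding - after the indicator removes the final line break
stripped: |-
  No final line break
strippedEquivalent: "No final line break"
```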

Arrays/lists

The main array syntax in YAML is the following:

myArray:
- My string value
- 42

But YAML also permits another syntax, surrounding the array elements with brackets ([ and ]) and separating them with commas, which is very useful for short or empty arrays:

myShortArray: [first, second]
emptyArray: []
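The two syntaxes describe the same data and can be mixed freely. Here is a minimal sketch, with field names chosen only for the illustration:

```yaml
# Both fields hold the same list of two strings
blockStyle:
  - first
  - second
bracketStyle: [first, second]
# A dash list whose items are bracket lists
matrix:
  - [1, 2, 3]
  - [4, 5, 6]
```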

Objects/structures

The main object description in YAML is the following one:

name: My string
age: 24
subobject: 
  nestedField: The value

That corresponds to the following JSON object:

{
  "name": "My string",
  "age": 24,
  "subobject": {
    "nestedField": "The value"
  }
}

But the YAML format also permits the use of braces ({ and }) to surround the object fields, which can then be separated by commas instead of new lines.
It also permits the use of single quotes (') or double quotes (") around the field names.
Here is the same object with this syntax:

{
  'name': My string,
  "age": 24,
  subobject: { nestedField: The value }
}
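The brace syntax is handy for small objects inside a list. The sketch below reuses the artifacts of the first example; the steps field and its note are hypothetical and only there to show that a value containing a comma has to be quoted inside braces:

```yaml
# A dash list of one-line objects
artifacts:
  - { builder: builder, source: /tmp/app/dist/, destination: dist/ }
  - { builder: builder, source: /tmp/app/resources/, destination: resources/ }
# Inside braces a comma separates fields, so quote values that contain one
steps:
  - { name: builder, note: "npm i, then npm run build" }
```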

Merged YAML secondary format

Do you see where I'm going with this?

Here is another way to write the initial YAML example, using only the syntaxes we have just seen:

{
  # An anchor
  "from": &image "docker.io/bitnami/node:18",
  # A string field
  "workdir": "/app",
  # An object list
  "builders": [
    # An object
    {
      "name": "module-loader",
      "from": *image,
      "workdir": "/tmp/module",
      "adds": [
        "package.json",
        "package-lock.json"
      ],
      "script": [ "npm i --production --cache /tmp/cache" ],
      "caches": [ "/tmp/cache" ]
    },
    {
      "name": "builder",
      "from": *image,
      "workdir": "/tmp/app",
      "adds": [ "." ],
      "script": [
        "npm i --cache /tmp/cache",
        "npm run build"
      ],
      "caches": [ "/tmp/cache" ]
    }
  ],
  "artifacts": [
    {
      "builder": "module-loader",
      "source": "/tmp/module/",
      "destination": "."
    },
    {
      "builder": "builder",
      "source": "/tmp/app/dist/",
      "destination": "dist/"
    },
    {
      "builder": "builder",
      "source": "/tmp/app/resources/",
      "destination": "resources/"
    }
  ],
  # A string list
  "cmd": [ "npm", "start" ],
  # An integer list
  "ports": [ 3000 ],
  "ignores": [
    "**",
    "!/*.json",
    "!/src/",
    "!/resources/"
  ]
}

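Does it look familiar?
Yes, apart from the comments, the anchor and the aliases, this is plain JSON: the YAML format is fully compatible with JSON data, so a JSON document can be read as YAML.

But YAML also has some elements that you can't find in the JSON format.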

YAML additional features

The YAML format also has many interesting additional features. Let's take a look at some of those I use in my projects.

Comments

One of the features that I miss the most in JSON files is comments.
How many times have I tried to comment out a dependency in a package.json file or a configuration...

In YAML, you can write a comment by adding a hash sign (#) before it, either on its own line or after a value.
So simple!

# my comment
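One detail worth knowing: a # only starts a comment at the beginning of a line or after a space. Here is a short sketch with illustrative field names (color and fragment are not Dofigen fields):

```yaml
# A comment on its own line
workdir: /app # a comment after a value
color: "#ff0000" # quoted, so the value keeps its leading #
fragment: page#section # no space before the first #, so it is part of the value
```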

Anchors

The YAML format also lets you avoid repeating data in your configuration files thanks to anchors.
With this feature you can define an anchor (&) by setting its name before a value (of any type).
You can then use an alias (*) as the value of another field (later in the same YAML file).

Anchors let you change a value that is used many times in a single place, like the Docker image of the from field in the initial example:

# An anchor
from: &image "docker.io/bitnami/node:18"
...
builders:
  # An object
  - name: module-loader
    from: *image
...

This is even more useful for objects and arrays.
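For instance, a whole list can be anchored once and reused. This is only a sketch: defaultCaches is a hypothetical field that I add to hold the anchor, not something taken from the Dofigen format:

```yaml
# Anchor a list once...
defaultCaches: &caches
  - /tmp/cache
builders:
  - name: module-loader
    # ...and reuse it wherever it is needed
    caches: *caches
  - name: builder
    caches: *caches
```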

YAML anchors also let you create a new object that extends and overrides an existing one, by using the merge key (<<:) in front of the alias.
Here is an example with a builder:

&base
from: "docker.io/bitnami/node:18"
workdir: /app
builders:
  - <<: *base
    name: module-loader # extension to add the name
    workdir: /tmp/module # override the workdir
    adds:
      - package.json
      - package-lock.json
    script:
      - npm i --production --cache /tmp/cache
    caches:
      - /tmp/cache
    builders: # override the builders with null to avoid circular references

This feature is also very useful, but it can make files harder to read if it is not used wisely.
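To see what the merge does, here is a smaller sketch with made-up field names (base, builder and expanded are not Dofigen fields). One assumption to keep in mind: the << merge key comes from the YAML 1.1 type repository and is not part of the YAML 1.2 core schema, so it only works with parsers that choose to support it:

```yaml
base: &base
  from: "docker.io/bitnami/node:18"
  workdir: /app
builder:
  <<: *base              # copy the fields of base
  name: module-loader    # extension: a field that base does not have
  workdir: /tmp/module   # override: the local value wins over the merged one
# With a parser that supports the merge key, builder should hold the same data as:
expanded:
  from: "docker.io/bitnami/node:18"
  name: module-loader
  workdir: /tmp/module
```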

Conclusion

We have seen that the YAML format is fully compatible with JSON and that it has many additional features, but I wrote this article focusing on the human-friendly side of those formats.
This is not the only important aspect of a format.
The permissiveness of the YAML format makes it more readable and easier to use (at least to me ^^), but it can also make it less efficient to process in a program...

For all the additional features, see the full specification (the latest version at the time of writing).
