How to write an Elixir code formatter
Shohei Kajihara
Posted on December 14, 2019
Overview
Elixir has the great advantage of shipping with a standard code formatter. Just running mix format formats all the source code in your project, so you don't need to worry about improper indentation or spacing and can concentrate on important things when writing and reviewing code.
However, sometimes you need coding style rules that the formatter does not cover. Elixir's rich expressiveness, which I really love, not only lets you write code effectively but also leads to inconsistent styles. This is why many projects have their own coding style guide. My project has one based on https://github.com/christopheradams/elixir_style_guide, in which directives such as alias are grouped and ordered as follows:
defmodule Example do
  use Foo

  import Foo

  alias Foo
  alias Foo.{A, B}
  alias Foo.C.D

  require Foo
end
This is a trivial rule, but it annoys me when it is not followed:
defmodule Example1 do
  import Foo
  use Foo
  alias Foo.{B, A}
end

defmodule Example2 do
  use Foo
  alias Foo.{A, B}
  import Foo
end
Code reviews can catch such inconsistencies, but relying on them brings back the same problem as with indentation and spacing: reviewers spend their attention on trivial style issues.
To solve this, I created uiar, a library that automatically formats code to follow the rules above. In this article, I briefly explain how to write a code formatter, using the implementation of this library as an example.
This article is part of Akatsuki’s 2019 Advent Calendar (Japanese).
How mix format works
Before looking at the library, let's examine how Elixir's standard code formatter works.
mix format calls Mix.Tasks.Format.run/1, which internally invokes Code.format_string!/2:
@spec format_string!(binary, keyword) :: iodata
def format_string!(string, opts \\ []) when is_binary(string) and is_list(opts) do
  line_length = Keyword.get(opts, :line_length, 98)
  algebra = Code.Formatter.to_algebra!(string, opts)
  Inspect.Algebra.format(algebra, line_length)
end
Code.Formatter.to_algebra!/2 converts the source code into an AST and then into a form called an algebra document, which is suited to pretty-printing. Inspect.Algebra.format/2 then renders the algebra document as iodata, choosing line breaks so that lines fit within line_length where possible. See the documentation of Inspect.Algebra for details about algebra documents.
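Both steps can be tried from a plain script. The following sketch (FormatterDemo is a hypothetical module name) shows that Code.format_string!/2 normalizes spacing but keeps the aliases in the order they were written, and that a tiny algebra document built with Inspect.Algebra.glue/2 and Inspect.Algebra.group/1 is laid out differently depending on the width passed to Inspect.Algebra.format/2.

```elixir
defmodule FormatterDemo do
  # Code.format_string!/2 normalizes whitespace, but it never reorders
  # directives: the aliases come back in the order they were written.
  def format_aliases do
    "alias   Foo2\nalias Foo1"
    |> Code.format_string!()
    |> IO.iodata_to_binary()
  end

  # A tiny algebra document: "alias" and "Foo.Bar" joined by a break.
  # Inside a group, Inspect.Algebra.format/2 renders the break as a space
  # when the whole group fits in `width`, and as a newline otherwise.
  def render(width) do
    "alias"
    |> Inspect.Algebra.glue("Foo.Bar")
    |> Inspect.Algebra.group()
    |> Inspect.Algebra.format(width)
    |> IO.iodata_to_binary()
  end
end

IO.inspect(FormatterDemo.format_aliases())  # "alias Foo2\nalias Foo1"
IO.inspect(FormatterDemo.render(80))        # "alias Foo.Bar"
IO.inspect(FormatterDemo.render(5))         # "alias\nFoo.Bar"
```

The first result is exactly the gap this article is about: the standard formatter fixes whitespace but has no opinion on the order of directives.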
When converting a string to an AST, Code.Formatter.to_algebra!/2 passes one of its private functions as an option to :elixir.string_to_tokens/4. Furthermore, the conversion to an algebra document is done entirely by private functions. If we wanted to reuse this machinery, we would have to copy and paste a lot of code from Elixir's source.
Strategy
To achieve my goal without copying and pasting code, I decided to write the formatter from scratch. Although this choice costs some performance, it avoids the maintenance burden of the copied code. My library analyzes the code using its AST and then formats it by simple line-by-line text editing.
For simple code, regular expressions or similar tools might suffice, but to analyze a large and complex codebase correctly, it is better to use an AST. For example, in the following code it is easy to see that lines 2 and 3 are alias directives whose targets are Foo2 and Foo1, respectively:
defmodule Example do
  alias Foo2
  alias Foo1
end
But what about this?
defmodule Example do
  alias Foo2.{
    Bar2,
    Bar1.Baz
  }
  alias Foo1
end
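To make the difference concrete, here is a deliberately naive, line-based scan with a regular expression (NaiveAliasScan is a hypothetical helper for illustration, not part of uiar). It works for the first snippet, but for the second one it reports Foo2.{ as a target and never sees Bar2 or Bar1.Baz.

```elixir
defmodule NaiveAliasScan do
  # Hypothetical helper: capture whatever follows `alias` on each line.
  # It knows nothing about Elixir syntax, which is the whole point.
  def targets(source) do
    source
    |> String.split("\n")
    |> Enum.flat_map(fn line ->
      case Regex.run(~r/^\s*alias\s+(\S+)/, line) do
        [_, target] -> [target]
        nil -> []
      end
    end)
  end
end

simple = """
defmodule Example do
  alias Foo2
  alias Foo1
end
"""

multi = """
defmodule Example do
  alias Foo2.{
    Bar2,
    Bar1.Baz
  }
  alias Foo1
end
"""

IO.inspect(NaiveAliasScan.targets(simple))  # ["Foo2", "Foo1"]
IO.inspect(NaiveAliasScan.targets(multi))   # ["Foo2.{", "Foo1"]
```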
You might then think that we could also use the AST directly to produce the formatted code. However, the documentation of Macro.to_string/2 says:
This function discards all formatting of the original code.
In fact, when the following code is converted to an AST and back with Macro.to_string/2,
defmodule Example do
  alias Foo2
  alias Foo1
end
it becomes:
defmodule(Example) do
  alias(Foo2)
  alias(Foo1)
end
Therefore, I implemented the formatting part with simple text editing. I won't describe it in detail here because it differs for each formatting rule; see the repository on GitHub if you are interested.
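As a rough sketch of the idea only (hypothetical code, not uiar's actual implementation): once the AST tells us which lines a run of directives occupies, we can splice rewritten lines into the original text and leave every other line, with its original formatting, untouched.

```elixir
defmodule LineEdit do
  # Hypothetical helper, not uiar's API: replace lines first..last
  # (1-based, inclusive) of `source` with `new_lines`, keeping every other
  # line byte-for-byte. The new lines must carry their own indentation.
  def replace_lines(source, first, last, new_lines) do
    lines = String.split(source, "\n")
    {head, rest} = Enum.split(lines, first - 1)
    {_old, tail} = Enum.split(rest, last - first + 1)
    Enum.join(head ++ new_lines ++ tail, "\n")
  end
end

source = """
defmodule Example do
  alias Foo2
  alias Foo1
end
"""

# Lines 2 and 3 would come from the :line metadata in the AST.
source
|> LineEdit.replace_lines(2, 3, ["  alias Foo1", "  alias Foo2"])
|> IO.write()
```

Because only the selected lines are replaced, the rest of the file keeps its formatting, which Macro.to_string/2 cannot guarantee.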
AST Examples
So, what do we get when we convert code into an AST? You can check it in IEx:
iex(1)> "example.ex" |> File.read!() |> Code.string_to_quoted!()
First, let's take a simple example.
defmodule Example do
  alias Foo2
  alias Foo1
end
It is converted to an AST:
{:defmodule, [line: 1],
 [
   {:__aliases__, [line: 1], [:Example]},
   [
     do: {:__block__, [],
      [
        {:alias, [line: 2], [{:__aliases__, [line: 2], [:Foo2]}]},
        {:alias, [line: 3], [{:__aliases__, [line: 3], [:Foo1]}]}
      ]}
   ]
 ]}
The two lines below are important here:
{:alias, [line: 2], [{:__aliases__, [line: 2], [:Foo2]}]},
{:alias, [line: 3], [{:__aliases__, [line: 3], [:Foo1]}]}
Each directive is represented by a three-element tuple: the directive name, metadata containing the line number, and the arguments. From this, it is easy to see that we just need to swap alias Foo2 and alias Foo1 on lines 2 and 3.
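This observation translates directly into code. The following is a minimal sketch, assuming one defmodule per file and directives at the top level of its body (DirectiveScan is a hypothetical name): it keeps only the three-element tuples whose first element is :use, :import, :alias, or :require, together with the :line from their metadata, and skips everything else, such as def.

```elixir
defmodule DirectiveScan do
  # Directives we care about; any other call (def, @moduledoc, ...) is ignored.
  @directives [:use, :import, :alias, :require]

  # Returns {kind, line, ast} for every top-level directive in the body
  # of the single defmodule in `source`.
  def directives(source) do
    {:defmodule, _meta, [_name, [do: body]]} = Code.string_to_quoted!(source)

    body
    |> block_to_list()
    |> Enum.flat_map(fn
      {kind, meta, _args} = ast when kind in @directives ->
        [{kind, Keyword.fetch!(meta, :line), ast}]

      _other ->
        []
    end)
  end

  # A body with several expressions is wrapped in :__block__;
  # a body with a single expression is not.
  defp block_to_list({:__block__, _meta, exprs}), do: exprs
  defp block_to_list(expr), do: [expr]
end

source = """
defmodule Example do
  alias Foo2
  alias Foo1

  def foo() do
    :foo
  end
end
"""

source
|> DirectiveScan.directives()
|> Enum.map(fn {kind, line, _ast} -> {kind, line} end)
|> IO.inspect()
# [alias: 2, alias: 3]
```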
Next, let's move on to a slightly more complex example.
defmodule Example do
  alias Foo2.{
    Bar2,
    Bar1.Baz
  }
  alias Foo1

  def foo() do
    :foo
  end
end
It is also converted to an AST:
{:defmodule, [line: 1],
 [
   {:__aliases__, [line: 1], [:Example]},
   [
     do: {:__block__, [],
      [
        {:alias, [line: 2],
         [
           {{:., [line: 2], [{:__aliases__, [line: 2], [:Foo2]}, :{}]},
            [line: 2],
            [
              {:__aliases__, [line: 3], [:Bar2]},
              {:__aliases__, [line: 4], [:Bar1, :Baz]}
            ]}
         ]},
        {:alias, [line: 6], [{:__aliases__, [line: 6], [:Foo1]}]},
        {:def, [line: 8], [{:foo, [line: 8], []}, [do: :foo]]}
      ]}
   ]
 ]}
The alias that groups modules with {} corresponds to the following part:
{:alias, [line: 2],
 [
   {{:., [line: 2], [{:__aliases__, [line: 2], [:Foo2]}, :{}]},
    [line: 2],
    [
      {:__aliases__, [line: 3], [:Bar2]},
      {:__aliases__, [line: 4], [:Bar1, :Baz]}
    ]}
 ]},
The third element is more deeply nested than before and contains the :. operator. By analyzing it carefully, we can see that the parent module is Foo2 and that it has the children Bar2 and Bar1.Baz. We can also tell that this directive spans multiple lines, because the children are on lines 3 and 4 rather than line 2.
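A sketch of how such an alias target could be decoded, under my own assumptions (the name AliasTarget.parse/1 and its {parent, children, multiline?} result are hypothetical): it handles both the plain form and the :. / :{} form shown above, sorts the children, and compares the children's line numbers with the directive's own line to detect a multi-line alias.

```elixir
defmodule AliasTarget do
  # Hypothetical helper: turn the argument of an `alias` directive into
  # {parent, sorted_children, multiline?}.

  # Plain alias, e.g. `alias Foo1`.
  def parse({:__aliases__, _meta, parts}), do: {join(parts), [], false}

  # Multi-alias, e.g. `alias Foo2.{...}`: a call to the :. operator whose
  # right-hand side is :{}, with the children as its arguments.
  def parse({{:., meta, [{:__aliases__, _, parts}, :{}]}, _call_meta, children}) do
    line = Keyword.fetch!(meta, :line)

    {names, lines} =
      children
      |> Enum.map(fn {:__aliases__, child_meta, child_parts} ->
        {join(child_parts), Keyword.fetch!(child_meta, :line)}
      end)
      |> Enum.unzip()

    # The directive spans several lines if any child sits on another line.
    {join(parts), Enum.sort(names), Enum.any?(lines, &(&1 != line))}
  end

  # [:Bar1, :Baz] -> "Bar1.Baz"
  defp join(parts), do: Enum.map_join(parts, ".", &Atom.to_string/1)
end

source = """
alias Foo2.{
  Bar2,
  Bar1.Baz
}
"""

{:alias, _meta, [target]} = Code.string_to_quoted!(source)
IO.inspect(AliasTarget.parse(target))
# {"Foo2", ["Bar1.Baz", "Bar2"], true}
```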
By the way, the def statement, which we are not interested in here, corresponds to
{:def, [line: 8], [{:foo, [line: 8], []}, [do: :foo]]}
Its first element is not a directive atom such as :alias or :import, so we can simply ignore it.
Result
The formatter enforces the following coding style:
- Directives are grouped and ordered as use, import, alias, and require.
- Each group of directives is separated by an empty line.
- Directives in the same group are ordered alphabetically.
- If a directive handles multiple modules with {}, they are ordered alphabetically.
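The rules above can be sketched as a single ordering function, assuming the directives have already been decoded into {kind, parent, children} triples (a hypothetical representation, not uiar's). The sort key within a group, parent module first and then the sorted children, is also an assumption, chosen so that the output reproduces the example at the beginning of this article.

```elixir
defmodule DirectiveOrder do
  # Hypothetical sketch of the four rules; not uiar's actual code.
  @group_order [:use, :import, :alias, :require]

  # Each directive is {kind, parent, children}, e.g.
  #   {:alias, "Foo.C.D", []}      for  alias Foo.C.D
  #   {:alias, "Foo", ["B", "A"]}  for  alias Foo.{B, A}
  # Returns the lines of the reordered block.
  def render(directives) do
    groups = Enum.group_by(directives, fn {kind, _parent, _children} -> kind end)

    @group_order
    # Rule 1: groups appear in the order use, import, alias, require.
    |> Enum.filter(&Map.has_key?(groups, &1))
    |> Enum.map(fn kind ->
      groups
      |> Map.fetch!(kind)
      # Rule 3 (assumed key): parent module first, then the sorted children,
      # so `alias Foo` < `alias Foo.{A, B}` < `alias Foo.C.D`.
      |> Enum.sort_by(fn {_kind, parent, children} -> {parent, Enum.sort(children)} end)
      |> Enum.map(&to_line/1)
    end)
    # Rule 2: an empty line between groups.
    |> Enum.intersperse([""])
    |> List.concat()
  end

  defp to_line({kind, parent, []}), do: "#{kind} #{parent}"

  # Rule 4: modules inside {} are sorted alphabetically.
  defp to_line({kind, parent, children}) do
    "#{kind} #{parent}.{#{children |> Enum.sort() |> Enum.join(", ")}}"
  end
end

directives = [
  {:import, "Foo", []},
  {:require, "Foo", []},
  {:alias, "Foo.C.D", []},
  {:use, "Foo", []},
  {:alias, "Foo", ["B", "A"]},
  {:alias, "Foo", []}
]

directives |> DirectiveOrder.render() |> Enum.join("\n") |> IO.puts()
```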
When I applied it to my projects, which consist of about 1,200 Elixir files, it took about 3 seconds on my MacBook Pro and produced a diff of approximately 400 lines. Such a large diff is an embarrassing result, but I am going to prevent it from happening again by adding the formatter with the --check-formatted option to our CircleCI workflow.
Summary
- You can write your own code formatter for rules that mix format does not cover, so that you can concentrate on important things when writing and reviewing code.
- The strategy of analyzing the AST and then simply editing text worked well.
Thank you for reading!