How to write an Elixir code formatter

scapybara

Shohei Kajihara

Posted on December 14, 2019

Overview

One great advantage of Elixir is that it ships with a standard code formatter. Running mix format formats all the source code in your project, so you don't need to worry about improper indentation or spacing and can concentrate on the important things when writing and reviewing code.

However, sometimes you need a coding style rule that the formatter does not cover. Elixir's expressiveness, which I really love, not only lets you write code effectively but also leads to inconsistent styles. This is why projects often have their own coding style guide. My project also has one, based on https://github.com/christopheradams/elixir_style_guide, in which directives such as alias are grouped and ordered as follows:

defmodule Example do
  use Foo

  import Foo

  alias Foo
  alias Foo.{A, B}
  alias Foo.C.D

  require Foo
end

This is a trivial rule, but it annoys me when it is not followed:

defmodule Example1 do
  import Foo
  use Foo
  alias Foo.{B, A}
end

defmodule Example2 do
  use Foo

  alias Foo.{A, B}
  import Foo
end

Code reviews can remove such inconsistencies, but then reviewers are distracted by style again, which is the same problem we had with indentation and spacing.

To solve this, I created a library that automatically formats code to follow the rule above: uiar. This article briefly introduces how to write a code formatter, using the implementation of that library as an example.

This is written as a part of Akatsuki’s 2019 Advent Calendar (Japanese).

How mix format works

Before the library, let's examine how Elixir's standard code formatter works.

mix format calls Mix.Tasks.Format.run/1, of course, which internally invokes Code.format_string!/2:

@spec format_string!(binary, keyword) :: iodata
def format_string!(string, opts \\ []) when is_binary(string) and is_list(opts) do
  line_length = Keyword.get(opts, :line_length, 98)
  algebra = Code.Formatter.to_algebra!(string, opts)
  Inspect.Algebra.format(algebra, line_length)
end

Code.Formatter.to_algebra!/2 converts source code to an AST and then to a form called an algebra document, which is suitable for code formatting. Inspect.Algebra.format/2 renders an algebra document as a string while trying to keep every line within line_length. See the documentation of Inspect.Algebra for details about algebra documents.
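As a small illustration of the public entry point, the sketch below calls Code.format_string!/2 directly on a badly spaced snippet. The input string is my own example, not taken from uiar. Because the function returns iodata, the result is converted to a binary before printing.

```elixir
# A deliberately badly formatted snippet (hypothetical input).
source = "defmodule Example do\nalias   Foo\nend\n"

# Code.format_string!/2 returns iodata, so convert it to a binary.
formatted =
  source
  |> Code.format_string!(line_length: 98)
  |> IO.iodata_to_binary()

IO.puts(formatted)
# defmodule Example do
#   alias Foo
# end
```

Note that the returned string has no trailing newline; mix format adds one when it writes the file.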

When converting a string to an AST, Code.Formatter.to_algebra!/2 passes one of its private functions as an option to :elixir.string_to_tokens/4. Furthermore, the conversion to an algebra document is done entirely by private functions. If we wanted to reuse this functionality, we would need to copy and paste a lot of code from there.

Strategy

To achieve my goal without copying and pasting code, I decided to write the formatting program from scratch. While this choice leads to lower performance, it avoids the maintenance cost of the copied code. My library analyzes the code through its AST and then formats it with simple line-by-line text editing.

We could simply use regular expressions if the code were simple, but to analyze a large and complex codebase correctly, it is better to use an AST. For example, it is easy to see from the following code that lines 2 and 3 are both alias directives and that their targets are Foo2 and Foo1 respectively:

defmodule Example do
  alias Foo2
  alias Foo1
end

But, how about this?

defmodule Example do
  alias Foo2.{
    Bar2,
    Bar1.Baz
  }
  alias Foo1
end
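To make the difficulty concrete, here is a minimal sketch of a naive line-based approach. The regular expression is my own illustration and is not used by uiar. It handles single-line aliases, but for the multi-line form it only sees the opening Foo2.{ and misses the children entirely.

```elixir
source = """
defmodule Example do
  alias Foo2.{
    Bar2,
    Bar1.Baz
  }
  alias Foo1
end
"""

# Naive approach: take whatever follows `alias` on the same line.
naive_targets =
  ~r/^\s*alias\s+(\S+)/m
  |> Regex.scan(source, capture: :all_but_first)
  |> List.flatten()

IO.inspect(naive_targets)
# ["Foo2.{", "Foo1"]
```

The first match is not a usable module name, and Bar2 and Bar1.Baz are lost.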

On the other hand, you may think that we could format the code directly from the AST. However, the documentation of Macro.to_string/2 says:

This function discards all formatting of the original code.

In fact,

defmodule Example do
  alias Foo2
  alias Foo1
end

becomes as follows:

defmodule(Example) do
  alias(Foo2)
  alias(Foo1)
end
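The loss of formatting is easy to check. The following sketch, written for this article rather than taken from uiar, round-trips a snippet through the AST. The exact printed layout depends on the Elixir version (newer releases omit the parentheses shown above), so no output is listed here, but in either case the comment and the blank line from the original source do not survive.

```elixir
source = """
# Aliases used by this module
alias Foo2

alias Foo1
"""

# Parse to an AST and print it back as source code.
printed =
  source
  |> Code.string_to_quoted!()
  |> Macro.to_string()

IO.puts(printed)

# The comment is not part of the AST, so it cannot be restored.
IO.inspect(String.contains?(printed, "Aliases used"))
```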

Thus, I implemented the formatting part with simple text editing. I do not describe the details of this part here because they depend on each formatting rule. See the repository on GitHub if you are interested.

AST Examples

So, what do we get when we convert code to an AST?

iex(1)> "example.ex" |> File.read!() |> Code.string_to_quoted!()

First, let's take a simple example.

defmodule Example do
  alias Foo2
  alias Foo1
end

It is converted to an AST:

{:defmodule, [line: 1],
 [
   {:__aliases__, [line: 1], [:Example]},
   [
     do: {:__block__, [],
      [
        {:alias, [line: 2], [{:__aliases__, [line: 2], [:Foo2]}]},
        {:alias, [line: 3], [{:__aliases__, [line: 3], [:Foo1]}]}
      ]}
   ]
 ]}

The two lines below are important here:

        {:alias, [line: 2], [{:__aliases__, [line: 2], [:Foo2]}]},
        {:alias, [line: 3], [{:__aliases__, [line: 3], [:Foo1]}]}

Each directive is represented by a three-element tuple. It is easy to see that we just need to swap alias Foo2 and alias Foo1 on lines 2 and 3.
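The tuples above can be picked out with plain pattern matching. The following is a minimal sketch under the assumption that the module body contains more than one expression (so that it is wrapped in a :__block__ node); it is not the code used in uiar.

```elixir
source = """
defmodule Example do
  alias Foo2
  alias Foo1
end
"""

# Destructure the defmodule node to get the list of body expressions.
{:defmodule, _meta, [_name, [do: {:__block__, _, body}]]} =
  Code.string_to_quoted!(source)

# Keep only simple aliases, remembering the line each one came from.
aliases =
  for {:alias, meta, [{:__aliases__, _, segments}]} <- body do
    {Keyword.fetch!(meta, :line), Module.concat(segments)}
  end

IO.inspect(aliases)
# [{2, Foo2}, {3, Foo1}]

# Sorting by module name tells us which source lines to swap.
sorted = Enum.sort_by(aliases, fn {_line, mod} -> inspect(mod) end)

IO.inspect(sorted)
# [{3, Foo1}, {2, Foo2}]
```

Comparing the original line numbers with the sorted order is enough to decide which lines of text need to move.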

Next, let's move on to a slightly more complex one.

defmodule Example do
  alias Foo2.{
    Bar2,
    Bar1.Baz
  }
  alias Foo1

  def foo() do
    :foo
  end
end

It is also converted to an AST:

{:defmodule, [line: 1],
 [
   {:__aliases__, [line: 1], [:Example]},
   [
     do: {:__block__, [],
      [
        {:alias, [line: 2],
         [
           {{:., [line: 2], [{:__aliases__, [line: 2], [:Foo2]}, :{}]},
            [line: 2],
            [
              {:__aliases__, [line: 3], [:Bar2]},
              {:__aliases__, [line: 4], [:Bar1, :Baz]}
            ]}
         ]},
        {:alias, [line: 6], [{:__aliases__, [line: 6], [:Foo1]}]},
        {:def, [line: 8], [{:foo, [line: 8], []}, [do: :foo]]}
      ]}
   ]
 ]}

The lines grouped by {} correspond to the following:

        {:alias, [line: 2],
         [
           {{:., [line: 2], [{:__aliases__, [line: 2], [:Foo2]}, :{}]},
            [line: 2],
            [
              {:__aliases__, [line: 3], [:Bar2]},
              {:__aliases__, [line: 4], [:Bar1, :Baz]}
            ]}
         ]},

The third element is nested more deeply than before and contains the operator :. (dot). Reading it carefully, we can see that the parent module is Foo2 and that its children are Bar2 and Bar1.Baz. We can also tell that the directive spans multiple lines, because the children are on lines 3 and 4 rather than on line 2.
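This analysis can be sketched with a single function clause. The module name MultiAliasSketch and the function expand/1 are hypothetical names chosen for this illustration and are not part of uiar.

```elixir
defmodule MultiAliasSketch do
  # Matches `alias Parent.{Child1, Child2}` and returns the full module
  # names together with a flag telling whether the braces span several lines.
  def expand({:alias, meta, [{{:., _, [{:__aliases__, _, parent}, :{}]}, _, children}]}) do
    first_line = Keyword.fetch!(meta, :line)

    child_lines =
      for {:__aliases__, child_meta, _} <- children do
        Keyword.fetch!(child_meta, :line)
      end

    modules =
      for {:__aliases__, _, child} <- children do
        Enum.join(parent ++ child, ".")
      end

    %{modules: modules, multiline?: Enum.any?(child_lines, &(&1 != first_line))}
  end
end

ast = Code.string_to_quoted!("alias Foo2.{\n  Bar2,\n  Bar1.Baz\n}")
result = MultiAliasSketch.expand(ast)

IO.inspect(result)
# %{modules: ["Foo2.Bar2", "Foo2.Bar1.Baz"], multiline?: true}
```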

By the way, the def statement, which we are not interested in now, corresponds to

        {:def, [line: 8], [{:foo, [line: 8], []}, [do: :foo]]}

and does not start with a keyword such as :alias or :import, so we can simply ignore it.
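Ignoring everything that is not a directive can be sketched as a filter on the first element of each tuple. The list of directive atoms below is my assumption based on the four directives discussed in this article.

```elixir
source = """
defmodule Example do
  alias Foo1

  def foo() do
    :foo
  end
end
"""

{:defmodule, _meta, [_name, [do: {:__block__, _, body}]]} =
  Code.string_to_quoted!(source)

# Keep only nodes whose first element is one of the four directives.
directives =
  for {kind, meta, _args} <- body, kind in [:use, :import, :alias, :require] do
    {kind, Keyword.fetch!(meta, :line)}
  end

IO.inspect(directives)
# [alias: 2]
```

The def node on line 4 is silently skipped because :def is not in the list.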

Result

The formatter enforces the following style rules:

  • Directives are grouped and ordered as use, import, alias and require.
  • Each group of directives is separated by an empty line.
  • Directives of the same group are ordered alphabetically.
  • If a directive handles multiple modules with {}, they are alphabetically ordered.
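The first three rules can be sketched as a sort followed by a grouping step. This is a simplified illustration, not the implementation in uiar: DirectiveOrderSketch is a hypothetical module, each directive is assumed to be already reduced to a {kind, target} pair, and targets written with {} are left out because plain string comparison would not order them the way the style guide does.

```elixir
defmodule DirectiveOrderSketch do
  # Group order required by the style guide.
  @rank %{use: 0, import: 1, alias: 2, require: 3}

  def format(directives) do
    directives
    # Sort by group first, then alphabetically inside each group.
    |> Enum.sort_by(fn {kind, target} -> {Map.fetch!(@rank, kind), target} end)
    # Split into consecutive runs of the same kind.
    |> Enum.chunk_by(fn {kind, _target} -> kind end)
    |> Enum.map(fn group ->
      Enum.map_join(group, "\n", fn {kind, target} -> "#{kind} #{target}" end)
    end)
    # Separate the groups with an empty line.
    |> Enum.join("\n\n")
  end
end

ordered =
  DirectiveOrderSketch.format([
    {:import, "Foo"},
    {:use, "Foo"},
    {:alias, "Foo.C.D"},
    {:alias, "Foo"},
    {:require, "Foo"}
  ])

IO.puts(ordered)
# use Foo
#
# import Foo
#
# alias Foo
# alias Foo.C.D
#
# require Foo
```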

When I applied it to my project, which consists of about 1200 Elixir files, it took about 3 seconds on my MacBook Pro and produced a diff of approximately 400 lines. Such a large diff is embarrassing, but I am going to prevent it from recurring by running the formatter with the --check-formatted option in my CircleCI workflow.

Summary

  • You can write your own code formatter for rules that mix format does not cover, so that you can concentrate on the important things when writing and reviewing code.
  • The strategy of analyzing the AST and then simply editing the text worked well.

Thank you for reading!
