Incremental compilation for Crystal - Part 2

asterite

Ary Borenszweig

Posted on January 4, 2023

Incremental compilation for Crystal - Part 2

In the last post we found out that we could build a file dependencies graph for a given project. We also saw that, depending on the code, a file could depend on one another even if it's not immediately clear that that's the case.

But... is that dependency issue really an issue?

Checking the tool against real-world projects

In this section I'll look at the dependencies graph of two medium/big projects:

  • Mint
  • The Crystal compiler

We can use this branch of the compiler to compile Mint and the compiler to get the "dependencies.dot" files.

Then we can use this program to analyze the above files. The program will read a "dependencies.dot" file, build the graph in memory, and then for each node in that graph count how many nodes are, recursively, affected by it. This answers this question: if we make a change to a file, what are all the files that we need to re-analyze?

Transitive dependencies

Before looking at that, though, note that I said "recursively". If a file A depends on B, and B depends on C, why if we make changes to C we need to re-analyze A?

Let's take a look at an example.

# foo.cr
require "bar"

def foo
  Bar.bar
end

foo

# bar.cr
require "baz"

module Bar
  def self.bar
    Baz.baz
  end
end

# baz.cr
module Baz
  def self.baz
    1
  end
end
Enter fullscreen mode Exit fullscreen mode

Here we have that foo.cr depends on bar.cr, and bar.cr depends on baz.cr.

What does foo return? It returns an integer, because foo calls Bar.bar, which in turn calls Baz.baz, which returns 1.

If we change Baz.baz to return a string, foo will start returning a string too. So, effectively, changing baz.cr affected foo.cr. And note that the program still compiles just fine.

Does this transitive dependency re-analysis happen in other languages too? I don't know. But let's imagine Crystal forces you to specify argument and return types for methods, like Java, C#, Go, Rust, and essentially every statically typed language out there.

Now we have these files:

# foo.cr
require "bar"

def foo : Int32
  Bar.bar
end

foo

# bar.cr
require "baz"

module Bar
  def self.bar : Int32
    Baz.baz
  end
end

# baz.cr
module Baz
  def self.baz : Int32
    1
  end
end
Enter fullscreen mode Exit fullscreen mode

Now if we change Baz.baz to return a String, Bar will stop compiling. We can change Bar.bar to return a String and then change foo to return a String and then it will compile again. But doing that we changed all those files, so they naturally have to be re-analyzed.

But files don't always have to be re-analyzed like this. Let's say we change Baz.baz to return a String. We can make Bar still compile and return an Int32 like this:

module Bar
  def self.bar : Int32
    Baz.baz.size # The size of the string
  end
end
Enter fullscreen mode Exit fullscreen mode

Here we ended up changing Bar because of changes made to Baz, but we didn't have to change foo. Not only that: we don't have to re-analyze foo because Bar.bar still returns an int. We need to re-compile bar.cr, but foo can still keep calling the same method.

With all this, another partial conclusion is that not having mandatory type signatures in methods means even more that a change in one file can affect files that are far away in the dependency graph.

Checking affected files counts

If we run the above mentioned program on the "dependencies.dot" file for Mint, we'll see something like this:

src/macros.cr: 1
src/mint.cr: 1
src/utils/index_html.ecr: 1
src/utils/service_worker.ecr: 1
src/constants.cr: 2
src/app_init/scaffold.cr: 1193
src/artifact_cleaner.cr: 1193
src/assets.cr: 1193
src/ast.cr: 1193
...
1059 lines more
...

Total files: 1405
Enter fullscreen mode Exit fullscreen mode

I forgot to mention that this only outputs files in "your project". That is, the most common scenario is that you'll change files in your project, not in shards or the standard library.

Also, the output is sorted by number of affected files for a given file.

In this project there are a total of 1405 files, including files from shards and the standard library. And for the vast majority of files, if we make a change to them it ends up affecting 1193 files. That means that we'll end up needing to re-analyze most of the files anyway.

Doing this with the compiler is similar:

src/compiler/crystal.cr: 1
src/compiler/crystal/tools/doc/html/_head.html: 1
src/compiler/crystal/tools/doc/html/_list_items.html: 1
src/compiler/crystal/tools/doc/html/_method_detail.html: 1
src/compiler/crystal/tools/doc/html/_method_summary.html: 1
src/compiler/crystal/tools/doc/html/_methods_inherited.html: 1
src/compiler/crystal/tools/doc/html/_other_types.html: 1
src/compiler/crystal/tools/doc/html/_sidebar.html: 1
src/compiler/crystal/tools/doc/html/js/doc.js: 1
src/compiler/crystal/tools/doc/html/main.html: 1
src/compiler/crystal/tools/doc/html/sitemap.xml: 1
src/compiler/crystal/tools/doc/html/type.html: 1
src/compiler/crystal/tools/init/template/example.cr.ecr: 1
src/compiler/crystal/tools/init/template/example_spec.cr.ecr: 1
src/compiler/crystal/tools/init/template/gitignore.ecr: 1
src/compiler/crystal/tools/init/template/license.ecr: 1
src/compiler/crystal/tools/init/template/readme.md.ecr: 1
src/compiler/crystal/tools/init/template/shard.yml.ecr: 1
src/compiler/crystal/tools/playground/views/_workbook.html.ecr: 1
src/compiler/crystal/tools/playground/views/layout.html.ecr: 1
src/crystal/compiler_rt/divmod128.cr: 1
src/crystal/compiler_rt/fixint.cr: 1
src/crystal/compiler_rt/float.cr: 1
src/crystal/compiler_rt/mul.cr: 1
src/crystal/once.cr: 1
src/crystal/system/process.cr: 1
src/http/server/handlers/static_file_handler.html: 1
src/log.cr: 1
src/log/setup.cr: 3
src/compiler/crystal/command.cr: 6
src/compiler/crystal/command/cursor.cr: 6
src/compiler/crystal/command/eval.cr: 6
src/compiler/crystal/command/repl.cr: 6
src/compiler/crystal/command/spec.cr: 6
src/compiler/crystal/command/docs.cr: 7
src/compiler/crystal/command/env.cr: 7
src/compiler/crystal/command/format.cr: 7
src/compiler/crystal/command/playground.cr: 7
src/compiler/crystal/tools/print_hierarchy.cr: 7
src/spec/cli.cr: 7
src/array.cr: 451
src/base64.cr: 451
...
444 lines more
...

Total files: 513
Enter fullscreen mode Exit fullscreen mode

Again, with a total of 513 files, making changes to most files ends up affecting 451 other files, which are almost all of them. So trying to reuse previous compilation data won't be a lot of benefit.

The end?

Is this the end of the incremental compilation exploration? Is there nothing we can do about this? No! This is just the beginning. We have to look deeper into how things depend on each other. Maybe files aren't the incremental compilation unit we are looking for? Maybe we could introduce some small changes to the language to prevent transitive dependencies? Maybe the analysis can still be reused for methods that have a specified return type?

Like before, I didn't explore any of these options yet. Maybe I will!

💖 💪 🙅 🚩
asterite
Ary Borenszweig

Posted on January 4, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related