Incremental compilation for Crystal - Part 2
Ary Borenszweig
Posted on January 4, 2023
In the last post we found out that we could build a file dependencies graph for a given project. We also saw that, depending on the code, a file could depend on one another even if it's not immediately clear that that's the case.
But... is that dependency issue really an issue?
Checking the tool against real-world projects
In this section I'll look at the dependencies graph of two medium/big projects:
- Mint
- The Crystal compiler
We can use this branch of the compiler to compile Mint and the compiler to get the "dependencies.dot" files.
Then we can use this program to analyze the above files. The program will read a "dependencies.dot" file, build the graph in memory, and then for each node in that graph count how many nodes are, recursively, affected by it. This answers this question: if we make a change to a file, what are all the files that we need to re-analyze?
Transitive dependencies
Before looking at that, though, note that I said "recursively". If a file A depends on B, and B depends on C, why if we make changes to C we need to re-analyze A?
Let's take a look at an example.
# foo.cr
require "bar"
def foo
Bar.bar
end
foo
# bar.cr
require "baz"
module Bar
def self.bar
Baz.baz
end
end
# baz.cr
module Baz
def self.baz
1
end
end
Here we have that foo.cr
depends on bar.cr
, and bar.cr
depends on baz.cr
.
What does foo
return? It returns an integer, because foo
calls Bar.bar
, which in turn calls Baz.baz
, which returns 1.
If we change Baz.baz
to return a string, foo
will start returning a string too. So, effectively, changing baz.cr
affected foo.cr
. And note that the program still compiles just fine.
Does this transitive dependency re-analysis happen in other languages too? I don't know. But let's imagine Crystal forces you to specify argument and return types for methods, like Java, C#, Go, Rust, and essentially every statically typed language out there.
Now we have these files:
# foo.cr
require "bar"
def foo : Int32
Bar.bar
end
foo
# bar.cr
require "baz"
module Bar
def self.bar : Int32
Baz.baz
end
end
# baz.cr
module Baz
def self.baz : Int32
1
end
end
Now if we change Baz.baz
to return a String
, Bar
will stop compiling. We can change Bar.bar
to return a String
and then change foo
to return a String
and then it will compile again. But doing that we changed all those files, so they naturally have to be re-analyzed.
But files don't always have to be re-analyzed like this. Let's say we change Baz.baz
to return a String
. We can make Bar
still compile and return an Int32
like this:
module Bar
def self.bar : Int32
Baz.baz.size # The size of the string
end
end
Here we ended up changing Bar
because of changes made to Baz
, but we didn't have to change foo
. Not only that: we don't have to re-analyze foo
because Bar.bar
still returns an int. We need to re-compile bar.cr
, but foo
can still keep calling the same method.
With all this, another partial conclusion is that not having mandatory type signatures in methods means even more that a change in one file can affect files that are far away in the dependency graph.
Checking affected files counts
If we run the above mentioned program on the "dependencies.dot" file for Mint, we'll see something like this:
src/macros.cr: 1
src/mint.cr: 1
src/utils/index_html.ecr: 1
src/utils/service_worker.ecr: 1
src/constants.cr: 2
src/app_init/scaffold.cr: 1193
src/artifact_cleaner.cr: 1193
src/assets.cr: 1193
src/ast.cr: 1193
...
1059 lines more
...
Total files: 1405
I forgot to mention that this only outputs files in "your project". That is, the most common scenario is that you'll change files in your project, not in shards or the standard library.
Also, the output is sorted by number of affected files for a given file.
In this project there are a total of 1405 files, including files from shards and the standard library. And for the vast majority of files, if we make a change to them it ends up affecting 1193 files. That means that we'll end up needing to re-analyze most of the files anyway.
Doing this with the compiler is similar:
src/compiler/crystal.cr: 1
src/compiler/crystal/tools/doc/html/_head.html: 1
src/compiler/crystal/tools/doc/html/_list_items.html: 1
src/compiler/crystal/tools/doc/html/_method_detail.html: 1
src/compiler/crystal/tools/doc/html/_method_summary.html: 1
src/compiler/crystal/tools/doc/html/_methods_inherited.html: 1
src/compiler/crystal/tools/doc/html/_other_types.html: 1
src/compiler/crystal/tools/doc/html/_sidebar.html: 1
src/compiler/crystal/tools/doc/html/js/doc.js: 1
src/compiler/crystal/tools/doc/html/main.html: 1
src/compiler/crystal/tools/doc/html/sitemap.xml: 1
src/compiler/crystal/tools/doc/html/type.html: 1
src/compiler/crystal/tools/init/template/example.cr.ecr: 1
src/compiler/crystal/tools/init/template/example_spec.cr.ecr: 1
src/compiler/crystal/tools/init/template/gitignore.ecr: 1
src/compiler/crystal/tools/init/template/license.ecr: 1
src/compiler/crystal/tools/init/template/readme.md.ecr: 1
src/compiler/crystal/tools/init/template/shard.yml.ecr: 1
src/compiler/crystal/tools/playground/views/_workbook.html.ecr: 1
src/compiler/crystal/tools/playground/views/layout.html.ecr: 1
src/crystal/compiler_rt/divmod128.cr: 1
src/crystal/compiler_rt/fixint.cr: 1
src/crystal/compiler_rt/float.cr: 1
src/crystal/compiler_rt/mul.cr: 1
src/crystal/once.cr: 1
src/crystal/system/process.cr: 1
src/http/server/handlers/static_file_handler.html: 1
src/log.cr: 1
src/log/setup.cr: 3
src/compiler/crystal/command.cr: 6
src/compiler/crystal/command/cursor.cr: 6
src/compiler/crystal/command/eval.cr: 6
src/compiler/crystal/command/repl.cr: 6
src/compiler/crystal/command/spec.cr: 6
src/compiler/crystal/command/docs.cr: 7
src/compiler/crystal/command/env.cr: 7
src/compiler/crystal/command/format.cr: 7
src/compiler/crystal/command/playground.cr: 7
src/compiler/crystal/tools/print_hierarchy.cr: 7
src/spec/cli.cr: 7
src/array.cr: 451
src/base64.cr: 451
...
444 lines more
...
Total files: 513
Again, with a total of 513 files, making changes to most files ends up affecting 451 other files, which are almost all of them. So trying to reuse previous compilation data won't be a lot of benefit.
The end?
Is this the end of the incremental compilation exploration? Is there nothing we can do about this? No! This is just the beginning. We have to look deeper into how things depend on each other. Maybe files aren't the incremental compilation unit we are looking for? Maybe we could introduce some small changes to the language to prevent transitive dependencies? Maybe the analysis can still be reused for methods that have a specified return type?
Like before, I didn't explore any of these options yet. Maybe I will!
Posted on January 4, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.