Kotlin compiler plugins and binaries on multiplatform

shikasd

Andrei Shikov

Posted on May 31, 2021

For the last half year I have been working on multiplatform support for the Compose compiler plugin. Although it was already targeting Kotlin IR before, some constraints that didn't apply to JVM did apply to other Kotlin targets. Many of those new constraints were quite unexpected for me, so naturally, I couldn't help but share some of them to get them out of my head.

A small warning: most of these descriptions are derived from my own mental model, so there could be some technicalities that I have missed or misunderstood. If you spot something like that, reach out to me, and I will correct it!

Onto the first topic!

Serialized IR

Kotlin dependencies are usually distributed in two formats: .jar for JVM and .klib for other platforms:

  • .jar files contain JVM bytecode, which is compiled for each module separately. The Kotlin compiler doesn't do any postprocessing on them after the initial compilation.
  • .klib files contain serialized IR, which is linked into a platform-specific binary in a separate compilation step. Thus, when you publish .klib libraries, the Kotlin compiler does only part of the work on your side, processing source code into a more compact format. All the optimizations, inlining, and compilation to JS/assembly are done as a final step by the dependency consumer.

These additional steps for multiplatform modules are also defined as separate Gradle tasks. After the platform-specific compileKotlin task (which produces .klib files), Kotlin executes another compilation step (e.g. compile*ExecutableKotlinJs for JS or link*ExecutableMacOS for macOS). This step, however, is absent for JVM targets.

You can argue that .jar files are usually also postprocessed after compilation by framework-specific tooling (e.g. desktop packagers or Android transforms). However, the Kotlin compiler doesn't participate in those steps directly, so compiler plugins are not involved either.

Because of that, binary compatibility rules for non-JVM targets are defined by IR serialization, and in some ways, they are much stricter than those of JVM bytecode. Additionally, compiler plugins operate on IR directly, which makes it much easier to accidentally break that compatibility with custom transforms.

All elements inside .klib libraries are connected to each other through IdSignature. Signatures identify almost every declaration in the IR tree, which means that not only functions and classes have signatures, but value/type parameters do as well (generic erasure happens at later compilation stages). Elements inside each module are also connected through these signatures, so messing them up can result in failed assertions during the final compilation step.

// Example:
// calculating signature of the function
fun <T> test(value: T, param: String): T

// First, calculate string representation of function signature
JvmManglerIr.run { function.signatureString }
// Result: test(0:0;kotlin.String){0§<kotlin.Any?>}
//              ^^^                ^
//              note type params indexed here

// From string representation, create hash
JvmManglerIr.run { function.signatureString.hashMangle }
// Result: -3871701131444211271

// Finally, use signature hash to create IdSignature
IdSignature.PublicSignature(
    packageFqName = "my.test",
    declarationFqName = "test",
    id = -3871701131444211271,
    mask = 0
)
// Result: public my.test/test|-3871701131444211271[0]

// Similarly, signature of the type parameter
IdSignature.FileLocalDeclaration(
    container = functionSignature,
    id = 1
)
// Result: private my.test/test|-3871701131444211271[0]:1

Signatures are used to recreate IrSymbols and reconstruct IR references from them. You can encounter two types of symbol usages: declaring a symbol (functions or classes always declare their own symbols with the corresponding signatures) or referencing a symbol (e.g. a function call always references the symbol of the function it is calling). In a correct IR tree, each symbol has exactly one declaration. If a symbol is referenced but not declared anywhere, the compiler will crash with an "unbound symbols" error.
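
To make the declare/reference distinction concrete, here is a minimal toy model in plain Kotlin. It does not use the real compiler API: ToySymbol, ToySymbolTable, and their methods are hypothetical stand-ins, and signatures are simplified to plain strings.

```kotlin
// Toy model only: these classes are hypothetical and are not part of the Kotlin compiler API.
class ToySymbol(val signature: String) {
    // Set once a declaration claims this symbol; stays null for "unbound" symbols.
    var owner: String? = null
    val isBound: Boolean get() = owner != null
}

class ToySymbolTable {
    private val symbols = LinkedHashMap<String, ToySymbol>()

    // Referencing a signature creates the symbol lazily, possibly before it is declared.
    fun reference(signature: String): ToySymbol =
        symbols.getOrPut(signature) { ToySymbol(signature) }

    // Declaring binds the symbol; a second declaration of the same signature is an error.
    fun declare(signature: String, owner: String): ToySymbol {
        val symbol = reference(signature)
        check(!symbol.isBound) { "Symbol $signature is already declared by ${symbol.owner}" }
        symbol.owner = owner
        return symbol
    }

    // Signatures that were referenced but never declared.
    fun unboundSignatures(): List<String> =
        symbols.values.filter { !it.isBound }.map { it.signature }
}
```

In this sketch, a call site would use reference(...) and a function declaration would use declare(...) with the same signature. Anything left in unboundSignatures() after all declarations are processed corresponds to the "unbound symbols" situation described above.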

[Figure: symbol declaration, reference, IdSignature, deserialization]

Enough explanations, what went wrong?

Symbol-related errors can be caused by even small things. Generic types are frequent offenders here:

  • Imagine a compiler plugin that adds a call to fun <T> test(value: T): T.
  • The simplest way to produce such a call is to use the irCall(symbol) helper. But which return type does this helper assign to the call?
  • The type is taken from the function signature, so the call is typed as the type parameter <T>, which isn't part of the public API (type parameters are always serialized as private elements).
  • When deserializing such a call, the function is found, but the type parameter definition is not.
  • Boom!

The fun part of this process is that it compiles just fine on JVM, as generics are non-existent on the bytecode level and this type-related inconsistency does not affect the IR-to-bytecode transition. Here's a similar bug I fixed some time ago for Compose.
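
The difference between the leaking and the safe call type can be sketched with a toy type model. Everything here (ToyType, ToyFunction, naiveCallType, substitutedCallType) is hypothetical and only mirrors the idea, not the real IrType hierarchy or the irCall helper.

```kotlin
// Toy model only: hypothetical types, not the real IrType hierarchy.
sealed class ToyType
data class ClassType(val fqName: String) : ToyType()
data class TypeParameterType(val ownerFunction: String, val index: Int) : ToyType()

data class ToyFunction(val name: String, val typeParameterCount: Int, val returnType: ToyType)

// Replaces the callee's own type parameters with the type arguments supplied at the call site.
fun substitute(type: ToyType, callee: ToyFunction, typeArguments: List<ToyType>): ToyType =
    when (type) {
        is ClassType -> type
        is TypeParameterType ->
            if (type.ownerFunction == callee.name) typeArguments[type.index] else type
    }

// Naive call type: copied straight from the declaration, so it may leak the callee's <T>.
fun naiveCallType(callee: ToyFunction): ToyType = callee.returnType

// Safer call type: the declared return type with the callee's type parameters substituted.
fun substitutedCallType(callee: ToyFunction, typeArguments: List<ToyType>): ToyType {
    require(typeArguments.size == callee.typeParameterCount) { "Type argument count mismatch" }
    return substitute(callee.returnType, callee, typeArguments)
}
```

For fun <T> test(value: T): T called with a String argument, the naive version yields the private type parameter, while the substituted version yields kotlin.String, which is a public type that survives serialization.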


Similar concerns apply when cloning/substituting IR elements that define type parameters themselves: a good example is cloning a function. If you want to add new type/value parameters, you usually make a copy of the function (to avoid clashes). However, it is important to also update references to types and parameters in the body! Otherwise, you can end up with a type that is technically correct but uses type parameters from the original function, which has been removed from the IR tree.
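
A minimal sketch of that remapping step, again as a toy model rather than the real IR API. ToyTypeParameter, ToyFunctionDecl, and both helper functions are hypothetical names; the body is reduced to a list of the type parameters it references.

```kotlin
// Toy model only: hypothetical classes, not the real IR tree.
// ToyTypeParameter has no equals() override, so instances compare by identity, like IR symbols.
class ToyTypeParameter(val name: String)

class ToyFunctionDecl(
    val name: String,
    val typeParameters: List<ToyTypeParameter>,
    // Type parameters referenced from the body.
    val bodyReferences: List<ToyTypeParameter>
)

// Copies a function with fresh type parameters and remaps body references onto the copies.
fun copyWithRemappedTypeParameters(original: ToyFunctionDecl, newName: String): ToyFunctionDecl {
    val freshParameters = original.typeParameters.map { ToyTypeParameter(it.name) }
    val oldToFresh = original.typeParameters.zip(freshParameters).toMap()
    // References to parameters declared elsewhere (e.g. an outer class) are kept as-is.
    val remappedBody = original.bodyReferences.map { oldToFresh[it] ?: it }
    return ToyFunctionDecl(newName, freshParameters, remappedBody)
}

// Body references that still point at type parameters owned by another declaration.
fun danglingReferences(
    function: ToyFunctionDecl,
    foreign: List<ToyTypeParameter>
): List<ToyTypeParameter> =
    function.bodyReferences.filter { ref -> foreign.any { it === ref } }
```

A copy that reuses the original body references without remapping would be reported by danglingReferences, which is the toy equivalent of a type parameter "from the wrong copy".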

Here's an example of a signature with a type parameter that is absent from the tree:

test(-1:0;kotlin.String){0§<kotlin.Any?>}
     ^^^^            

The first index here indicates how deep the parent is in the file: 0 means that the function is a top-level element, 1 means that it is inside a top-level class, and so on. -1 means that the type parameter wasn't found in the parent scope, which points to a potential error.
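
When staring at dumped signature strings, a tiny helper can flag the suspicious -1 entries. This is a rough sketch that assumes type parameter references always look like "<depth>:<index>", as in the examples in this post; the real mangler format may differ between compiler versions, and findUnresolvedTypeParameters is a hypothetical name.

```kotlin
// Assumption: type parameter references appear as "<depth>:<index>" inside the mangled string.
val typeParameterReference = Regex("""(-?\d+):(\d+)""")

// Returns the type parameter references whose depth is -1, i.e. not found in the parent scope.
fun findUnresolvedTypeParameters(signatureString: String): List<String> =
    typeParameterReference.findAll(signatureString)
        .filter { it.groupValues[1] == "-1" }
        .map { it.value }
        .toList()
```

An empty result does not prove the signature is valid; it only means no reference with a -1 depth was spotted.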

Mistakes like these result in errors either during IR validation (easy to catch) or IR deserialization (who writes cross-module compilation tests anyway?). The change that enabled compilation for K/Native includes a fix for a problem just like that.

Changing public API of IR

The Compose compiler plugin alters not only the bodies of functions but their signatures as well. Each @Composable function gets additional hidden parameters that are generated at compile time.

These synthetic parameters are hidden from the user's eyes (at least in Kotlin), as they are added during the code generation phase (in IR). They also don't exist in metadata, which the compiler uses to understand what functions or classes are present in dependencies. This metadata is generated before the IR phase (created out of the descriptor tree), so from the frontend's point of view @Composable functions work just fine without those additional parameters.

When we go down to the IR level, the way the compiler provides IR for dependencies starts to differ. On JVM and K/Native, all IR for dependencies is generated from the metadata. This also allows generating some kind of IR elements for things outside of Kotlin, like Java classes and functions extracted from JVM bytecode. Such elements are easy to find by looking at their origin: IR_EXTERNAL_DECLARATION_STUB or IR_EXTERNAL_JAVA_DECLARATION_STUB. As the name suggests, these elements are "stubs": only a function definition without a body.

As the stubs are generated from metadata, IR functions provided from dependencies don't have Compose-specific synthetic parameters in them, but their bytecode does! That's why the Compose compiler plugin alters stubs for such functions to make sure these parameters are also present in the calls to them.

If you don't apply the Compose plugin to one of your modules, it will compile just fine, but the calls in its bytecode won't match the transformed functions. It works this way because the compiler doesn't check the bytecode of other Kotlin modules, only the packaged metadata.
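
The mismatch is easy to see in terms of JVM method descriptors. The helper below is a small sketch (jvmDescriptor is a hypothetical name); the parameter list used in the note after it is an assumption based on the $composer: Composer and $changed: Int parameters shown in this post, and real Compose functions may carry more synthetic parameters.

```kotlin
// Builds a JVM method descriptor from already-encoded parameter and return type descriptors.
fun jvmDescriptor(parameterDescriptors: List<String>, returnDescriptor: String): String =
    parameterDescriptors.joinToString(separator = "", prefix = "(", postfix = ")") + returnDescriptor
```

A module compiled without the plugin would emit a call with the descriptor jvmDescriptor(emptyList(), "V"), that is ()V, while the transformed function would be declared as jvmDescriptor(listOf("Landroidx/compose/runtime/Composer;", "I"), "V"). The JVM resolves methods by name and descriptor, so the two never match.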

... in Kotlin/JS

This process is different for Kotlin/JS though. Here, the IR of dependencies is deserialized directly from IR packaged in the artifact, and all the metadata calls are matched through IdSignature.

Just adding parameters for dependencies on the IR stage won't cut it anymore! The compilation fails during the step of creating IR in the first place while trying to deserialize a non-existent function with the original signature.

To mitigate this, Compose copies functions in Kotlin/JS instead of replacing them, so each @Composable function ends up as two declarations:

  • The first one, the original, is required to make sure that each function in Kotlin metadata has an IR counterpart with the matching signature. It is kept unchanged during compilation and should never be executed at runtime. Functions like this are referred to as "decoys", and are usually tree-shaken out by the webpack optimizer.
  • The second one, the copy, is modified by Compose as needed. All references pointing to the "decoy" are modified to reference this new function. To differentiate them at runtime (and for debugging purposes), it also has a $composable suffix added to its name.

Example:

// -------------- original --------------- 
@Composable
fun Counter() {
  ...
}

// ------------- transformed ------------- 
@Decoy(...)
fun Counter() { // signature is kept the same
  illegalDecoyCallException("Counter")
}

@DecoyImplementation(...)
fun Counter$composable( // signature is changed
    $composer: Composer,
    $changed: Int
) {
  ...transformed code...
}

Going back to linking dependencies: by default, calls into other modules also reference "decoy" functions (because only decoys match the provided metadata signatures). Instead of changing the stubs as is done on JVM and Native, the previously modified copy is deserialized from the .klib to replace the "decoy".
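
The redirection itself can be pictured with a toy model. ToyDeclaration, ToyCall, and redirectDecoyCalls are hypothetical, and storing the implementation name directly on the decoy is an assumption made for illustration; the actual contents of the @Decoy annotation are elided in the example above.

```kotlin
// Toy model only: hypothetical classes standing in for IR declarations and calls.
data class ToyDeclaration(
    val name: String,
    val isDecoy: Boolean,
    // Assumption: a decoy records the name of its implementation.
    val implementationName: String? = null
)

data class ToyCall(val targetName: String)

// Redirects every call that targets a decoy to the implementation recorded on that decoy.
fun redirectDecoyCalls(calls: List<ToyCall>, declarations: List<ToyDeclaration>): List<ToyCall> {
    val byName = declarations.associateBy { it.name }
    return calls.map { call ->
        val target = byName[call.targetName]
        if (target != null && target.isDecoy) {
            val implementation = checkNotNull(target.implementationName) {
                "Decoy ${target.name} has no implementation recorded"
            }
            ToyCall(implementation)
        } else {
            call
        }
    }
}
```

Calls to non-decoy or unknown targets pass through untouched, which mirrors the fact that only @Composable functions get a decoy/implementation pair.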

On a side note, obtaining elements that exist only in IR from serialized artifacts can also be quite a challenge. If you ever encounter elements that cannot be referenced by the usual IrPluginContext.reference* methods, it might mean that you have to force their deserialization. Specifically for the Compose case above, the Kotlin team added the IrDeserializer.resolveBySignatureInModule API, but it is still unstable (like everything in the compiler API).

Debugging tips

Even after following all the guidelines, it is still very easy to encounter one of those issues during compiler plugin development. How does one go about debugging them?

  • Having a local copy of the compiler sources (Kotlin GitHub repo) is invaluable. None of the compiler distributions package sources at the moment, and being able to step through the compiler from your plugin is very important.
  • Attaching a debugger to the Kotlin compiler from Gradle is the second step to successfully debug your compilation when it breaks:
    • Add -Dkotlin.daemon.jvm.options="-Xdebug,-Xrunjdwp:transport=dt_socket\,address=5005\,server=y\,suspend=n" to the Gradle command to debug Kotlin JVM/JS compilation, or -Dorg.gradle.debug=true for Kotlin/Native. (These parameters also work from a local gradle.properties without the -D prefix.)
    • Connect a remote debugger from IDEA.
  • You can get a string representation of IR elements using IrElement.dump() and IrElement.dumpKotlinLike(). The former contains all the information about symbols (e.g. where type parameters are coming from), the latter prints an approximation of the Kotlin code that represents the current element.
  • Tests! For example, you can check the dumped IR from the item above. Compose tests a lot of transforms with its own version of dumpKotlinLike(), and it is /very/ fast and effective. The downside here is a complicated setup (potential contribution to kotlin-compile-testing?). An example of how it is done in Compose for JVM and JS is here.

Thank you for reading the loosely connected braindump above! I sometimes post similar things and related announcements on my Twitter account, so make sure to follow there! (this whole thing was made out of a thread).

Bonus: additional sources
