On Subtyping vs. Row Polymorphism, information loss, and actionable error messages

maxheiber

Max Heiber

Posted on December 11, 2020

On Subtyping vs. Row Polymorphism, information loss, and actionable error messages

Brian McKenna's "Row Polymorphism Isn't Subtyping" is about the distinction between row polymorphism and subtyping. Several re-readings led me to convince myself I understood, then a real-world bug to lead me to a stronger conclusion on why the distinction matters.

To help cement my understanding or misunderstanding, I'll

  • say more about how row polymorphism isn't subtyping.
  • share an example of how the information loss from subtyping can lead to hard-to-diagnose bugs

Note: Good background easy reads: TypeScript's subtyping and Elm's extensible records (row polymorphism lite).

How exactly Row Polymorphism is not Subtyping

If I interpret McKenna correctly, he's saying that all and only type systems with subtyping have the subsumption rule, which says that if T1 is a subtype of T2 and e is a T1 then e is also a T2:

Subsumption Rule


e has type T1 and T1 subtype of T2
---------therefore
e has type T2

or, using more symbols:

e: T1 and T1 <: T2
--------
e: T2

Enter fullscreen mode Exit fullscreen mode

Note: Technically, I think it's a rule schema and not a rule, but I'll be sloppy

It took me some time to see how row polymorphism isn't just the <: in this rule.

I'll follow Elm and use {rowvar | a: Int, b: String} to mean "a record with at least fields a: Int and b: String. rowvar is the row variable: it stands for the other fields in the record in addition to a and b. More generally: {rowvar | k1: F1, ... kn: Fn} means "Record with at least fields k1 of type F1, k2 of type F2, etc."

It seems we can define <: (subtype of) in terms of row variables.

when T1 is {rho1 | k1: F1, ... kn: Fn, ln: Gn ...}
and  T2 is {rho2 | k1: F1, ... kn: Fn}

then T1 <: T2  (T1 is a subtype of T2)
Enter fullscreen mode Exit fullscreen mode

But then the subsumption rule still doesn't follow:

Subsumption Rule

e: T1 and T1 <: T2
----------------
e: T2
Enter fullscreen mode Exit fullscreen mode

The extra work the subsumption rule is doing over rules for row polymorphism (which I did not write down) is that the subsumption rule allows an expression to have more than one type: e1 is both a T1 and a T2. What makes this interesting is that information is lost: when we see that something is a T2, we don't know whether it is also a T1.

But with row polymorphism, the row variables maintain the information that would be lost with subtyping. What I did above in defining <: in terms of row polymorphism sneakily ditched the row variable, which had the information about the other fields. Here's what happens if we don't sneak, and try to write something similar to the subsumption rule directly using row variables:

e1: T1
T1 is {rho1 | e1: F1, ... en: Fn, ep: Fp}
T2 is {rho2 | e1: F1, ... en: Fn}
rho2 is (rho1 | ep: Fp)
---------------------
e1: T2
Enter fullscreen mode Exit fullscreen mode

The made-up syntax in the last premise is meant to say that rho2 has all the same fields (with the same types) as rho1 and additionally has field ep of type Fp

The above rule is useless because the conclusion is the same as the first premise: e1 has only one type since T1 and T2 are the same type. The type is just written differently in the second and third premises. When I wrote T1, I listed field ep explicitly, but when I wrote T2, field ep was included in the row variable rho2.

All this is to say that if one squints at the subsumption rule one might convince oneself it applies to row polymorphism, but one would be incorrect.

Note that the information loss with subtyping can be optional. In popular languages with subtyping, the information can be retained but requires more keyboarding. Here is a TS example of essentially treating subtyping like it's row polymorphism: function foo<A extends B>(a: A): A

The Perils of Information Loss

The information loss caused by the subsumption rule does not play well with type inference. Which doesn't worry me much, since type signatures are useful documentation).

A real-world bug I saw illustrates what I take to be a deeper issue with the information loss:

interface Animal {
    tasty: true
}

interface Cow extends Animal {
    moo: string
}

function getFromBarn(): Animal {
    const cow: Cow = {
        tasty: true,
        moo: "mooooo",
    }
    return cow
}

function speak(cow: Cow): void {
    console.log(cow.moo)
}

function main() {
    const cow = getFromBarn()
    speak(cow) // Error! Expected Cow but got Animal
}
Enter fullscreen mode Exit fullscreen mode

Try the code

Note: I sacrificed realism in trimming down the example

What makes this bug interesting to me is that the type checker complains, even though:

  • All the type annotations are correct
  • The code is correct

This stinks! There are several related issues:

  • As an advocate of static type checking, I think we should be rewarded for writing type annotations. But there is little payoff in this case, only more work.
  • To n00b eyes, the error message is confusing: the function expects a cow and we gave it a cow. And cows are animals, but it's complaining.
  • Even to expert eyes, it's not clear how to fix the code. Should we change the getFromBarn(), main(), or speak()? It depends on the intent of the authors of each of these three functions, and how we want the codebase to evolve.

I don't know of an easy way out of the information loss problem with subtyping. Some routes:

  • In this particular case, a really advanced compiler could point out the information loss in getFromBarn() and say that just changing the annotation will make the problem go away, if the dev is sure that's what they want. But this won't be feasible in general.
  • Tweak the subsumption rule so that information loss is opt-in only: users must explicitly say they are upcasting a Cow to an Animal. Which adds yet more work for the user.

One answer I don't think is right is "Use row polymorphism instead" as the techniques are not directly comparable:

  • Subtyping is more general than just records, it is also useful for working with, for example, literal types and union types.

  • The information loss with subtyping can sometimes be desirable in order to hide implementation details so we can evolve our APIs: for example, exposing something as an Iterable so we can switch from Set to Array in future. Maybe that's what the author of getFromBarn() had in mind.

💖 💪 🙅 🚩
maxheiber
Max Heiber

Posted on December 11, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related