The Strange Art of Code Reuse, Part 2
Sean Williams
Posted on July 3, 2022
Fundamentally, the problem of code reuse is this: when you have structured data—collections, structs, objects, discriminated unions, whatever—and you have a function that takes structured data as an argument, which structured types are allowed?
Part one, besides talking about what this looks like in dynamic typing, also revealed some of the intuition behind this question. It really comes down to what your function is going to do with your data. Part three will formalize this idea—it turns out that you can pose this as a logic problem, which was demonstrated way back in 1969—but we're not quite there yet.
In dynamic typing, "what you can do with an object" really comes down to "what methods are defined on that object." Dynamically typed languages are generally based on the idea that objects are dictionaries. A "length" method on an object is just something like,
obj.length = function() { ... }
though doing this correctly requires using a constructor to do a lexical closure on the object self-reference, as in,
function make_obj() {
obj = { ... };
obj.length() = { obj.something_or_other };
return obj;
}
These languages then allow you to do dictionary lookup with "dot notation," where obj.length()
is equivalent to obj["length"]()
.
Anyway, my criticism of this approach is that two methods having the same name does not imply they have the same semantics. It's worth throwing in an important caveat, though: computers are not semantic machines. The only thing you can do to improve programming is to have more disciplined programmers, but language design has a role to play in instilling or "normalizing" discipline. So really my problem with dynamically typed languages is that they encourage sloppiness.
Anyway, we're done with dynamic typing. Now that we're into static typing, it's time to talk about the most popular form of code reuse.
Hot Dogs and Other Sandwiches
While the base requirement for code reuse is that we have a shared set of operations we can do with our function arguments, what we really want is shared semantics. In a sense, this is a philosophical problem. The most popular answer, both in CS and out in the world generally, is taxonomy.
Taxonomy is where you create conceptual parent-child relationships between categories. Scientists have created elaborate taxonomies for all sorts of things in the natural world: the kingdom-phylum-class-order-family-genus-species categorization of life, for example, or the Berzelian classification system for minerals.
Of course, I'm talking about object-oriented programming. The core feature of OOP is inheritance, which says that a class is a more specialized version of another class. If you have parent class A
and child class B
, then B
must provide an implementation of all properties and methods on A
—and it can provide some more methods and properties if it wants to. Of course, if A
is a concrete class, B
starts out with default implementations for everything on A
, namely, the implementations provided on A
.
Because of this, if a function expects an argument of type A
, then it can also take an argument of type B
. Taking a type A
argument means that you'll only be able to access properties and methods specified on A
, which as I said, B
must also provide by virtue of being a child.
The example problem in the first part is in one sense easy to solve in OOP, but in the real world usually ends up requiring some bizarre hacks.
abstract class Collection<'t> {
public abstract void Add('t);
}
public class List<'t> : Collection<'t> {
// implementation details
public void Add(element) {
// implementation details
}
}
public class Set<'t> : Collection<'t> {
// implementation details
public void Add(element) {
// implementation details
}
}
Now a function can take a Collection
, and it's guaranteed that you can Add
an element to it. If you pass a List
, it'll put it on the end; if you pass a Set
, it'll add it if it isn't already present. Under the hood this is called "dynamic dispatch." Objects have tables of function pointers to their specific methods, so method invocation involves looking up the correct function pointer in the actual object being operated on and calling that.
How do we do dictionaries? Probably the most sensible answer is to create a helper class:
public class KeyValuePair<'k,'v> {
private 'k _key;
private 'v _value;
public KeyValuePair('k key, 'v value) {
_key = key;
_value = value;
}
public 'k Key {
get { return _key; }
}
private 'v Value {
get { return _value; }
}
}
public class Dictionary<'k,'v> : Collection<KeyValuePair<'k,'v>> {
// implementation details
}
So far so good, but there's a big problem still. Let's consider Windows Presentation Foundation, one of Microsoft's many application libraries. There's a class for text boxes, System.Windows.Controls.TextBox
, and there's a lightweight rich text label, System.Windows.Controls.TextBlock
. TextBox
is a Control
, since it's used to provide input, while TextBlock
is not.
Both classes have a property for controlling word wrapping, TextWrapping
. Even though both classes have the same property of the same type (System.Windows.TextWrapping
, an enumeration of word wrapping styles), it isn't derived from a base class. The parent of TextBlock
is FrameworkElement
, which covers anything that can be laid out and rendered, so it would be inappropriate for it to define TextWrapping
—that would put a spurious property on all sorts of classes, most of which don't have any text at all!
Instead, the two classes independently define their own TextWrapping
property. This means that, at first blush, it's impossible to write a function SetTextWrapping
that works on both TextBox
and TextBlock
.
The obvious solution to this problem is allowed in C++, and disallowed in basically every object-oriented language since:
abstract class TextElement {
private System.Windows.TextWrapping _wrap_mode;
// other implementation details
}
public class TextBlock : FrameworkElement, TextElement {
// implementation details
}
This is called multiple inheritance, and it's verboten because it's complicated. The best demonstration of the problem is called "diamond inheritance":
#include <iostream>
class A {
public:
virtual void foo();
};
class B: public A {
public:
B() { }
void foo() {
std::cout << "Called foo on B" << std::endl;
}
};
class C: public A {
public:
C() { }
void foo() {
std::cout << "Called foo on C" << std::endl;
}
};
class D: public B, public C {
public:
D() { }
};
int main() {
D* d = new D();
d->foo();
return 0;
}
What does this program print? Nowadays you get this compiler error:
member 'foo' found in multiple base classes of different types
but most languages just don't allow multiple inheritance in the first place. Back when compilers allowed this, I honestly can't remember what it would have printed. Indeed, embedding multiple inheritance into a GUI library would be a maintainability disaster.
Interfaces allow multiple implementation, which is fine because interfaces don't provide any implementations themselves. This means the interface inheritance scheme will never create ambiguities: interface implementations will only ever be inherited through the single-inheritance class hierarchy.
So why aren't there interfaces for these orphaned properties in WPF? I don't have an answer. Maybe some Microsoft engineer out there could answer it, but I don't imagine I'd find the answer very satisfying.
This also gets us back around to the problem of a generic add-to-collection function. Arguably the biggest weakness of object-oriented programming, as it's done today, is that interfaces have to be implemented as part of the class definition. So what if we wanted to allow, say, Dotnet's System.Collections.Generic.List<'t>
into our scheme?
We can't go back and edit Dotnet's definition of List<'t>
, so the only real answer is inheritance. This means that our Collection<'t>
class has to become an interface. Now if someone takes our code and wants to extend the functionality further, they have to go through the same trouble: define their functionality as an interface, and inherit our List<'t>
to tack on their interface implementation.
To summarize object-oriented programming, then...
We do get some semantic discipline, since it pushes programmers to think taxonomically. We have reasonable assurance that Collection.Add
has consistent meaning, since a sane person isn't going to stick a non-collection class in the Collection
interface. It also makes the intentions of our arguments clearer, for the same reasons: type names, at the very least, clue programmers in to our intended semantics.
The problem, and this is why the fictional Carnap had so much trouble classifying sandwiches and hot dogs, is that the world is not taxonomical. Any taxonomy is inherently inadequate, which does two things to class hierarchies: first, they get bloated, and second, you end up having to hack around their limitations. So we get deranged stuff like the "composition pattern," and needless inheritance, and "extension methods," and the like.
In the next and final episode I'll talk about Haskell. I'm no theory zealot, and I don't think type classes are the end all and be all, but I do think they're a huge step up from all of this.
Posted on July 3, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.