Boolean coercion pitfalls (with examples)
Mike Samuel
Posted on February 27, 2023
It's not uncommon for programming language designers to let people write conditions whose type is not boolean, by automatically coercing non-boolean values into truthy or falsey ones:
┏━━━━━━━━━ not boolean
▼
if (x) {
body(x);
} ▲
┗━━━━ known to be usable here
This lets developers run body only if x is a kind of value that you can do things with.
It's tempting, as a language designer, to want to simplify things when you see developers writing the same kinds of test conditions over and over.
-if (pointer != null) { ... }
+if (pointer) { ... }
-if (!myList.empty) { ... }
+if (myList) { ... }
-if (result.isSuccess) { ... }
+if (result) { ... }
-if (inputText != "") { ... }
+if (inputText) { ... }
-if (count != 0) { ... }
+if (count) { ... }
All else being equal, a language that encourages shorter code is better.
But the principle of least surprise says we ought to craft semantics that have a clear meaning, even when developers are fuzzy on the precise semantics of a phrase.
Below are examples of different ways that silent conversion, specifically of if and loop condition results to boolean values, can violate that principle.
Special "nothing here" values like NULL are exactly the kind of thing we might want to filter out. So maybe these two should be equivalent.
// For any pointer, p
if (p != NULL) { ... }
if (p) { ... }
// In C++, for example, NULL is falsey,
// and all other pointer values are truthy.
But it's easy to confuse containers with their content. Developers may be surprised when the container and content coerce to different truth values.
// What if we're pointing to a boolean?
bool b = false;
bool* p = &b;
if (p) {
std::cout <<
"a non-null pointer to false is truthy\n";
}
In languages that don't have null pointers, Option and Result types often serve a similar function: a box of zero or one value.
(* This is not actually valid OCaml
* since OCaml doesn't do coercion.
*
* But imagine Some _ is truthy and
* None is falsey
*)
let x = Some(false) in
if (x) then
...
Note that Swift allows its optional types in a condition, but only via a binding condition, which makes it clear that the container is what is being checked.
// Swift
let x: Bool? = false
// When x is not `nil`, unpacks it into b
if let b = x {
print("b is \(b)")
}
// if x { ... } // Does not type-check
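The same container-versus-content confusion can be sketched in Python, where None plays the role of the empty box. This is only an illustration; the function names describe_coerced and describe_explicit are hypothetical.

```python
from typing import Optional


def describe_coerced(flag: Optional[bool]) -> str:
    # Coercion conflates "no value" (None) with "a value of False".
    if flag:
        return "set"
    return "unset"


def describe_explicit(flag: Optional[bool]) -> str:
    # Check the container first, then the content.
    if flag is None:
        return "missing"
    return "true" if flag else "false"


for value in (None, False, True):
    print(repr(value), describe_coerced(value), describe_explicit(value))
```

The coerced version cannot tell None from False, while the explicit version reports three distinct states.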
Consider programs that work with data languages like JSON and YAML that use syntax typing: the textual form of a value determines its type.
For example, YAML, a common configuration language, treats many unquoted strings as booleans. Its 1.1 specification recognizes all of these:
y|Y|yes|Yes|YES|n|N|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF
Developers who work with these languages are used to thinking of words as having truth values.
There's potential for confusion when a programming language assigns different boolean valence to a string value than the language of the string's content.
This happens in many programming languages: if (someContainer) executes the body if someContainer is not empty.
For example, Python says:
Any object can be tested for truth value, for use in an if or while condition, or … By default, an object is considered true unless … Here are most of the built-in objects considered false:
- …
- empty sequences and collections: '', (), [], {}, set(), range(0)
So Python treats the string '' as falsey but all other strings as truthy, including strings like 'false' and 'no'.
Here's some JavaScript, which has similar string truthiness.
// Unpack a value in a data language
// that the program does not control.
let request = JSON.parse(
`
{
"deleteAccount": "no",
"userConfirmationCheckbox": "off"
}
`
);
// But we forgot to convert those fields
// to booleans using the request language's
// conventions.
// So JavaScript's conventions prevail.
if (
request.deleteAccount &&
request.userConfirmationCheckbox
) {
performIrreversibleDelete();
}
A programming language should support developers in checking assumptions about values that come from outside the program. Assigning arbitrary truthiness to string values makes it harder to find and check these assumptions.
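One way to make that assumption checkable is to convert external values explicitly and reject anything unrecognized. Here is a minimal Python sketch under stated assumptions: parse_flag, TRUE_WORDS, and FALSE_WORDS are hypothetical names, and the accepted spellings are a guess at what a request format might define.

```python
import json

# Hypothetical spellings; use whatever the request format actually defines.
TRUE_WORDS = {"yes", "on", "true"}
FALSE_WORDS = {"no", "off", "false"}


def parse_flag(raw: object) -> bool:
    """Convert an external value to bool, rejecting anything unrecognized."""
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str):
        word = raw.strip().lower()
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
    raise ValueError(f"not a recognized boolean: {raw!r}")


request = json.loads(
    '{"deleteAccount": "no", "userConfirmationCheckbox": "off"}'
)

# Python's own coercion treats both non-empty strings as truthy.
coerced = bool(
    request["deleteAccount"] and request["userConfirmationCheckbox"]
)

# Explicit parsing applies the request language's conventions instead.
parsed = parse_flag(request["deleteAccount"]) and parse_flag(
    request["userConfirmationCheckbox"]
)

print(coerced, parsed)  # True False
```

Raising on unknown input, rather than defaulting to either truth value, keeps a misspelled or unexpected field from silently choosing a branch.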
It's easy to confuse a zero-argument function with its result.
When first-class function values have a boolean sense, this can lead to confusion.
# Python
class Animal:
    def is_goldfish_compatible(self):
        return True
class Goldfish(Animal): pass
class Dog(Animal): pass
class Cat(Animal):
    # overrides a predicate from Animal
    def is_goldfish_compatible(self):
        return False
goldfish = Goldfish()
cat = Cat()
dog = Dog()
def play_together(a, b):
    print("%r plays with %r" % (a, b))
for animal in [dog, cat]:
    if animal.is_goldfish_compatible :
        #                           ▲
        # Pay attention here ━━━━━━━┛
        play_together(animal, goldfish)
It's easy to confuse the method call:
animal.is_goldfish_compatible()
with a read of a bound method:
animal.is_goldfish_compatible
This is especially easy when other classes define similarly named is_* boolean attributes.
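The difference is easy to see in isolation. A small Python sketch, with a pared-down Cat class standing in for the one above:

```python
class Cat:
    def is_goldfish_compatible(self) -> bool:
        return False


cat = Cat()

# Reading the attribute yields a bound method object, which is truthy.
print(bool(cat.is_goldfish_compatible))    # True

# Calling it yields the predicate's actual answer.
print(bool(cat.is_goldfish_compatible()))  # False

# A callable in condition position is a hint that a call is missing.
print(callable(cat.is_goldfish_compatible))  # True
```

Because the bound method is always truthy, the unparenthesized condition takes the same branch for every animal, whatever the predicate would have returned.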
Unit and void are common names for special values produced by functions that are meant to be called for their side effect.
They make for clearer code. If a function's author doesn't intend to provide a value to the caller, they can simply not return anything.
These special values should probably not be silently treated as having a boolean valence, but in some languages they are.
This JavaScript is fine.
// A lambda that returns true
let shouldDoTheVeryImportantThing =
() => true;
if (shouldDoTheVeryImportantThing()) {
doIt();
}
That lambda returns true when called, but maybe I'm debugging the program, so I add some logging. I need a block to put a logging statement in, so I wrap it in {...}.
let shouldDoTheVeryImportantThing =
() => {
console.log("Called should...");
true
};
if (shouldDoTheVeryImportantThing()) {
doIt();
}
When I added a block around the lambda body, I forgot to add a return before the true. Now it returns the special void-like value undefined, which coerces to false in a condition.
The second version logs, but silently fails to call doIt().
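Python has the same hazard: a function that falls off the end of its body returns None, which is falsey. The sketch below mirrors the JavaScript above; the names are transliterated from that example, and calls is a hypothetical list used only to observe whether do_it ran.

```python
calls = []


def do_it() -> None:
    calls.append("did it")


def should_do_the_very_important_thing():
    print("Called should...")
    True  # evaluated and discarded; there is no `return`


result = should_do_the_very_important_thing()
print(result)  # None

if result:
    do_it()

print(calls)  # []
```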
Automatic coercion results from a genuine desire by language designers to help developers craft more succinct and readable programs.
But when the semantics are not carefully tailored, this can lead to confusion.
Be especially careful around:
- generic boxes and collections-of-one that might wrap booleans, including pointer types, Option and Result types,
- catch-all types like string and byte[] that may layer semantics in another language which is at odds with the coerced semantics,
- producers of values like first-class function values, thunks, and promises which might have a different boolean valence from the value they produce, and
- special placeholder values like void or undefined which, when coerced to booleans silently, mask a failure to account for a missing value.
Before designing coercion semantics, ask yourself: could the proposed semantics mask missing steps that the author should have performed?
Might a different mechanism (like Swift's binding condition above) suffice?
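As a rough illustration of a no-coercion rule, here is a Python sketch of a helper that accepts only real booleans. The name strict_bool and the suspects table are hypothetical; this emulates at run time what a language could enforce in its type checker, and it is not a recommendation for any particular language.

```python
def strict_bool(value: object) -> bool:
    """Return value unchanged if it is a bool; otherwise refuse to coerce."""
    if isinstance(value, bool):
        return value
    raise TypeError(
        f"condition must be a bool, got {type(value).__name__}"
    )


def returns_nothing():
    pass


# One representative of each pitfall described above.
suspects = {
    "boxed false": [False],
    "string from a data language": "no",
    "function instead of its result": returns_nothing,
    "missing return value": returns_nothing(),
}

for label, value in suspects.items():
    try:
        strict_bool(value)
    except TypeError as err:
        print(f"{label}: {err}")
```

Each of the four cases is reported instead of silently picking a branch, at the cost of requiring authors to write the comparison they meant.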
Maybe you think that the rules you've chosen are obvious and that developers will have them clearly in mind when reading & writing conditions. "On the arbitrariness of truthi(ness)" explains how different PL designers, who probably each thought that their notions of truthiness were clear, came to very different decisions.
Thanks for reading and happy language designing.