The hardest thing I ever did explained as simply as possible.
DrBearhands
Posted on September 5, 2021
A while ago, I came across a situation where I needed to know things about some hidden state at compile time. Traditionally, this is done using Hoare logic. In Hoare logic, every statement has a precondition and a postcondition: what must be true before the statement, and what will be true after it. For the statement "set `x` to `7`", the precondition is that `x` exists, and the postcondition is that `x = 7`. We can chain multiple statements and derive a final postcondition using the rules of Hoare logic: if the first statement's postcondition implies the second statement's precondition, the chain's precondition is the first statement's and its postcondition is the second's.
Hoare logic is limited, which is one reason I strongly prefer (total) functional programming over imperative programming. Nevertheless, even functional programmers must deal with mutable state at some point. Generally we do this with a function of type `s -> s`, where `s` is the type of the state. That is, a function that takes a state as input and produces a new state as output. A state-changing function may also return some value, so the actual type of the function is `s -> (a, s)`, where `a` is the type of the return value. In functional programming we use tuples to "return" multiple values. Once we define how several such state-changing functions can be chained together, we call this a (state) monad. You don't need to understand exactly what that means: "monad" is just a mathematical name for a certain pattern.
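To make this concrete, here is a tiny state-changing function of exactly that shape (the name tick is mine, not from the article): it returns the old counter as its value and stores the incremented counter as the new state.
-- a minimal example of an `s -> (a, s)` function, with both the state and the value being Int
tick : Int -> (Int, Int)
tick count = (count, count + 1)  -- return the old counter, store the incremented one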
It seemed what I wanted was Hoare logic over state monads, and I came across a paper about the Hoare state monad. With the knowledge I had, this was one of the most difficult things I have ever implemented. I had already given up on Hoare state monads at least twice; this time I finally managed it after a full week. I feel that not sharing this journey would be a waste, especially since Hoare state monads will probably turn out to be too complicated for the purpose I had in mind.
The code in this article is what I actually used to deconstruct the problem for myself, plus a few extras for things that are only obvious to me.
Not every language can express a Hoare state monad; we need a sufficiently powerful type system. I was using Idris 2, so this was not a problem. Don't worry, I will go through any unusual concepts before using them.
First, let's make our own state monad. I'm going to use a hardcoded state type; usually this would be a type variable.
data MyState = Foo | Bar
For those unfamiliar with sum types: the above declares that a value of type `MyState` can be either `Foo` or `Bar`.
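As a quick illustration (the helper is my own), we can pattern match on those two constructors like on any other value:
-- flip between the two possible states
toggle : MyState -> MyState
toggle Foo = Bar
toggle Bar = Foo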
We go on to create a state monad for `MyState`. You might be unfamiliar with the notation, so I will explain it afterward.
data MyStateMonad : a -> Type where
  MkMyStateMonad : (MyState -> (a, MyState)) -> MyStateMonad a
Note that a single colon indicates a type declaration in Idris. E.g. `x : Int` means "x is an integer".
The first line, `data MyStateMonad : a -> Type where`, declares that `MyStateMonad` needs some `a` as input to make a `Type`. This is similar to arrays: in statically typed languages, we don't have just "an array", we have e.g. "an array of integers". We must specify what's "inside" it. Similarly, we can have a state monad (think of it as a statement) that returns an integer (`MyStateMonad Int`), a string (`MyStateMonad String`), or something else.
Below the data declaration are all the type's constructors, only `MkMyStateMonad` in this case. The constructor has certain arguments, here `(MyState -> (a, MyState))`. Recall that state-changing functions are `s -> (a, s)`, with `s = MyState` in this case. The constructor also has a return type, `MyStateMonad a`, which must match the type given in the data declaration header. The `a` is a variable in this context, but every `a` must match, so the return type of the wrapped function determines the type of `MyStateMonad a`. E.g.: for a function `f : MyState -> (Int, MyState)`, we know `MkMyStateMonad f : MyStateMonad Int`.
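For instance, a small sketch of my own (observe and observeM are not names used later in the article):
-- a concrete state-changing function returning an Int
observe : MyState -> (Int, MyState)
observe Foo = (0, Foo)
observe Bar = (1, Bar)

-- wrapping it in the constructor gives a MyStateMonad Int
observeM : MyStateMonad Int
observeM = MkMyStateMonad observe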
But what about input? Idris is a curried language, meaning we can have the function:
someFunction : Int -> Int
someFunction = (+) 1
In other words, a function of type `MyState -> (a, MyState)` can be a closure over as many arguments as you want.
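For example (a sketch with a name of my own choosing), a state action that takes an extra argument is just a function returning a closure of the right type:
-- 'new' is captured by the closure; the inner function still has type MyState -> ((), MyState)
setTo : MyState -> MyStateMonad ()
setTo new = MkMyStateMonad (\_ => ((), new))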
State monads are useful as-is. They let us use do-notation rather than less readable pattern matching:
do
  x <- actionA
  actionB
  actionC x
vs.
\s1 =>
  let
    (x, s2) = actionA s1
    (_, s3) = actionB s2
  in
    actionC x s3
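As an aside, to actually run such a chain we just unwrap the function and apply it to an initial state; a minimal helper (the name is mine) could look like this:
runMyState : MyStateMonad a -> MyState -> (a, MyState)
runMyState (MkMyStateMonad f) initial = f initial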
This is not what we set out to do though, so let's talk about proofs in Idris.
You may have heard that types are propositions and values are proofs of propositions. The value `7` is proof that `Int`s exist; `7` inhabits `Int`. But we can do something a bit more interesting.
Remember how `MyStateMonad` needed some `a`? We can do something similar with some other type over `a`, say `P`, so that `P a` only holds for some `a`. We do this by restricting which constructors are available.
A good example is the `Equal` type:
data Equal : a -> b -> Type where
  Refl : Equal x x
Here, we can use the `Refl` constructor to prove `Equal x x`, but there is no way to get an `Equal x y` value. `Equal x y` is a valid type, but there is no value of that type unless `x = y`. Therefore, if you can pass a value of type `Equal x y` to a function, you know within that function that `x = y`. There are more complex cases, of course, such as restricting inputs to members of a list.
In the above example, `Equal` is a predicate, `Equal x y` is a proposition, and `Refl` is a proof of `Equal x x`.
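A minimal example with our MyState values (the name fooIsFoo is mine):
fooIsFoo : Equal Foo Foo
fooIsFoo = Refl

-- there is no way to write a value of type `Equal Foo Bar`:
-- no constructor produces it.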
We can use proofs in a function without adding runtime overhead:
acceptsOnlyFoo : (s1 : MyState)
              -> {auto 0 prf : Equal Foo s1}
              -> (Int, MyState)
acceptsOnlyFoo inState = (7, Bar)
Let's dissect the function type line by line:
- `(s1 : MyState)`: this is similar to `MyState`, but we have given the argument a name (`s1`). In dependently typed programming, types can depend on values, so it's useful to have value names in the type.
- `{auto 0 prf : Equal Foo s1}`: the curly brackets make this argument implicit. We don't have to specify it when calling the function, provided the compiler can figure it out itself. The `0` means this argument will not exist at runtime. `auto` denotes that the compiler will try to find a value of the type itself from the context. `prf` is again the variable name and `Equal Foo s1` is the type. In other words, we ask the compiler to find us some value of type `Equal Foo s1`, with information known at compile time only. The compiler can only succeed in this, by finding `Refl`, if it knows for sure the first argument is `Foo`.
- `(Int, MyState)`: the output type, which you are already familiar with.
This means we can call `acceptsOnlyFoo` as:
acceptsOnlyFoo Foo
but not as
\x => acceptsOnlyFoo x
unless we already know `Equal Foo x` at compile time some other way.
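A common way to satisfy such a requirement is to pattern match first, so that in the Foo branch the compiler knows the argument is literally Foo and the auto search can find Refl (this wrapper is a sketch of my own):
callIfFoo : MyState -> Maybe (Int, MyState)
callIfFoo Foo = Just (acceptsOnlyFoo Foo)  -- here the compiler can prove Equal Foo Foo with Refl
callIfFoo Bar = Nothing                    -- no value of type Equal Foo Bar exists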
You might be able to see where this is going now. If we have one function that implicitly requires `Equal Foo x`, and another that "returns" an `Equal Foo x`, those could be safely chained together.
While `acceptsOnlyFoo` specifically requires a proof of `Equal Foo`, we do not want to limit ourselves to just proving equality. We need a more generic type for the precondition. A precondition could be any `MyState -> Type`. A state monad with a precondition is therefore:
data MyPreconditionMonad : (pre : MyState -> Type) -> a -> Type where
  MkMyPreconditionMonad :
    ( ( s1 : MyState) ->
      { auto 0 prf_pre : pre s1 } ->
      (a, MyState)
    ) ->
    MyPreconditionMonad pre a
Notice `{ auto 0 prf_pre : pre s1 }`. The proof that must be found is of type `pre s1`: the initial state `s1` is passed to the predicate at compile time, even though we don't really know what it will be. In our previous example, `pre` was `Equal Foo`.
An example of how to construct a state monad with a precondition:
testPrecondition1 : MyPreconditionMonad (Equal Foo) Int
testPrecondition1 = MkMyPreconditionMonad acceptsOnlyFoo
But what if we want to chain state monads that don't have a precondition? We create a `Top` proposition, which can always be satisfied:
data Top : a -> Type where
  MkTop : Top a
Whatever `a` we have, `MkTop` will be proof that `Top a`.
The name Top comes from formal logic, where it is written with the symbol ⊤. It shouldn't actually take any arguments, but this serves our purpose well.
And this is how you might use it:
testPrecondition2 : MyPreconditionMonad Top Int
testPrecondition2 = MkMyPreconditionMonad f
  where
    f : (s1 : MyState) -> {auto 0 prf : Top s1} -> (Int, MyState)
    f s1 = (7, s1)
So that about covers preconditions. Postconditions are a little bit harder.
As far as I know, Idris has no implicit results. Instead, we must return something like a tuple with an implicit argument.
To do this we create a special type `PostS'`. I've added the apostrophe because the actual `PostS` will be a little different.
data PostS' : (s : Type) -> (s -> Type) -> Type where
  MkPostS' : (val : s)
          -> {auto 0 prf : prop val}
          -> PostS' s prop
`PostS'` is parametrized by a type `s` and a predicate about `s`. Note that we pass `val` to `prop`, not `s`, since `s` is the type, and we want the proposition to hold about the value. For a concrete example, we want to be able to say `PostS' Int (Equal 7)`, not `PostS' Int (Equal Int)`.
Now let's use `PostS'` in a function that increases a counter by 1. The type of the counter will be `Nat`, a built-in type for natural numbers:
testPostS' : (s1 : Nat) -> PostS' Nat (Equal (s1 + 1))
testPostS' x = MkPostS' (x+1)
As you can see, we don't need to pass a proof; Idris has figured out `Equal (s1 + 1) (x + 1)` automatically, by unifying `s1` and `x`. More or less. I think. Honestly, I'm not that knowledgeable about how Idris searches for proofs.
But the real `PostS` needs to be a bit more complex. We don't just want the postcondition to be about the output state; it also needs to consider the returned value and the input state. For instance, we might want to declare that `s2 = s1 + x`.
Luckily, if you understand the previous code, the next step is not particularly complicated:
data PostS : (a : Type) -> (s : Type) -> (a -> s -> Type) -> Type where
  MkPostS : (val : a)
         -> (state : s)
         -> {auto 0 prf : prop val state}
         -> PostS a s prop
A `MkPostS` consists of a value, a state, and a proof that a proposition about said state and value holds (at compile time). You may have noticed that `prop` takes no input state. That is because `PostS` does not know about the input state; if we want a predicate about the input state, that state must already be passed to it outside of `PostS`. For instance, if we want to express that the resulting state is equal to the input state plus the returned value:
testPostS : (s1 : Nat) -> PostS Nat Nat (\n => Equal (s1 + n))
testPostS s1 = MkPostS 7 (s1+7)
With that out of the way, let's make a Hoare monad for a specific state. We will keep using `Nat` now instead of `MyState`, since there's not much to prove about a `MyState`:
data MyHoareMonad : (pre : Nat -> Type) -> (a : Type) -> (post : Nat -> a -> Nat -> Type) -> Type where
  MkMyHoareMonad :
    ( ( s1 : Nat) ->
      { auto 0 _ : pre s1 } ->
      PostS a Nat (post s1)
    ) ->
    MyHoareMonad pre a post
And we can use it as follows:
testMyHoareMonad : MyHoareMonad (Equal 0) Nat (\s1 => Equal)
testMyHoareMonad = MkMyHoareMonad f
  where
    f : (s1 : Nat) -> {auto 0 prf : Equal 0 s1} -> PostS Nat Nat (Equal)
    f inState {prf} = MkPostS {prf=Refl} (inState+1) (inState+1) -- {prf=Refl} is needed here because of a compiler bug.
Unfortunately, there currently appears to be a bug in Idris' `auto` search strategy. It means that, for now, we won't be able to rely on `auto` as much. It's not a huge deal provided it gets fixed in the future, but it is a tad annoying.
But it's time for the real thing: a Hoare state monad without a hardcoded state type.
data HoareStateMonad : (s : Type) -> (pre : s -> Type) -> (a : Type) -> (post : s -> a -> s -> Type) -> Type where
  MkHSMonad :
    ( ( s1 : s ) ->
      { auto 0 prf : pre s1 } ->
      PostS a s (post s1)
    ) ->
    HoareStateMonad s pre a post
It's not much different; we just replaced `Nat` with a type variable `s`.
Unfortunately, the hardest part is still to come. We want to use a Hoare state monad as a monad, or sort of a monad. Technically we won't be making a monad: a monad requires the rule `m (m a) -> m a`, but since our "`m`" has been decorated with pre- and postconditions, every "`m`" in that formula would be different. It doesn't really matter for us though; it's close enough.
We need to define a `pure` and a `>>=` (bind) function.
A `pure` function lifts a value to a monadic value, like making a list with a single element, doing a no-op, or, in the case of a regular state monad, leaving the state unchanged and returning the lifted value:
pure : a -> MyStateMonad a
pure val = MkMyStateMonad ((,) val)
So what should the pre- and postconditions be for `pure`? Since we're not using the state in any way, the precondition can just be `Top`. Anything goes.
We could also use `Top` as the postcondition, but that would lose some useful information. We know the input state is equal to the output state (`Equal s1 s2`), and the output value is equal to the lifted value. We can use a tuple as a conjunction, to indicate that two predicates must both hold. After all, we must be able to provide both values of a tuple to construct it.
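As a tiny illustration of a tuple acting as a conjunction of proofs (the name is mine):
-- both components must be provided, so both propositions must hold
bothHold : (Equal Foo Foo, Equal Bar Bar)
bothHold = (Refl, Refl)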
With that, `pure` becomes:
pure : (a1 : a) -> HoareStateMonad s Top a (\s1, a2, s2 => (Equal s1 s2, Equal a1 a2))
pure a1 = MkHSMonad $ \s1 => MkPostS a1 s1
And now for the tricky bit, the bind operator.
For regular state monads, the bind operator chains the state-changing functions together:
(>>=) : MyStateMonad a -> (a -> MyStateMonad b) -> MyStateMonad b
(>>=) (MkMyStateMonad f1) act2 = MkMyStateMonad $ \s1 =>
  let
    (val1, s2) = f1 s1
    (MkMyStateMonad f2) = act2 val1
  in
    f2 s2
So what we're really doing is first executing action A, then executing action B, potentially using A's output. I will be using a notation from program algebra for clarity: `A ; B` means "first do A, then do B".
But, again, what should the pre- and postconditions be? Let's first look at the part of bind's type signature we do know:
(>>=) : HoareStateMonad s p1 a q1
     -> ((x : a) -> HoareStateMonad s (p2 x) b (q2 x))
     -> HoareStateMonad s ?p3 b ?q3
Here, `s` is the state type, `p1` is the precondition of action A, `a` is the return type of action A, `q1` is the postcondition of action A, and `x` is the input to action B. `p2` and `q2` are a bit different from `p1` and `q1`: they are applied not just to the input state, return value and output state, but also to the input value `x`. Why? Consider the following function:
trySetState : (x : Nat) -> (s1 : Nat) -> (Bool, Nat)
trySetState x s1 =
  if someCondition
    then (True, x)
    else (False, s1)
The function sets the state based on some condition. This is a common pattern. A reasonable postcondition for `trySetState` would express that if the returned value is `True`, then `Equal x s2` is proven, and if it is `False`, then `Equal s1 s2` is proven. Declaring this proposition requires using `x` as input.
As the example shows, it is useful to also consider inputs in the pre- and postcondition for the second argument of `bind`.
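As a sketch (the name TrySetPost and the use of Nat here are my own), such a postcondition can be written as a function that computes a different proposition depending on the returned Bool:
-- which equality holds depends on the returned Bool
TrySetPost : (x : Nat) -> (s1 : Nat) -> Bool -> Nat -> Type
TrySetPost x s1 True  s2 = Equal x s2   -- the state was set to x
TrySetPost x s1 False s2 = Equal s1 s2  -- the state was left unchanged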
Back to the bind operation. What is `p3`, the precondition of "first A, then B"? Since A executes first, `p3` must contain `p1`. Furthermore, to satisfy `p2`, `p3` must ensure that `p2` holds after A has executed, so the postcondition `q1` of A must imply the precondition `p2` of B. Remember also that the type of `p3` is `s -> Type`.
We define a function `bind_pre` that constructs a new precondition from `p1`, `q1` and `p2`:
bind_pre
  : {s : Type}
  -> {a : Type}
  -> (p1 : s -> Type)
  -> (q1 : s -> a -> s -> Type)
  -> (p2 : a -> s -> Type)
  -> s -> Type
bind_pre p1 q1 p2 s1 =
  ( p1 s1
  , {x : a} -> {s2 : s} -> q1 s1 x s2 -> p2 x s2
  )
This isn't easy, so let's go through it line by line:
- `bind_pre`: the function name.
- `: {s : Type}`: the type declaration for an implicit argument. You can read this as "for any `s` that is a type".
- `-> {a : Type}`: read as "for any `a` that is a type".
- `-> (p1 : s -> Type)`: the first argument to the function (not counting implicits). A predicate over `s`: the precondition of the first action.
- `-> (q1 : s -> a -> s -> Type)`: the postcondition of the first action.
- `-> (p2 : a -> s -> Type)`: the precondition of the second action, based on the output of the first action.
- `-> s -> Type`: the return type. Remember `p3` must be `s -> Type`. Idris is curried, so `x -> s -> Type` and `x -> (s -> Type)` are exactly the same thing: partially applying `bind_pre` to `p1`, `q1` and `p2` yields something of type `s -> Type`.
- `bind_pre p1 q1 p2 s1 =`: the beginning of the function body. We give the arguments new names here, because the names in the type declaration are only valid within that type declaration. I used matching names for clarity.
- `( p1 s1`: the result is a tuple, with the regular precondition of the first action as its first value.
- `, {x : a} -> {s2 : s} -> q1 s1 x s2 -> p2 x s2`: for any `x` of type `a`, and any `s2` of type `s`, if we can prove `q1 s1 x s2`, then we must also be able to prove `p2 x s2`.
Well, that was hard enough, but the postcondition is worse. `q1` and `q2` are quantified over intermediate results. These intermediate values won't appear in the resulting monad's type declaration; we only know that they exist and that the postconditions hold for them. We're going to need a function for existential quantification:
exists : {p : _} -> (p -> Type) -> Type
exists f = {b : Type} -> ({a : _} -> f a -> b) -> b
What we did here is reverse universal quantification to get existential quantification. `{a : Type} -> a -> Type` is the type of a predicate that works for all `a`s that are `Type`s; `Top` is such a predicate. Conversely, we want to state that some `a` exists that satisfies a certain predicate. There is a duality here: if all you know is that something exists, you can only reason about it using predicates that work for all values. In programming, that means an existentially quantified output is mathematically the same as a universally quantified continuation. In other words, `exists x. x` is the same as `forall y . (forall x . x -> y) -> y`. The latter can be read as "you can prove any statement of type `y`, so long as you can prove it using an arbitrary statement of type `x`". Since we know nothing about `y`, only that it can be proven using any `x`, this tells us that some `x` must exist. The other way works too: if we know some `x` exists, we know that a `y` can be proven if it can be proven for any `x`.
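A small, self-contained illustration of this duality with a concrete predicate (the name fooExists is mine, and I'm using MyState just for familiarity): the statement "there exists a MyState equal to Foo" can be written directly as a continuation type.
fooExists : {b : Type} -> ((s : MyState) -> Equal Foo s -> b) -> b
fooExists k = k Foo Refl  -- provide the witness (Foo) and the proof that it satisfies the predicate (Refl)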
Now we can finally get to the postcondition of `A ; B`.
Remember that `q1` and `q2` are quantified over intermediate values, of which we only know, in the type signature, that they exist. We therefore have:
bind_post
  : {s : Type}
  -> {a : Type}
  -> {b : Type}
  -> (q1 : s -> a -> s -> Type)
  -> (q2 : a -> s -> b -> s -> Type)
  -> (s -> b -> s -> Type)
bind_post q1 q2 s1 y s3
  = exists (\x => exists (\s2 => (q1 s1 x s2, q2 x s2 y s3)))
Which leaves the final declaration of bind; I'll explain it line by line afterwards:
(>>=) : {a : Type}
     -> {b : Type}
     -> {s : Type}
     -> {p1 : s -> Type}
     -> {q1 : s -> a -> s -> Type}
     -> {p2 : a -> s -> Type}
     -> {q2 : a -> s -> b -> s -> Type}
     -> HoareStateMonad s p1 a q1
     -> ((x : a) -> HoareStateMonad s (p2 x) b (q2 x))
     -> HoareStateMonad s (bind_pre p1 q1 p2) b (bind_post q1 q2)
(>>=) (MkHSMonad f1) act2 = MkHSMonad f
  where
    f : ( s1 : s )
     -> { auto 0 pre3 : bind_pre p1 q1 p2 s1 }
     -> PostS b s (bind_post q1 q2 s1)
    f s1 {pre3} =
      let
        MkPostS v1 s2 {prf=post1} = f1 s1
        MkHSMonad f2 = act2 v1
        MkPostS v2 s3 {prf=post2} = f2 {prf= (snd pre3) post1} s2
      in MkPostS v2 s3 {prf=(\e1 => (e1 $ \e2 => e2 (post1, post2)))}
- `{a : Type}`: implicit argument. Read as "for all `a`s that are types".
- `{b : Type}`: for all `b`s that are types.
- `{s : Type}`: for all `s`s that are types.
- `{p1 : s -> Type}`: for all predicates `p1` about an `s` (this is the type of our first precondition).
- `{q1 : s -> a -> s -> Type}`: for all predicates `q1` about an `s`, an `a` and another `s` (the type of our first postcondition).
- `{p2 : a -> s -> Type}`: for all predicates `p2` over an `a` and an `s` (the second precondition, additionally quantified over the input).
- `{q2 : a -> s -> b -> s -> Type}`: the second postcondition.
- `HoareStateMonad s p1 a q1`: the first argument, a `HoareStateMonad` about state `s`, with precondition `p1`, return value `a`, and postcondition `q1`.
- `((x : a) -> HoareStateMonad s (p2 x) b (q2 x))`: the second argument, a function that takes an `a` (with value `x`) and yields a `HoareStateMonad` about state `s`, with precondition `p2 x`, returning a `b`, and with postcondition `q2 x`.
- `HoareStateMonad s (bind_pre p1 q1 p2) b (bind_post q1 q2)`: the result, a `HoareStateMonad` about state `s`, with precondition `bind_pre p1 q1 p2`, returning a value `b`, and with postcondition `bind_post q1 q2`.
And for the function body:
- `(>>=) (MkHSMonad f1) act2 = MkHSMonad f`: unwrapping and wrapping the state-changing functions in monad constructors. `f` is the new state-changing function we create as the sequence of our inputs.
- `f : ( s1 : s )`: the first argument to the new function is the input state.
- `-> { auto 0 pre3 : bind_pre p1 q1 p2 s1 }`: we also require a proof that the combined preconditions hold.
- `-> PostS b s (bind_post q1 q2 s1)`: and we produce a value with suitable postconditions.
- `f s1 {pre3} =`: we name the precondition proof `pre3` explicitly, as we will need to use it manually in the function body.
- `MkPostS v1 s2 {prf=post1} = f1 s1`: "run" the first action with the input state. This results in a return value `v1`, a new state `s2` and a proof of the postcondition `post1`. Note that `post1` still has multiplicity 0, so it will not exist at runtime.
- `MkHSMonad f2 = act2 v1`: pass the return value of the first action to the second; this results in a new state-changing function `f2`, wrapped in a `HoareStateMonad`.
- `MkPostS v2 s3 {prf=post2} = f2 {prf= (snd pre3) post1} s2`: we "run" `f2` with state `s2`, but must also prove its precondition. The precondition `pre3` is `( p1 s1 , {x : a} -> {s2 : s} -> q1 s1 x s2 -> p2 x s2 )`. The second component is of interest here: we have `q1 s1 v1 s2` from `post1`, and so `(snd pre3) post1` is proof of `p2 v1 s2`.
- `MkPostS v2 s3 {prf=(\e1 => (e1 $ \e2 => e2 (post1, post2)))}`: we create a return value with a postcondition, but must provide proof of our new postcondition. Recall that existential types are continuations, and so we create a continuation as proof.
Next steps
I still have to use `HoareStateMonad`s in practice. I planned to use them to reduce the mistakes that can be made when writing a WebGL program; those can be very tricky to debug. I fear `HoareStateMonad`s require passing around proofs far too much to be useful in most cases; the extra programming work might not weigh up against the reduction in bugs. But perhaps Idris will be able to find many proofs by itself. Or perhaps I will need a different data type entirely.