DrBearhands
Posted on November 12, 2019
Post from my blog. If you're enjoying the series, please consider buying me a coffee.
It is my personal belief that about 1/5 of the time, preferably the Monday of a full-time week of work, should be spent contemplating the work done so far and doing cleanup (refactoring, issues, etc.). Ignoring the setup chapter, this would be the fifth chapter and thus we shall do some retrospective and refactoring.
What have we done?
We've taken 5 chapters to make a basic server that just fetches data from a database and turns it into a simple HTML string. This sounds like little result for a lot of effort. However, we've also written an infinite number of tests.
In a purely functional, strongly typed language, the types correspond to proofs (we call this the Curry-Howard correspondence). A few examples:
myNumber :: Int
myNumber = 7
The type, myNumber :: Int
, is saying "we have a proof that an Int
exists, and that proof is called myNumber
". We later give this proof in the term myNumber = 7
. This proof doesn't seem really useful, and not something that would replace any tests for sure. So let's go further:
sum :: [Int] -> Int
sum = foldr (+) 0
This time sum :: [Int] -> Int
tells us there is a proof called sum
that can be used to get an Int
from a list of Int
s. We don't have to test that it returns an Int
, we have proof, which is better! If we had to test this behavior, without types or inspecting the body / term of the function, we would theoretically have to test with every possible list of Int
s, since one of them might yield a value of a different type. Far-fetched in this example, yes, but it becomes more relevant as functions become more complex.
We would also have to test how it works under incorrect input, since it might fail silently, crash the program or do something else unexpected.
mySumOfStrings = sum ["Foo", "Bar"] -- Bad! Won't compile
The above "proof" is invalid and won't compile, much less run. There is no need to write any tests for such cases.
There's another cool thing you can do with types using Abstract Data Types.
When writing a module, you can simply choose not to export a type's constructors. That way, any value of that type must come from functions you have exposed. We've seen this trick with the ResponseReceived
type, where it ensures we don't accidentally forget to respond. We could use it in our tutorial project too, for instance by hiding the constructors for Task
, so that all tasks must be retrieved from the database. Alternatively, we could define a SortedList
which is implemented like a regular list, but with a hidden constructor, so we can only get one using sorting functions, and cannot use functions that would break sort-order, like map
.
Basically, we can use Abstract Data Types for better encapsulation.
Because types put restrictions to what can go in and come out of a function, they fit together like Lego bricks. This makes working on projects that have grown beyond the mundane much more enjoyable.
Cleaning up our work so far
An often-cited advantage of purely functional programming is the ease of refactoring code. This makes sense if you realize function terms are proofs. A proof is valid wherever and whenever we use it! There is no context (global state) to our proofs that might change when refactoring.
Down to business!
Our code is a mess. We've put everything in the Lib
module, are only exposing someFunc
and we've changed none of the default names...
First, let's rename the Lib
module. We'll need to change the file name, the module name declaration and any imports (in app/Main.hs
in our case). We'll call it ServerMain
. If we miss anything, compilation will fail, so the compiler has our back again. Let's also change someFunc
to startServer
.
app/Main.hs
module Main where
import ServerMain (startServer)
main :: IO ()
main = startServer
src/ServerMain.hs
{-# LANGUAGE OverloadedStrings #-}
module ServerMain
( startServer
) where
-- [Some import statements]
startServer :: IO ()
startServer = do
run 8080 requestHandler
-- [rest of the module]
Do a stack build
to check if you missed anything.
Second, we'll want to declare the HTML
typeclass and any functions related to it in their own module, that way they can be used by other modules than ServerMain
. Create a new file src/HTML.hs
:
src/HTML.hs
module HTML where
class HTML a where
toHTML :: a -> String
instance HTML a => HTML [a] where
toHTML listOfElements =
let
elementToListElement :: HTML a => a -> String
elementToListElement element = "<li>" ++ toHTML element ++ "</li>"
in
"<ul>" ++ (concat $ map elementToListElement listOfElements) ++ "</ul>"
toHTMLPage :: HTML a => a -> String
toHTMLPage a =
"<!DOCTYPE html>\
\<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"\" xml:lang=\"\">\
\<head>\
\ <meta charset=\"utf-8\" />\
\ <title>TODO list </title>\
\</head>\
\<body>\
\ " ++ toHTML a ++
"</body>"
Notice that we didn't specify any exports, that means everything is exported. This is fine for certain kind of modules, like this one which only has utility functions.
We also didn't put the instance declaration for Task
in there. We couldn't, because we don't know about Task
in this module. Thanks to the open world assumption, we can declare Task
to be an instance of the HTML
class elsewhere.
We import the HTML
module in ServerMain
:
import HTML (HTML, toHTML, toHTMLPage)
And stack build
to ensure we didn't forget anything.
Third, we should probably move the database operations to its own module. We'll also implement the suggestion from the last section; hiding the constructors of the Task
types.
src/TaskDB.hs
{-# LANGUAGE OverloadedStrings #-}
module TaskDB
( TaskStatus(Done, NotDone)
, Task
, fetchFromDB
) where
import HTML (HTML, toHTML)
import qualified Hasql.Connection
import qualified Hasql.Session
import qualified Hasql.Statement
import qualified Hasql.Encoders
import qualified Hasql.Decoders
import qualified Data.Text
import qualified Data.ByteString.UTF8 as StrictUTF8 (fromString, toString)
data TaskStatus = Done | NotDone
data Task = Task String TaskStatus
instance HTML Task where
toHTML task =
case task of
Task description status ->
case status of
NotDone -> "<p>" ++ description ++ "</p>"
Done -> "<p><strike>" ++ description ++ "</strike></p>"
fetchFromDB :: IO [Task]
fetchFromDB =
let
connectionSettings :: Hasql.Connection.Settings
connectionSettings =
Hasql.Connection.settings
"localhost"
(fromInteger 5432)
"Haskell-student"
"Why-are-you-putting-credentials-in-code?-You-absolute-potato!"
"todolists"
in do
connectionResult <- Hasql.Connection.acquire connectionSettings
case connectionResult of
Left (Just errMsg) -> error $ StrictUTF8.toString errMsg
Left Nothing -> error "Unspecified connection error"
Right connection -> do
queryResult <- Hasql.Session.run selectTasksSession connection
Hasql.Connection.release connection
case queryResult of
Right result -> return result
Left err -> error $ show err
selectTasksSession :: Hasql.Session.Session [Task]
selectTasksSession = Hasql.Session.statement () selectTasksStatement
selectTasksStatement :: Hasql.Statement.Statement () [Task]
selectTasksStatement =
Hasql.Statement.Statement
"SELECT * FROM todolist_ch5"
Hasql.Encoders.unit
tasksDecoder
True
tasksDecoder :: Hasql.Decoders.Result [Task]
tasksDecoder = Hasql.Decoders.rowList taskDecoder
boolToTaskStatus :: Bool -> TaskStatus
boolToTaskStatus True = Done
boolToTaskStatus False = NotDone
taskStatusDecoder :: Hasql.Decoders.Value TaskStatus
taskStatusDecoder = fmap boolToTaskStatus Hasql.Decoders.bool
stringDecoder :: Hasql.Decoders.Value String
stringDecoder = fmap Data.Text.unpack Hasql.Decoders.text
stringDecoder_row :: Hasql.Decoders.Row String
stringDecoder_row = Hasql.Decoders.column stringDecoder
taskStatusDecoder_row :: Hasql.Decoders.Row TaskStatus
taskStatusDecoder_row = Hasql.Decoders.column taskStatusDecoder
taskDecoder :: Hasql.Decoders.Row Task
taskDecoder = Task <$> stringDecoder_row <*> taskStatusDecoder_row
The module declaration is the only new thing here:
module TaskDB
( TaskStatus(Done, NotDone)
, Task
, fetchFromDB
) where
This specifies exactly what is exported: TaskStates
with its constructors Done
and NotDone
, Task
with no constructors, and fetchFromDB
, the only way with which we can get a Task
outside our module at all. Instance declarations of an exported type are exported automatically.
Optionally, we can rewrite taskDecoder
, taskStatusDecoder_row
and stringDecoder_row
using do
-notation for applicatives:
taskDecoder :: Hasql.Decoders.Row Task
taskDecoder = do
taskDescription <- Hasql.Decoders.column stringDecoder
taskStatus <- Hasql.Decoders.column taskStatusDecoder
return $ Task taskDescription taskStatus
This is functionally the same as our previous code, it's up to you to decide which notation you prefer.
This leaves our ServerMain
module short and sweet. But there's one more thing I'd like to change, I don't like having that naked run
there, it's not descriptive enough. So we modify the Warp import declaration a bit:
src/ServerMain.hs
{-# LANGUAGE OverloadedStrings #-}
module ServerMain
( startServer
) where
import HTML (HTML, toHTML, toHTMLPage)
import TaskDB (Task)
import qualified TaskDB
import qualified Network.Wai.Handler.Warp as Warp (run)
import Network.Wai (Request, Response, ResponseReceived, responseLBS)
import Network.HTTP.Types.Status (status200)
import qualified Data.ByteString.Lazy.UTF8 as UTF8 (fromString)
startServer :: IO ()
startServer = do
Warp.run 8080 requestHandler
requestHandler :: Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
requestHandler request respond =
let
htmlPage htmlAble = UTF8.fromString $ toHTMLPage htmlAble
response tasks = responseLBS status200 [] $ htmlPage tasks
in
do
taskList <- TaskDB.fetchFromDB
respond $ response taskList
There's one more thing to do. Haskell has certain native typeclasses for which it can generate an instance automatically. Specifically, these are:
-
Eq
defines equality of values (implementing==
) -
Ord
defines an order of values through (implementing<=
) -
Show
turns a value to a string based on its constructors -
Read
, the inverse of Show -
Enum
, lets the type act as an enumeration of potential values -
Bounded
, which quite frankly I've never used -
Functor
, which derives thefmap :: (a -> b) -> f a -> f b
function, provided you have the right language extension. Only works for types with kind* -> *
.
You generally wouldn't want to write instances for these classes by hand, except for some niche cases, since there are obvious implementations for each. We can use the deriving
functionality in Haskell:
data TaskStatus = Done | NotDone
deriving (Eq, Ord, Show, Read, Enum)
data Task = Task String TaskStatus
deriving (Eq, Ord, Show)
Note that we do not derive Read
for Task
, as we didn't want to export any way to create a Task
beyond retrieving one from the database.
Deriving these instances allows us to compare values, print them and read them. I derive Eq
, Ord
and Show
for almost all my types, the other typeclasses I almost never derive.
There is more cleanup we could do, like reading port number, DB connection settings, etc. from environment settings or the command line, but those are rather environment specific, will complicate local testing a bit, and there's some discussion to be had about it, so we won't do that now.
Limits of type-based proofs
We've seen how types can be used to build proofs and how that makes refactoring easy. While this is a hell of a lot better than what we can do in most other languages, there are limits to the proofs we can build in Haskell that you should be aware of.
Obviously, although we can prove that sum :: [Int] -> Int
returns an Int
, we can't prove that the result will actually be the sum of the input list. We could have defined sum
as sum _ = 7
! Dependently-typed languages can relate proofs to actual values, but even in that case, we would have to define what a sum is to verify that the result of sum
is an actual sum. At some point we need axioms. This is a fundamental limitation of mathematics.
Then there's the IO
monad. Once we allow effects in functions, proofs become a lot weaker, since we don't know what might be happening with context. If you have a lot of communication and little computation, Haskell tends to degrade to the level of type-safety offered by strongly-typed imperative languages.
Finally, and this is a missed opportunity IMHO, Haskell proofs have terms and conditions attached to them, stemming from Haskell being in the Hask category rather than in Set. In Haskell, a -> b
does not mean that we will definitely get an inhabitant of b
from an inhabitant of a
, as we would in the Set category / total functional programming. Haskell's functions are partial, meaning they may not produce a b
for every a
.
In part, this is by necessity. Because of the halting problem. We cannot decide for every Turing complete program if it terminates (having a function that can do this would lead to a contradiction). That means a function f :: a -> b
, when given an a
, might just never terminate. In that case, we say the function returns a bottom, denoted as โฅ. Let's take a look at how this would look, first we make a valid proof:
slowMultiplication :: Int -> Int -> Int
slowMultiplication n 0 = 0
slowMultiplication n m = n + slowMultiplication n (m-1)
This kind of recursion is equivalent to a proof by induction. We prove something for the base cases (slowMultiplication x 0 = 0
), then prove it for the general case (slowMultiplication n m
) given that it's true for the predecessor (slowMultiplication n (m-1)
). The same things that make proofs by induction invalid can also cause non-termination in recursion: no base case and not using the right predecessor.
slowMultiplication_bad1 :: Int -> Int -> Int
slowMultiplication_bad1 n m = n + slowMultiplication_bad1 n (m-1)
slowMultiplication_bad2 :: Int -> Int -> Int
slowMultiplication_bad2 n 0 = 0
slowMultiplication_bad2 n m = n + n + slowMultiplication_bad2 n (m-2)
On the other hand, this is also a useful feature under lazy evaluation to make infinite structures:
infiniteFooBar :: [String]
infiniteFooBar = "Foo" : infiniteBarFoo
infiniteBarFoo :: [String]
infiniteBarFoo = "Bar" : infiniteFooBar
Some people think it's better to be total than Turing-complete. Personally I can accept either, I've not had non-termination bite me in the ass often enough to care strongly, but it has happened.
Note: the halting problem
Given a function
terminates :: Program -> Bool
that returns whether a program terminates, we can construct a program that passes its own code toterminates
, and purposefully exhibits the opposite behavior, entering an infinite loop if it is predicted to terminate. The potential existence of such a problematic program proves theterminate
function cannot be total.
But there are other exceptions to totality, Haskell allows non-exhaustive patterns and run-time errors. Here's an example stolen from an answer on stackoverflow:
-- Valid Haskell that will compile, but pattern is non-exhaustive
fac :: Int -> Int
fac 0 = 1
fac n | n > 0 = n * fac (n-1)
-- Valid Haskell with exhaustive pattern, but can throw a custom run-time error
fac2 :: Int -> Int
fac2 0 = 1
fac2 n | n > 0 = n * fac2 (n-1)
fac2 n | otherwise = error "Cannot get factorial of negative number"
This is a consequence of being liberal with patterns. We've only used constructors so far, but as you can see Haskell also allows guards (such as | n > 0
). Again, a practical trade-off that you may or may not agree with. I don't like this one, preferring to instead use more restricted types to clearly define e.g. that we require a natural (unsigned integer) rather than an integer.
Conclusion
We've seen how a strongly-types purely-functional programming language like Haskell use proof to let you write, refactor and expand software with confidence. We've also explored the limits of those proofs in Haskell. If you're interested in stronger proofs, look at Elm, which uses purity and exhaustive patterns to avoid run-time exceptions, or dependently typed languages like Agda or Idris. Do expect them to be somewhat less practical than Haskell though.
Posted on November 12, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.