Christian Neumanns
Posted on October 18, 2019
In part 1 of this article we saw how the null pointer error can be eliminated with null-safety or with the Maybe/Option type.
In this part we'll have a closer look at code snippets that illustrate frequent use cases of handling the 'absence of a value'.
Useful Null-Handling Features
Besides null-safety, a language should also provide specific support for common null handling operations. Let's have a look at some examples.
Note: The additional support for null-handling presented in the following chapters is typically found only in null-safe languages. However, other languages can also provide these features, even if they are not null-safe.
Searching the First Non-Null Value
Suppose we want to look up a discount for a customer. First we try to retrieve the value from a web-service. If the value is not available (i.e. the result is null), we try to retrieve it from a database, then from a local cache. If the value is still null we use a default value of 0.0.
Now we want to write a function that provides the discount for a given customer. To keep the example simple, we ignore error-handling. We also don't use asynchronous functions, which could make the lookup process faster.
Java
First attempt:
static Double customerDiscount ( String customerID ) {
    Double result = discountFromNet ( customerID );
    if ( result != null ) {
        return result;
    } else {
        result = discountFromDB ( customerID );
        if ( result != null ) {
            return result;
        } else {
            result = discountFromCache ( customerID );
            if ( result != null ) {
                return result;
            } else {
                return 0.0; // default value
            }
        }
    }
}
What an ugly monstrosity! Let's quickly rewrite it:
static Double customerDiscount ( String customerID ) {
    Double result = discountFromNet ( customerID );
    if ( result != null ) return result;
    result = discountFromDB ( customerID );
    if ( result != null ) return result;
    result = discountFromCache ( customerID );
    if ( result != null ) return result;
    return 0.0; // default value
}
Note: There's nothing wrong with using several return statements, as in the code above (although some people might disagree).
The complete Java source code for a test application is available here.
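For readers on Java 9 or later, the same lookup chain can also be sketched with Optional.or. This is only an illustration, not the code from the linked test application: the class name DiscountLookup is made up, and the three lookup methods are simulated stand-ins that return hard-coded values.

```java
import java.util.Optional;

// Sketch only: the three lookup methods below are hypothetical stand-ins
// for the article's discountFromNet, discountFromDB and discountFromCache.
final class DiscountLookup {

    // Simulated web-service lookup: never finds a value in this sketch.
    static Double discountFromNet(String customerID) {
        return null;
    }

    // Simulated database lookup: only knows customer "A".
    static Double discountFromDB(String customerID) {
        return "A".equals(customerID) ? 0.15 : null;
    }

    // Simulated cache lookup: only knows customer "B".
    static Double discountFromCache(String customerID) {
        return "B".equals(customerID) ? 0.05 : null;
    }

    // Each or(...) supplier runs only if every earlier lookup returned null.
    static double customerDiscount(String customerID) {
        return Optional.ofNullable(discountFromNet(customerID))
            .or(() -> Optional.ofNullable(discountFromDB(customerID)))
            .or(() -> Optional.ofNullable(discountFromCache(customerID)))
            .orElse(0.0); // default value
    }

    public static void main(String[] args) {
        System.out.println(customerDiscount("A")); // found in the simulated database
        System.out.println(customerDiscount("B")); // found in the simulated cache
        System.out.println(customerDiscount("C")); // falls back to the default value
    }
}
```

Whether this reads better than the version with early returns is a matter of taste. It wraps every intermediate result in an Optional object, which the plain null checks avoid.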
Haskell
Let's start again with the straightforward, but ugly version:
customerDiscount :: String -> Float
customerDiscount customerID =
    case (discountFromNet customerID) of
        Just d -> d
        Nothing -> case (discountFromDB customerID) of
            Just d -> d
            Nothing -> case (discountFromCache customerID) of
                Just d -> d
                Nothing -> 0.0
There are different ways to write better code. Here is one way:
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

customerDiscount :: String -> Float
customerDiscount customerID =
    let discountMaybe = discountFromNet customerID
                        <|> discountFromDB customerID
                        <|> discountFromCache customerID
    in fromMaybe 0.0 discountMaybe
The complete Haskell source code for a test application, including alternative ways to write the above function, is available here.
More information (and even more alternatives) can be found in the Stackoverflow question Using the Maybe Monad in reverse.
PPL
Again, first the ugly version, written in PPL:
function customer_discount ( customer_id string ) -> float_64
    if discount_from_net ( customer_id ) as net_result is not null then
        return net_result
    else
        if discount_from_DB ( customer_id ) as DB_result is not null then
            return DB_result
        else
            if discount_from_cache ( customer_id ) as cache_result is not null then
                return cache_result
            else
                return 0.0
            .
        .
    .
.
The code becomes a one-liner and more readable with the practical if_null: operator designed for this common use case:
function customer_discount ( customer_id string ) -> float_64 = \
    discount_from_net ( customer_id ) \
    if_null: discount_from_DB ( customer_id ) \
    if_null: discount_from_cache ( customer_id ) \
    if_null: 0.0
The if_null: operator works like this: It evaluates the expression on the left. If the result is non-null, it returns that result. Otherwise it returns the value of the expression on the right.
In our example we use a chain of if_null: operators to find the first non-null value. If all three functions called in the expression return null, we return the default value 0.0.
The complete PPL source code for a test application is available here.
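To make the behavior of such an operator more tangible, here is a rough Java approximation. It is only a sketch: ifNull is a made-up helper, not a standard Java method, and it assumes that the right-hand side should be evaluated only when the left-hand side is null, which is what the description above suggests. The counter fallbackCalls exists purely to make that visible.

```java
import java.util.function.Supplier;

// Sketch only: ifNull is a hypothetical helper, not a standard Java method.
final class IfNull {

    // Counts how often the fallback expression was actually evaluated.
    static int fallbackCalls = 0;

    // Returns left if it is non-null; otherwise evaluates and returns right.
    static <T> T ifNull(T left, Supplier<? extends T> right) {
        return left != null ? left : right.get();
    }

    // Stand-in for an expensive right-hand side expression.
    static String fallback() {
        fallbackCalls++;
        return "fallback";
    }

    public static void main(String[] args) {
        String first = ifNull("value", IfNull::fallback);  // fallback is not evaluated
        String second = ifNull(null, IfNull::fallback);    // fallback is evaluated once
        System.out.println(first);
        System.out.println(second);
        System.out.println(fallbackCalls);
    }
}
```

Passing a Supplier instead of a plain value is the design choice that matters here. With a plain value, the fallback would always be computed, even when it is not needed.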
Getting a Value in a Path With Nulls
Sometimes we need to do the opposite of what we did in the previous chapter. Instead of stopping at the first non-null value, we continue until we've found the last non-null value.
For example, suppose a customer record type with two attributes:
- name: a non-null string
- address: a nullable address
Record type address is defined as follows:
- city: a nullable string
- country: a non-null string
Now we want to create a function that takes a customer as input, and returns the number of characters in the customer's city. If the customer's address attribute is null, or if the address's city attribute is null, then the function should return 0.
Java
These are the record types written in idiomatic Java:
static class Customer {
    private final String name;
    private final Address address;
    public Customer ( String name, Address address) {
        this.name = name;
        this.address = address;
    }
    public String getName() { return name; }
    public Address getAddress() { return address; }
}
static class Address {
    private final String city;
    private final String country;
    public Address ( String city, String country) {
        this.city = city;
        this.country = country;
    }
    public String getCity() { return city; }
    public String getCountry() { return country; }
}
Note: We don't use setters because we want our types to be immutable.
As seen already, all reference types are nullable in Java. We cannot explicitly specify if null is allowed for class fields.
Function (method) customerCitySize can be implemented as follows:
static Integer customerCitySize ( Customer customer ) {
    Address address = customer.getAddress();
    if ( address == null ) return 0;
    String city = address.getCity();
    if ( city == null ) return 0;
    return city.length();
}
Alternatively we could have used nested if statements, but the above version is more readable and avoids the complexity of nested statements.
We can write a simplistic test:
public static void main ( String[] args ) {
    // city is non-null
    Address address = new Address ( "Orlando", "USA" );
    Customer customer = new Customer ( "Foo", address );
    System.out.println ( customerCitySize ( customer ) );
    // city is null
    address = new Address ( null, "USA" );
    customer = new Customer ( "Foo", address );
    System.out.println ( customerCitySize ( customer ) );
    // address is null
    customer = new Customer ( "Foo", null );
    System.out.println ( customerCitySize ( customer ) );
}
Output:
7
0
0
The whole Java source code is available here.
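As a side note, Java's Optional type can express the same path traversal without explicit null checks. The following is a sketch under simplifying assumptions, not the code from the linked source: the class name CitySize is made up, and Customer and Address are reduced to the single field each that the example needs.

```java
import java.util.Optional;

// Sketch only: minimal copies of the article's Customer and Address classes,
// reduced to the fields this example needs.
final class CitySize {

    static final class Address {
        private final String city; // may be null
        Address(String city) { this.city = city; }
        String getCity() { return city; }
    }

    static final class Customer {
        private final Address address; // may be null
        Customer(Address address) { this.address = address; }
        Address getAddress() { return address; }
    }

    // map(...) yields an empty Optional as soon as a getter returns null,
    // so the remaining steps are skipped and orElse supplies the default.
    static int customerCitySize(Customer customer) {
        return Optional.ofNullable(customer.getAddress())
            .map(Address::getCity)
            .map(String::length)
            .orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(customerCitySize(new Customer(new Address("Orlando")))); // 7
        System.out.println(customerCitySize(new Customer(new Address(null))));      // 0
        System.out.println(customerCitySize(new Customer(null)));                   // 0
    }
}
```

Note that nothing forces a caller to go through Optional here. The fields are still ordinary nullable references, so this is a convenience and not compile-time null-safety.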
Haskell
Defining the record types is easy:
data Customer = Customer {
    name :: String,
    address :: Maybe Address
}
data Address = Address {
    city :: Maybe String,
    country :: String
}
There are several ways to write function customerCitySize in Haskell. Here is, I think, the most readable one for people more familiar with imperative programming. It uses the do notation:
import Data.Maybe (fromMaybe)

customerCitySize :: Customer -> Int
customerCitySize customer =
    let sizeMaybe = do
            address <- address customer -- type Address
            city <- city address        -- type String
            return $ length city        -- type Maybe Int
    in fromMaybe 0 sizeMaybe
Here is a version that doesn't use the do notation:
customerCitySize :: Customer -> Int
customerCitySize customer =
    let addressMaybe = address customer     -- type Maybe Address
        cityMaybe = addressMaybe >>= city   -- type Maybe String
        sizeMaybe = length <$> cityMaybe    -- type Maybe Int
    in fromMaybe 0 sizeMaybe
If we are careful with operator precedence, we can shorten the code:
customerCitySize :: Customer -> Int
customerCitySize customer =
    fromMaybe 0 $ length <$> (address customer >>= city)
Instead of using fromMaybe we can use maybe to provide the default value:
customerCitySize :: Customer -> Int
customerCitySize customer =
    maybe 0 length $ address customer >>= city
Yes, this code is concise. But there is a lot going on behind the scenes. A looooot! To really understand the above code one has to understand Haskell. And yes, we use a monad, indicated by the bind operator >>= in the code. For more information please refer to Haskell's documentation.
We can write a quick test:
main :: IO ()
main = do
    -- city is defined
    let address1 = Address {city = Just "Orlando", country = "USA"}
    let customer1 = Customer {name = "Foo", address = Just address1}
    putStrLn $ show $ customerCitySize customer1
    -- city is not defined
    let address2 = Address {city = Nothing, country = "USA"}
    let customer2 = Customer {name = "Foo", address = Just address2}
    putStrLn $ show $ customerCitySize customer2
    -- address is not defined
    let customer3 = Customer {name = "Foo", address = Nothing}
    putStrLn $ show $ customerCitySize customer3
Again, the output is:
7
0
0
The whole Haskell source code is available here. There are also two examples of customerCitySize implementations that compile without errors, but produce wrong results.
PPL
First, the record types:
record type customer
    attributes
        name string
        address address or null
    .
.
record type address
    attributes
        city string or null
        country string
    .
.
Function customer_city_size is written like this:
function customer_city_size ( customer ) -> zero_pos_32 =
    customer.address.null?.city.null?.size if_null: 0
Note the embedded null? checks. The evaluation of customer.address.null?.city.null?.size stops as soon as a null is detected in the chain. In that case, the whole expression evaluates to null.
The if_null: operator is used to return the default value 0 if the expression on the left evaluates to null.
We can also simply write ?. instead of .null?. and shorten the function to:
function customer_city_size ( customer ) -> zero_pos_32 =
    customer.address?.city?.size if_null: 0
Simple test code looks like this:
function start
    // city is non-null
    const address1 = address.create ( city = "Orlando", country = "USA" )
    const customer1 = customer.create ( name = "Foo", address = address1 )
    write_line ( customer_city_size ( customer1 ).to_string )
    // city is null
    const address2 = address.create ( city = null, country = "USA" )
    const customer2 = customer.create ( name = "Foo", address = address2 )
    write_line ( customer_city_size ( customer2 ).to_string )
    // address is null
    const customer3 = customer.create ( name = "Foo", address = null )
    write_line ( customer_city_size ( customer3 ).to_string )
.
Output:
7
0
0
The whole PPL source code is available here.
Comparison
Here is a copy of the three implementations:
- Java
    static Integer customerCitySize ( Customer customer ) {
        Address address = customer.getAddress();
        if ( address == null ) return 0;
        String city = address.getCity();
        if ( city == null ) return 0;
        return city.length();
    }
- Haskell
    customerCitySize :: Customer -> Int
    customerCitySize customer =
        maybe 0 length $ address customer >>= city
- PPL
    function customer_city_size ( customer ) -> zero_pos_32 =
        customer.address?.city?.size if_null: 0
Comparisons
Now that we know how the null pointer error is eliminated, let us look at some differences between using the Maybe monad in Haskell and null-safety in PPL.
Note: The following discussion is based on the Haskell and PPL examples shown in the previous chapters. Hence, some of the following observations are not valid in other languages that work in a similar way. For example, F#'s Option type is very similar to Haskell's Maybe type, but these two languages are far from being the same. Reader comments about other languages are of course very welcome.
Source Code
Here is a summary of the differences we saw in the source code examples.
Declaring the type of a nullable reference
Haskell: Maybe String (other languages use Option or Optional)
PPL: string or null (other languages use string?)
As seen already, the difference between Haskell and PPL is not just syntax. The two languages use different concepts.
Haskell uses the Maybe type with a generic type parameter. From the Haskell documentation: "A value of type Maybe a either contains a value of type a (represented as Just a), or it is empty (represented as Nothing). The Maybe type is also a monad."
On the other hand, PPL uses union types to state that a value is either a specific type, or null.
A non-null value used for a nullable type
Haskell: Just "qwe" (other languages: Some "qwe")
PPL: "qwe"
This difference is important!
In Haskell "qwe" is not type-compatible with Just "qwe". Suppose the following function signature:
foo :: Maybe String -> String
This function can be called as follows:
foo $ Just "qwe"
But a compiler error arises if we try to call it like this:
foo "qwe"
There are a few consequences to be aware of.
First, if a type changes from Maybe T to T, then all occurrences of Just expression must be changed to expression. The inverse is true too. A change from type T to Maybe T requires all occurrences of expression to be refactored to Just expression.
This is not the case in PPL. An expression of type string is type-compatible with an expression of type string or null (but the inverse is not true). For example, the function ...
foo ( s string or null ) -> string
... can be called like this:
foo ( "qwe" )
If the function is later refactored to ...
foo ( s string ) -> string
... then it can still be called with:
foo ( "qwe" )
Secondly, in Haskell some functions with the same name might exist for input type Maybe T, as well as for input T. But the semantics are different. For example, length "qwe" returns 3 in Haskell, while length $ Just "qwe" returns 1. It is important to be aware of this, because there is no compile-time error if function length is used for an expression whose type changes from Maybe T to T or vice-versa.
Thirdly, one has to be aware of the possibility of nested Maybes in Haskell. For example, suppose again we declare:
data Customer = Customer {
    name :: String,
    address :: Maybe Address
}
data Address = Address {
    city :: Maybe String,
    country :: String
}
What is the return type of the following function?
customerCity customer = city <$> address customer
Is it Maybe String? No, it's Maybe (Maybe String), a nested Maybe. Ignoring this can lead to subtle bugs. For an interesting discussion see the Stackoverflow question Simplifying nested Maybe pattern matching.
'No value' symbol
Haskell: Nothing (other languages: None)
PPL: null (other languages: nil, void, ...)
Checking for null
Haskell (one way to do it):
intToString :: Maybe Integer -> Maybe String
intToString i = case i of
    Just 1 -> Just "one"
    Nothing -> Nothing
    _ -> Just "not one"
Note: Omitting the Nothing case does not produce a compiler error. Instead, the function returns Just "not one" if it is called with Nothing as input.
PPL (new version):
function int_to_string ( i pos_32 or null ) -> string or null = \
    case value of i
        when null: null
        when 1 : "one"
        otherwise: "not one"
Note: Omitting the when null case results in the following compiler error:
Clause 'when null' is required because the case expression might be null at run-time.
Providing a default non-null value
Haskell: fromMaybe 0 size (requires Data.Maybe; F#: defaultArg size 0)
PPL: size if_null: 0 (other languages: size ?: 0, where ?: is sometimes called the 'Elvis operator')
Getting the first non-null value in a chain, or else a default value
Haskell:
customerDiscount :: String -> Float
customerDiscount customerID =
    let discountMaybe = discountFromNet customerID
                        <|> discountFromDB customerID
                        <|> discountFromCache customerID
    in fromMaybe 0.0 discountMaybe
PPL:
function customer_discount ( customer_id string ) -> float_64 = \
    discount_from_net ( customer_id ) \
    if_null: discount_from_DB ( customer_id ) \
    if_null: discount_from_cache ( customer_id ) \
    if_null: 0.0
Getting the last value in a chain, or else a default value
Haskell:
customerCitySize :: Customer -> Int
customerCitySize customer =
    maybe 0 length $ address customer >>= city
PPL:
function customer_city_size ( customer ) -> zero_pos_32 =
    customer.address?.city?.size if_null: 0
Implementation
Back in 1965, Tony Hoare introduced null in ALGOL "simply because it was so easy to implement", as he said.
In Java, and probably most other programming languages, null is implemented by simply using the value 0 for a reference. That is to say, if we write something like name = "Bob", then the memory address used for variable name contains the starting address of the memory block that stores the string value "Bob". On the other hand, when name = null is executed, then the content of the memory address used for variable name is set to 0 (i.e. all bits set to zero). Easy and efficient, indeed!
A more thorough explanation is available in chapter Run-time Implementation of my article A quick and thorough guide to 'null'.
So, implementing null is easy. However, adding null-safety to a language is a totally different story. Implementing compile-time-null-safety in a practical way is far from being easy. Adding good support to simplify null-handling as far as possible is a challenge. Adding null-safety and good support for null-handling makes life more difficult for language creators, but much easier for language users (i.e. software developers). This doesn't come as a surprise, though. It's just a frequently observed fact of life:
It is easy to make it difficult to use.
It is difficult to make it easy to use.
On the other hand, a type like Maybe can simply be added to the language's standard library, without the need for special support in the language.
In the case of Haskell, Maybe is a monad in the standard prelude, and Haskell's standard functional programming features are used to handle Maybe values.
Space and Time
Let's start with null.
There are just two kinds of basic operations needed at run-time:
- Assign null to a reference (e.g. name = null): this is typically done by just writing 0 to a memory cell.
- Check if a reference points to null (e.g. if name is null): this is very quickly done by just comparing the content of a memory cell with 0.
The conclusion is obvious: null operations are extremely space- and time-efficient.
On the other hand, using a wrapper type is probably less efficient, unless the compiler uses very clever optimizations.
As a general observation, it is probably fair to say that, for a given language, using a wrapper type cannot be made faster than using 0 for a null reference.
In practice, however, the performance difference might not be an issue in many kinds of applications.
A Note On The "Billion Dollar Mistake"
Yes, Tony Hoare stated that null has "probably caused a billion dollars of pain and damage in the last forty years".
However, a few seconds later he said the following, which is really important, but often ignored:
More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.
-- Tony Hoare
The mistake was not the invention of null per se. The mistake was the lack of compile-time-null-safety and good support for null-handling in programming languages.
As seen in this article it is possible to eliminate the null pointer error in languages that use null. No "billion-dollar mistake" anymore!
Isn't it amazing that it took the software development industry over 40 years to recognize this and start creating null-safe languages?
Summary
Here is a summary of the key points:
Java (and most other popular programming languages)
- All reference types are nullable.
- null can be assigned to any reference (variable, input argument, function return value, etc.).
- There is no protection against null pointer errors. They occur frequently and are the reason for the billion-dollar mistake.
Haskell (and some other programming languages using Maybe/Option)
- null is not supported. Hence null pointer errors cannot occur.
- The Maybe type (a monad with a type parameter) is used to manage the 'absence of a value'.
- Pattern matching is used to test for Nothing.
- Standard language features are used to handle Maybe values (e.g. the monad's bind operator >>=).
PPL (and some other null-safe programming languages)
- By default all reference types are non-nullable and null cannot be assigned.
- null is an ordinary type with the single value null. Union types are used to handle the 'absence of a value' (e.g. string or null).
- The compiler ensures a null-check is done before executing an operation on a nullable type. Thus null pointer errors cannot occur.
- The language provides specific support to simplify null-handling as far as possible. Null-handling code is concise, and easy to read and write.
Conclusion
The aim of this article was to compare null-safety in PPL with the Maybe type in Haskell. We did this by looking at a number of representative source code examples.
When comparing two languages, we must of course be careful not to generalize our observations to all other languages.
However, in the context of this article we saw that:
- Null-safety, as well as the Maybe type, eliminates the null pointer error.
- Using the Maybe/Optional type is easier to implement in a language than null-safety. It simplifies life for language designers and implementers.
- Providing good support for null-safety is a challenge for language creators. However, it simplifies life for developers.
Header image by dailyprinciples from Pixabay.