Verify Contains Duplicate I
Aaron Elligsen
Posted on September 6, 2022
Verify LeetCode Question: 217. Contains Duplicate
Let's look at problem definition listed here
Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.
A basic implementation
Here is how we might solve it in JavaScript/TypeScript.
function containsDuplicate(nums: number[]): boolean {
let numSet: Set<number> = new Set();
for(let elem of nums) {
if(numSet.has(elem)) {
return true;
}
numSet.add(elem);
}
return false;
};
As we iterate through the array of numbers, we add each element into a set object. We check if that element has already been added and if that is true then there exists an element which is a duplicate in the array.
Verifying our implementation
There are two primary ways of defining code in Dafny, methods and functions. You can define a method
that will allow multiple statements, mutable state, references to objects or classes, or calling other methods. Most code we write in traditional software development would fall under the category of a method. Methods cannot be used in specification clauses.
A function
is more limited, they only allow a single expression (this is a simplification), but they can be used in specification clauses.
Verification vs implementation
When we verify a program in Dafny there are some statements which are what the code is doing. Typically traditional statements, expressions, math operations, loops, method calls, object and class instantiation found in most languages.
The real innovation and power of Dafny is that it includes specification clauses which allow us to describe what the code is intended to do! Even more importantly, it verifies that these clauses are true based on our implementation!
Contains Duplicate verification
If we look back at the LeetCode problem definition we can actually incorporate it into our specification.
An ensure
clause on a function or method informs Dafny (and us) about what will be true about a value returned by it. These must be written between the function/method signature and the body.
// seq is an array-like list object
// I will use the term sequence and array interchangeably
method containsDuplicateI(nums: seq<int>) returns (containsDuplicate: bool)
ensures containsDuplicate ==> exists i,j :: 0 <= i < j < |nums| && nums[i] == nums[j]
{
// |nums| is the length of the array/sequence
}
We can read this as, if containsDuplicate
returns true
that implies that there are two different indices inside the array which have the same value, aka are duplicates.
It is incredibly powerful that, not only does Dafny prove that our code is correct, it leaves us with more information about what method returning a value actually means.
To break it down further, containsDuplicate is the boolean value we are returning. The ==>
symbol is the implication operator basically meaning if the expression on the left is true then the expression on the right should be true.
In this case containsDuplicate ==> exists x,y :: 0 <= x < y < |nums| && nums[x] == nums[y]
we use the existential quantifier to specify what a duplicate means. The expression can be read like this: there exists
an x
and a y
such that ::
, x and y are numbers/indices between 0 and the length of the array and num[x] equals num[y].
Contains Duplicate I Verification
The Dafny code is a straightforward translation of the TypeScript version. The syntax has a few differences naturally.
method containsDuplicate(nums: seq<int>) returns (containsDuplicate: bool)
ensures containsDuplicate ==> exists i,j :: 0 <= i < j < |nums| && nums[i] == nums[j]
{
var numSet: set<int> := {};
for i:= 0 to |nums|
invariant forall x :: x in numSet ==> x in nums[0..i]
{
if nums[i] in numSet {
return true;
}
//+ is overloaded to set union
numSet := numSet + {nums[i]};
}
return false;
}
In the for
loop, between the loop condition and the body, we assert another specification clause called a loop invariant
. A loop invariant
is an expression which must be true before the loop starts, after every step of the loop, and after it is finished.
In this case, it says that everything in numSet is from the values in the slice of the array which starts from 0 up to but not including i, so nums[i] is not included.
Clearly, this tracks with our intuition which is that everything in array which is less than the current index is in the set.
If we do encounter an element nums[i] which is already in the numSet, then we know that there is some index x, less than our current index i, which has the same value in the array nums. If we make it all the way through the array but do not find a matching element in the set then we return false.
Tricky details of invariants
This is a simple coding problem in most languages and in Dafny, but the loop invariant forall x :: x in numSet ==> x in nums[0..i]
is carefully chosen to create a connection between numSet and nums.
The converse forall x :: x in nums[0..i] ==> x in numSet
seems equally valid but it is not and it will not verify. This direction of implication tells us only about some things in numSet, there could be more elements in numSet that came from before the loop or some other step. In other words if something was in numSet, we couldn't guarantee that it was also in nums.
When evaluating loop invariants Dafny uses induction, it assumes the loop invariant was true of the last loop. It then runs the body and checks if logically the invariant was maintained. Dafny does not do a brute force exhaustive check of all possible program states.
Creating useful loop invariants is difficult. It is definitely a skill to be learned and not every true invariant about a problem will be automatically provable to Dafny. Quite often you may need to provide extra logic to Dafny to convince it that it is true.
Posted on September 6, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.