Verifying Valid Anagram
Aaron Elligsen
Posted on September 6, 2022
Verifying LeetCode Problem: 242. Valid Anagram
Let's look at the problem definition.
Given two strings s and t, return true if t is an anagram of s, and false otherwise.
An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.
Theory
In mathematics, there is an object called a multiset. It is like a set in that it contains elements, but unlike a set not every element in a multiset is distinct. In other words, repeats are allowed in a multiset. We can ask the multiset how many copies of some element it contains, also called the multiplicity of the element.
{1,2,3} == {1,2,3,3} //sets discard duplicates
//the numbers of 3 elements don't match so the multisets are not the same
multiset{1,2,3} != multiset{1,2,3,3}
A string is an anagram of another string if the multiset of the strings are equivalent.
Therefore, the problem is basically an equivalent of asking if the multiset of two strings are equal.
An implementation
Most languages don't natively support or provide a multiset object, but you can construct one with any sort of Map object. Basically, the element is the key, and the val is the count of the element in the multiset.
function toMultiset(s: string): Map<string, number> {
const s_mset: Map<string, number> = new Map();
for(let char of s) {
if(s_mset.has(char)) {
s_mset.set(char, s_mset.get(char)+1);
}else{
s_mset.set(char, 1);
}
}
return s_mset;
}
function msetEqual(s: Map<string, number>, t: Map<string, number>): boolean {
for(let [lchar,] of s) {
if(!(s.get(lchar) === t.get(lchar))) {
return false;
}
}
for(let [rchar,] of t) {
if(!(s.get(rchar) === t.get(rchar))) {
return false;
}
}
return true;
}
function isAnagram(s: string, t: string): boolean {
return msetEqual(toMultiset(s), toMultiset(t));
};
Specifying the problem
It is a very quick method to specify, so we will work top to bottom. In Dafny the string type is defined as the type seq.
isAnagram
method isAnagram(s: string, t: string) returns (equal: bool)
ensures (multiset(s) == multiset(t)) == equal
{
var smset := toMultiset(s);
var tmset := toMultiset(t);
equal := msetEqual(smset, tmset);
}
toMultiset
Dafny does provide a native multiset object and methods. We will use these to verify our Typescript method is equivalent.
method toMultiset(s: string) returns (mset: multiset<char>)
ensures multiset(s) == mset
{
mset := multiset{};
for i := 0 to |s|
invariant mset == multiset(s[0..i])
{
assert s == s[0..i] + [s[i]] + s[(i+1)..];
mset := mset + multiset{s[i]};
}
assert s == s[0..|s|];
return mset;
}
As we iterate through the loop the invariant is that the mset result is equal to the multiset of the slice of the string which we have visited so far. Verifying this method requires two assertions.
Dafny forgets what lists are equal to unless you can remind it with an assert (and again strings are lists in Dafny). Supposedly, this is done for performance reasons, but it is an absurdly frequent reason for problems verifying. Here we show Dafny that our string is the concatenation of the visited substring plus the current element plus the rest of the list.
Secondly, Dafny needs reminded that the slice from the 0th index up to but not including the length of the string is equal to the original string.
And to assert that our typescript implementation is equivalent to this I should say that:
mset := mset + multiset{s[i]};
is equivalent to adding and element to map and setting the multiplicity to 1 or adding 1 to the multiplicity of an element already in the multiset.
With that toMultiset verifies it is equal to the native method which takes a sequence of elements and returns a multiset.
msetEqual
Making a method in Dafny to assert two multiset are equal is a bit tricky. It's typically unnecessary since Dafny can do it directly but to verify our TypeScript implementation we must.
In TypeScript we cannot assert two maps are equal directly because JavaScript compares objects by reference rather than by value. This is why we created the msetEqual function in TypeScript.
To show the two maps are equals we iterate through all the elements/keys in the first map and then check if those keys are in the second map and the multiplicities/counts match.
Then we do the same for the second map to the first. This ensures that the sets are the same and one is not just a superset of the other containing exactly the firsts elements plus some others.
TypeScript/JavaScript provide a nice for-of syntax to iterate through all elements in a map.
Dafny provides no exact analogue but we can build it.
method msetEqual(s: multiset<char>, t: multiset<char>) returns (equal: bool)
ensures s == t <==> equal
{
ghost var sremoved: multiset<char> := multiset{};
var scopy := s;
while scopy != multiset{}
invariant s - sremoved == scopy
invariant sremoved !! scopy
invariant sremoved <= s
invariant forall x :: x in sremoved ==> x in s && x in t && t[x] == s[x]
{
var x :| x in scopy;
if !(x in t && s[x] == t[x]) {
return false;
}
var removed := multiset{};
sremoved := sremoved + removed[x := s[x]];
scopy := scopy - removed[x := s[x]];
}
We create a copy of a given multiset s
and a ghost var sremoved
equal to an empty multiset.
In side the loop, we pick some x such that x is an char in scopy.
if x is not in t or the multiplicity of x in s is not equal to the multiplicity of x in t then we return false. The multisets are not equal.
Otherwise, we create a new multiset named removed
to fill with x to the multiplicity of x in s. removed[x := s[x]]
. We add this to the ghost var and remove it from scopy. Importantly, this is equivalent to fully removing x from scopy.
All four loop invariants are required to show that forall x :: x in s ==> x in t && s[x] == t[x]
which is our real goal.
The first invariant invariant s - sremoved == scopy
established a relationship between s, sremoved, and scopy. As scopy iterates towards the empty multiset, sremoved grows towards s.
invariant sremoved !! scopy
is definitely the least understandable at a glance, but for multisets !!
is the disjoint operator. As in, sremoved !! scopy
is true if they have no elements in common. We need this to show the following.
invariant sremoved <= s
starts out true since the empty multiset is a subset of any other multiset. Inductively, the only things Dafny knows about sremoved is that s - sremoved == scopy
, if not for invariant 2. It could be that smremoved is some larger set that has some intersecting elements with S. As scopy goes to the empty set, it could be said that s is a subset of sremoved. However, the fact that sremoved and scopy are always disjoint means that eventually s == scopy + sremoved == multiset{} + sremoved
.
Finally, the most important invariant, invariant forall x :: x in sremoved ==> x in s && x in t && t[x] == s[x]
, which actually asserts that some subset of t is equal to s.
We proceed with the other direction using the same logic and we show that the the opposite is true, which implies t == s.
Full definition
method msetEqual(s: multiset<char>, t: multiset<char>) returns (equal: bool)
ensures s == t <==> equal
{
ghost var sremoved: multiset<char> := multiset{};
var scopy := s;
while scopy != multiset{}
invariant s - sremoved == scopy
invariant sremoved !! scopy
invariant sremoved <= s
invariant forall x :: x in sremoved ==> x in s && x in t && t[x] == s[x]
{
var x :| x in scopy;
if !(x in t && s[x] == t[x]) {
return false;
}
var removed := multiset{};
sremoved := sremoved + removed[x := s[x]];
scopy := scopy - removed[x := s[x]];
}
//assert s <= sremoved
assert s == sremoved;
ghost var tremoved: multiset<char> := multiset{};
var tcopy := t;
while tcopy != multiset{}
invariant t - tremoved == tcopy
invariant tremoved !! tcopy
invariant tremoved <= t
invariant forall x :: x in tremoved ==> x in s && x in t && t[x] == s[x]
{
var x :| x in tcopy;
if !(x in t && s[x] == t[x]) {
return false;
}
var removed := multiset{};
tremoved := tremoved + removed[x := s[x]];
tcopy := tcopy - removed[x := s[x]];
}
//assert s <= sremoved
assert t == tremoved;
return true;
}
Final thoughts
When I said all four invariants are required, it turns out that is false. It is required to show that sremoved and tremoved are equal to s and t respectively, which is what you would intuitively expect from how we define the body of the loop.
However, you can still verify with just invariants 1 and 4 and removing the equality assertion after the loop. In this situation, the for loops really imply that forall x :: x in s ==> x in sremoved ==> x in s && s in t && t[x] == s[x]
and for t respectively.
Best practices
Quite often Dafny is able to verify using fewer statements and constraints than I expect and so I usually go back through and comment out all invariants and assertions to see exactly which ones are really necessary.
There are many true statements about the state of variables during an algorithm. However, quite often they are unnecessary to prove that the algorithm does what you want it to.
One of the trickier aspects of Dafny is that there are multiple ways to write the same logical expression, and simultaneously Dafny has differing ability to handle automatically applying induction to different formulations, which means that is easier for it to verify some expressions than others. Part of the difficulty of verification is finding a valid logical/mathematical expression for your invariants, and then secondly writing it in such a way that Dafny is able to verify it easily.
Posted on September 6, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.