A cold shower for a cold shower
Alexander Chichigin
Posted on January 11, 2024
Background
Hillel Wayne, a firm believer in scientific, empirical methods in general and in the field of Software Engineering in particular, has prepared a list of
"cold showers": papers challenging some widely held beliefs about various aspects of Software Development.
The very first "cold shower" addresses the question of Formal Verification, specifically the belief that
"Formal Verification is a great way to write software. We should prove all of our code correct."
Unfortunately, no source for the quote is given; presumably, it's a well-known and widely held belief, though I personally know no one who would seriously believe and argue this point in this form.
As a counterargument and a "cold shower", Hillel offers "Verification Techniques", the fourth chapter of Peter Gutmann's thesis, "The Design and Verification of a Cryptographic Security Architecture". Wayne summarizes it as:
Extensive literature review showing that formal methods are hard to learn, extremely expensive to apply, and often miss critical bugs.
Also, he counterbalances it with some caveats:
Written in 2000 and doesn't cover modern tools/techniques, such as TLA+ or dependent typing.
Considering that the Coq proof assistant was first released back in 1989, I wouldn't say Dependent Types were all that modern even in 2000. But yes, a lot of development has happened in the two decades since...
Anyway, the thesis that "extensive literature review showing that formal methods are hard to learn, extremely expensive to apply, and often miss critical bugs" was employed in a certain online discussion to support the view that Formal Methods are useless, or at least too costly to use in practice. Any caveats were ignored, and the paper was referenced as definitive proof. This prompted me to actually take a peek inside the paper and see what claims the author made and what literature he analysed.
The Fourth Chapter
I've only skimmed the initial part of the chapter and started digging in at the third section, which promised to describe the problems of Formal Methods.
3. Problems with Formal Verification
The first paragraph of this section gives the following literature review:
Empirical studies of the results of applying these methods, however, have had some difficulty in finding any correlation between their use and any gains in software quality [31], with no hard evidence available that the use of formal methods can deliver reliability more cost-effectively than traditional structured methods with enhanced testing [32]. Even in places where there has been a concerted push to apply formal methods, penetration has been minimal and the value of their use has been difficult to establish, especially where high quality can be achieved through other methods [33].
The citations are for the following papers:
- [31] “Observation on Industrial Practice Using Formal Methods”, Susan Gerhart, Dan Craigen, and Ted Ralston, Proceedings of the 15th International Conference on Software Engineering (ICSE’93), 1993, p.24.
- [32] “How Effective Are Software Engineering Methods?”, Norman Fenton, The Journal of Systems and Software, Vol.22, No.2 (August 1993), p.141.
- [33] “Industrial Applications of Formal Methods to Model, Design, and Analyze Computer Systems: An International Survey”, Dan Craigen, Susan Gerhart, and Ted Ralston, Noyes Data Corporation, 1994 (originally published by NIST).
Without digging further into the referenced papers, what catches my eye is the fact that they all were more than five years old by 2000, when "Verification Techniques" came out. As far as I can tell, this is a recurring theme in the chapter: it relies on papers that were heavily outdated even at the time of writing, and I see no relevance for 2024 at all...
3.1. Problems with Tools and Scalability
Issues with the scalability of both large proofs themselves and the running time of a proof assistant checking those proofs are very well known. The invention of Symbolic Model Checking, which scaled the approach from millions of reachable states to 10^20 and beyond, led to a breakthrough in Hardware Verification and largely won its inventors the Turing Award. But what exactly was Gutmann talking about?
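As an aside, the core idea behind that scalability jump can be sketched in a few lines. The toy below is my own illustration, not anything from the thesis: it computes the set of reachable states of a transition system as a fixpoint and checks a safety property over it. Real symbolic model checkers represent these state sets with BDDs instead of enumerating individual states, which is exactly where the orders-of-magnitude gain comes from.

```python
# Toy reachability analysis in the spirit of model checking: compute the set
# of reachable states as a least fixpoint, then check a safety property.
# (Symbolic model checkers do the same fixpoint over BDD-encoded state sets.)

def reachable(initial, step):
    """Least fixpoint of the successor relation, starting from `initial`."""
    seen = set(initial)
    frontier = set(initial)
    while frontier:
        # Image of the frontier under the transition relation, minus old states.
        frontier = {t for s in frontier for t in step(s)} - seen
        seen |= frontier
    return seen

# A 2-bit counter with wraparound as a tiny transition system.
def step(state):
    return {(state + 1) % 4}

states = reachable({0}, step)
assert states == {0, 1, 2, 3}      # every counter value is reachable
assert all(s < 4 for s in states)  # safety: the counter never exceeds 3
```

Explicit enumeration like this collapses at a few million states; representing the sets symbolically is what pushed the practical limit to 10^20 and beyond.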
Attempts to use the Boyer–Moore prover in practice lead to the observation that “the amount of effort required to verify the system was very large. The tools were often very slow, difficult to use, and unable to completely process a complex specification. There were many areas where tedious hand analysis had to be used” [46]
Sounds plausible and relatable. What source does it cite?
- [46] “Verification Technology and the A1 Criteria”, Terry Vickers Benzel, ACM SIGSOFT Software Engineering Notes, Vol.10, No.4 (August 1985), p.108.
Hmm... This paper was already 15 years (sic!) old by 2000... Anyway, what's a Boyer–Moore prover?
Nqthm is a theorem prover sometimes referred to as the Boyer–Moore theorem prover. It was a precursor to ACL2.
Wikipedia says it was a research theorem prover that was later replaced by ACL2, built with the explicit aim of producing an industrial-strength prover addressing Nqthm's performance problems. So Gutmann criticises a piece of research software for being research software, unsuitable for industrial applications. Interesting... Anyway, what about ACL2?
Wikipedia says ACL2 first appeared in 1990 but became publicly available only in 1996. That's still four years before 2000, and already in 1995 ACL2 was successfully employed to verify the floating-point division of the AMD K5 CPU. It has been quite popular in Hardware Verification ever since.
ACL2 is intended to be an "industrial strength" version of the Boyer–Moore theorem prover, NQTHM. Toward this goal, ACL2 has many features to support clean engineering of interesting mathematical and computational theories.
Industrial users of ACL2 include AMD, Arm, Centaur Technology, IBM, Intel, Oracle, and Collins Aerospace.
The ACL2 "Interesting applications" page, among other things, says:
Key properties of the Sun Java Virtual Machine and its bytecode verifier were verified in ACL2. Among the properties proved were that certain invariants are maintained by class loading and that the bytecode verifier ensures that execution is safe.
That happened sometime after 2000, so Gutmann couldn't possibly have written about it, but I just want you to pause and appreciate the fact that you rely on some ACL2 proofs right now if you run the HotSpot JVM or use services that run on it. And Oracle Labs is still doing proofs about the JVM in ACL2; they hired for such a position a couple of years ago.
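To get a flavor of what these inductive provers automate, here is the canonical warm-up lemma for Boyer–Moore-style proving: structural induction over lists. This is my own illustration, not an example from the thesis, and I state it in Lean 4 syntax for readability rather than in ACL2's Lisp.

```lean
-- The classic warm-up for Boyer–Moore-style inductive provers: list append
-- is associative. The prover picks an induction scheme from the recursive
-- structure of ++ and discharges both cases mechanically.
theorem append_assoc' (xs ys zs : List α) :
    (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by
  induction xs with
  | nil => rfl                     -- base case: ([] ++ ys) ++ zs = [] ++ (ys ++ zs)
  | cons x xs ih => simp [ih]      -- step case: rewrite with the hypothesis
```

ACL2 proves statements of exactly this shape about Lisp functions, fully automatically in most cases; the JVM results above are the same idea scaled up to an industrial artifact.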
Now back to the section. Later on we read:
Another approach which has been suggested is to automatically verify (in the “formal proof of correctness” sense) the specification against the implementation as it is compiled [49]. This has the disadvantage that is can’t be done using any known technology.
Well, now we at least know the technologies: proof-carrying code, binary symbolic execution, abstract interpretation, concolic testing…
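For a taste of one of those technologies, here is a minimal abstract-interpretation sketch of my own (interval arithmetic, plain Python, nothing from the thesis): instead of running a program on concrete integers, we run it on [lo, hi] ranges, soundly over-approximating every possible concrete execution at once.

```python
# Minimal interval abstract interpretation: compute with [lo, hi] ranges
# instead of concrete numbers; every concrete run is guaranteed to land
# inside the abstract result (soundness), possibly with some slack.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product's extremes are always attained at the endpoints.
        corners = [a * b for a in (self.lo, self.hi)
                         for b in (other.lo, other.hi)]
        return Interval(min(corners), max(corners))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# Abstractly "run" y = x * x + 1 for every x in [-3, 2] at once:
x = Interval(-3, 2)
y = x * x + Interval(1, 1)
# The true range of x*x + 1 over [-3, 2] is [1, 10]; the abstract
# result must contain it (here it is [-5, 10]).
assert y.lo <= 1 and y.hi >= 10
```

The corner-based multiplication is sound but loses precision: for x in [-3, 2] it reports x * x as [-6, 9] although the true range is [0, 9]. Production analysers constantly trade such precision losses against scalability, which is why "can't be done using any known technology" aged poorly.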
3.2. Formal Methods as a Swiss Army Chainsaw
This is the section where we can expect Gutmann to debunk the "Formal Verification is a great way to write software. We should prove all of our code correct" statement. Better yet, we now have a source for the quote:
Formal methods have in the past been touted as the ultimate cure for all security software assurance problems, a “panacea with unbounded applicability and potency” [50]
- [50] “Problems, methods, and specialisation”, Michael Jackson, Software Engineering Journal, Vol.9, No.6 (November 1994), p.249.
Wait a bit. Michael Jackson is a very well-known and well-respected researcher in Software Engineering; we know him as a deep thinker, far removed from fads and hype. Did he really say that Formal Methods are a "panacea with unbounded applicability and potency"?
Let's look into his 1994 paper:
The continuing search for panaceas, for universal materials out of which everything can be made, for universal methods to solve all problems, strongly suggests that we have not yet begun to understand the nature of our field. Object-orientation is only the most recent in a long line of proposed panaceas, all claiming unbounded applicability and potency.
WOW. It turns out it's OOP that is the "panacea with unbounded applicability and potency", and there's no mention of Formal Methods in sight at all!
Frankly speaking, I got very disappointed at this point and stopped reading...
5. Alternative Approaches
I've also skimmed through the rest of the chapter. This last section compares XP (eXtreme Programming, do you still remember it?), the main Agile method at the time, to some "waterfall-like" methodologies you've never heard of and don't care about, like ISO 9000, CMM, and CASE, which were criticised in the earlier sections.
Again, SCRUM, much better known to us now, was developed throughout the 1990s, with the defining paper appearing in 1995. Still, it didn't gain mass adoption until later in the 2000s, so Gutmann couldn't have done a literature review on it even if he had wanted to. This simply illustrates how outdated this work has become in the years since, now that we have half a dozen SCRUM variants, Lean, Kanban, and several others. Likewise, Formal Methods and Theorem Provers didn't stay stuck in the 90s or even the 2000s: they have been improving in usability, automation, and performance, accreting libraries and users, and lately even actively integrating with Machine Learning methods.
Conclusion
Well, Peter Gutmann wasn't the first to totally misquote a reference, changing its meaning to the opposite in a desire to support his opinion. And it doesn't make him a bad researcher. I firmly believe he did a great job on his original research and presented very valuable results. I suspect doing an extensive literature review and a thorough analysis of the evidence for or against Formal Methods was never his aim or priority. I guess it was something he needed for his thesis, so he collected a big pile of somewhat outdated papers supporting his outlook on Formal Verification and threaded a believable narrative through them. That's a much, much better job than most PhD students care to do!
The only problem is that this doesn't count as an "extensive literature review showing that formal methods are hard to learn, extremely expensive to apply, and often miss critical bugs". It didn't support this claim even at the time of publication, and even less so now, more than 20 years later.
And for the folks throwing around "scientific papers that prove or disprove bold claims": you should actually read the papers you're waving around, and check the sources those papers cite...