Programming without a programming language

Among my favorite subjects are the ways in which human language differs from the code we write for computers and how to avoid the evils of excessive complexity, both of which I've written about extensively in previous articles. This one is in part a retrospective to explain where these passions came from.

Background

If you Google custom programming language you will find a wealth of information about how to write your own programming language, as well as long lists of languages that have already been written. Most of these are completely unknown to all but their creators, so why do people write them and what are they used for? In my case it started some 30 years ago, but before I head off down Memory Lane, first a couple of paragraphs about computer languages in general.

Languages can be categorized in different ways and one is to class them as general purpose or domain-specific. All the mainstream programming languages, e.g. C/C++, Java, Python, JavaScript and PHP, are general purpose; designed to address any computing problem, and although custom general-purpose languages exist they tend to do things in much the same way as the mainstream ones.

Most custom languages are domain-specific languages (DSLs); that is, they are designed to address a specific application area, or domain. A few of these are well-known; HTML and SQL are probably the best examples. Some custom languages have been written to perform general-purpose duties but using a non-English vocabulary; others are heavily symbolic and are designed for use by mathematicians. Some are simple, others incredibly complex and one or two are just designed to mess with your mind; the programming equivalent of Klingon.

Many programmers regard writing a custom language as either unnecessary or in some way "outside" normal programming (or both), but a well designed DSL can separate high-level concerns from implementation details more cleanly than any other software technique. Syntaxes that comprise mostly space-delimited words (think of SQL in its most basic form) are quite easy to handle if they are sufficiently well constrained. They can either be embedded in other code (like Excel macros) or act as a wrapper around it and they tend to map well to the requirements expressed by domain owners.

Origins

I was first drawn into the world of domain-specific languages about 30 years ago, when required to write software for 8-bit microprocessors to control audio equipment such as mixers and routers in a radio studio. The coding for each piece of equipment had many similar features but the details varied wildly and were subject to frequent change. At the time there were no decent C compilers for such modest hardware and assembly code was very hard to write, understand or modify. The solution was to take a path I've followed ever since; to devise a cut-down, unambiguous form of English that expressed precisely what the program should do. This would be readable not just by myself and other programmers but also by the studio engineers who were not coders but understood better than anyone the domain they worked in and knew what they wanted.

The next step was to write a compiler that would take these instructions and turn them into intermediate code - an efficient, compact representation of their meaning - and a runtime interpreter that would execute the intermediate code on the target hardware.

The results were successful in that they solved the problem at hand. The language itself is long forgotten but it gave me insight into what programming is all about; insight that is often lost in our technology-led rush to adopt and apply the latest techniques. We use hydraulic presses to crack nuts rather than think about what simpler tools might do the job equally well.

Future-proofing

A second division of programming languages is between those that are readable by non-programmers and those that aren't. To me this is far from a trivial point. It is very bad practice to write code that cannot be read by anyone but yourself, unless you can be sure the code will never have to be read by anyone except yourself. And "anyone" also includes your future self; a person who has long forgotten the details of the project you are working on now and who will not thank you for requiring excessive effort to figure out what it all means.

Just about the only thing that can be guaranteed about the future is that they'll all still speak English. And that brings me back to my old radio studio code, where we wrote control programs in something as close to English as we could get.

Deep in a dusty archive I found some of this old code. Here's a fragment of a test script for one of the items; an audio switching unit the size of a small domestic refrigerator:

input 0 action ALT call Test
input 288 call Prompt

    enable inputs
    set alpha 1 to "Ready"
    prompt "System ready"
    stop

Test:
    if ON
    begin
        set output 0
        set output 288
        set output 289
        set alpha 1 to "Testing"
        set alpha 40 to "Chan 40"
        set alpha 41 to "Chan 41"
    end
    else
    begin
        clear output 0
        clear output 288
        clear output 289
        set alpha 1 to ""
    end
    stop

Prompt:
    if ON set output 1
    else clear output 1
    prompt "288"
    stop

The date on this code is some time in 1999, though I'm sure it was initially written several years earlier; by whom I don't remember. I left the company 17 years ago but I do recall that the audio switch implemented an X-Y matrix of audio inputs and outputs with the ability to connect any input to any output via mercury relays. It had an array of small alphanumeric displays (at least 41, it seems from the code) and a large number (hundreds) of logical inputs, each of which had rules governing its behavior. Some connected to front panel buttons; others reacted to events elsewhere around the studio.

The script defines action handlers for 2 of the inputs, giving each the name of a program label to execute code from when their state changes. The default action was MOM (momentary) but some buttons had an alternate action (press to set ON, press again to set OFF) so the first input here defines ALT as the action. Without diving into the source of the runtime I assume the language sets the value of ON to be the state of the input whose change of state is being handled, but I have no idea why the label Prompt has that name when its action is to set or clear an output.

The scripts got quite large. I found another that defines actions for 288 inputs but I'll spare you that one.

The point is that anyone with even a passing knowledge of the system can read the code and have some idea of what it does. I've only shown a fraction of the command set available; it included numeric and string variables, 4-function arithmetic, string handling, control structures like if, while and switch as well as keywords like input and output that related directly to the hardware of the system. The compiler was written in C++ on Windows and the runtime was written in assembler for an 8-bit TMS370 microprocessor. The compiled scripts arrived via a simple loader on a serial bus, to avoid having to reprogram EEPROMs.

So?

I called this piece "Programming without a programming language" because that's really what we were doing back then. If you were to describe the actions of the audio switch you'd write pseudocode in English that bears a close resemblance to the code above. And that's what usually was written by the studio engineers; all I did was remove ambiguity and syntactic noise to leave a terse but easily understood script that can still be read and understood by almost anyone, decades later. I don't think we ever even dignified the language with a proper name, but after ten or so years working in this way the habit became ingrained and now my approach to any programming problem is to first ask myself what the solution would look like in English, then if it seems viable consider how to build a compiler that can handle that syntax.

Not every problem is directly amenable to this approach. Many entities are complex and result in cumbersome code when scripted. However, the power and ingenuity of human language is such that in these situations we invent new vocabulary and syntax to represent and deal with new entities, giving them names that go into the language seamlessly. In English, new entities arrive continuously; laser, smartphone, barbecue, hypermarket, boycott, tomfoolery, millennial... the list goes on and on. In each case, the new addition neatly replaces a much longer set of words that previously had to be used each time to convey the same meaning.

Domain-specific scripting languages also have the ability to grow seamlessly as new features need to be added, as long as the original syntax is designed well enough to encompass the new without breaking. When a new entity is added, all the previously-needed cumbersome script is replaced by the new, simpler syntax; the code that makes it work is now written in whatever language the system itself is built with, hidden away from the people who have to write and maintain the scripts. This is a good example of 'separation of concerns'.

General-purpose programming languages don't have this facility to grow and expand on a piecemeal basis. Instead, we supplement them with a growing and ever-changing array of highly complex libraries and frameworks, most of which are anything but intuitive. Instead of enhancing the language, which the human brain handles well, we add structure, which our brains are less well equipped to deal with. (In everyday life we handle new vocabulary with ease but most of us struggle with grammar and have to consult weighty textbooks to gain understanding.)

For people who work with the tools on a daily basis this is fine, but it's not good for long-term maintainability, partly because the level of skill required is more commonly found in developers than in maintenance engineers, and partly because the tools themselves have a habit of rapidly becoming obsolete when something newer and shinier comes along. That's the thing about most development; it's all about the cutting edge. A more cautious approach would be to suggest that complex tools should only be used for code that is guaranteed high-level ongoing support by people with skills and experience equivalent to those of the original programmer(s). If this guarantee cannot be made there is a strong likelihood that the product will become unmaintainable after only a few years. Unfortunately, I see little sign that this problem is recognized by the software industry, which starts from the flawed assumption that anyone can 'read the code'.

None of this removes the need for complex tools. We use operating systems, word processors, spreadsheets and Java, JavaScript or Python compilers without having to know how they were built or how they work. A component that will never require maintenance except by its builders can use whatever technology is most appropriate to make it work. It can be relied upon to do its job day in, day out with no understanding needed about what goes on inside it. But when you're building something that others will maintain, spare a thought for who those 'others' are and don't make your product so complex they won't be able to understand it when changes are needed. Wherever possible, the simplest, most effective approach is to use the language of the user rather than that of the system.

And today...

When a couple of years ago I wanted to improve my rather basic JavaScript skills I set myself the challenge of recreating a variant of that old DSL, with both the compiler and runtime written in JavaScript so they can be run in any browser. The vocabulary and syntax of the language would be tilted towards the kind of things people use JavaScript for in web pages. This was a substantial project but it turned out far better than I expected and I now have a system good enough to build entire websites by describing them in multiple linked scripts that can be embedded in the HTML or downloaded on demand. It was a double win; in the space of those 2 years I also gained far more practical experience of JavaScript than I would have picked up from any number of programming tutorials.

More recently I wanted to learn some Python; it's just about the most popular programming language today and I didn't want to be left behind. It seemed to me the obvious thing was to repeat the exercise, this time writing my compiler and runtime in Python. This version is only a couple of months old so it has some way to go; I'm planning to provide it with features for writing shell scripts as an alternative to Bash and Perl (or indeed Python). My enthusiasm for high-level scripting DSLs remains undiminished.

Anyone curious about either of these projects can find all the current code - with some documentation - in my GitHub repository. I also welcome questions and comments, either here or via the contact details on GitHub.

Blog

Programming without a programming language

Graham Trott

Background

Origins

Future-proofing

So?

And today...

Join Our Newsletter. No Spam, Only the good stuff.

Related