How to make your own programming language in JavaScript

jcubic

Jakub T. Jankiewicz

Posted on May 7, 2022

I've wanted to have my own programming language that would make it easier to create text-based adventure games for my Open Source project jQuery Terminal. The idea for the language came after a paid gig for one person, let's call him Ken, who needed this type of game: the user interacts with the terminal and is asked a bunch of questions, like in an adventure game, and the topic was related to crypto. The code I wrote for Ken was data-driven by a JSON file. It worked nicely, and Ken could easily change the JSON and have the game changed however he wanted. I asked if I could share the code, since it was a very cool project, and Ken agreed that I could do that two months after he published the game. But after a while, I realized that I could have something much better: my own DSL that would make it simpler to create text-based adventure games. A person with a bit of programming knowledge, like Ken, could easily edit the game, because the language would be much simpler than the complex JavaScript code needed for something like this. And if I were asked to create another game like the one for Ken, it would be much easier and faster for me. This is how the Gaiman programming language started.

I had used PEG.js before, so it was my obvious choice for the parser generator. I started with the arithmetic example, modified it, and then added an if statement and boolean expressions. When I had this first proof of concept generating JavaScript code, I was so excited that I had to write an article and share how simple it is to create your own programming language in JavaScript.

At the end, there is a simple demo playground, and if you want something cooler, look at the Gaiman website, linked on GitHub.

So to the point, let's dive in.

What is a Compiler?

A compiler is a program that translates code from one programming language into another. For instance, a C compiler translates a program written in C into machine code (binary that can be executed by the computer). But there are also compilers that translate one human-readable language into another human-readable language. For instance, ClojureScript is compiled into JavaScript. This process is often called transpiling, and a program that does this is often called a transpiler.

What is a Parser?

A parser is a program that can be part of a compiler or an interpreter. It takes input code as a sequence of characters and produces an AST (Abstract Syntax Tree), which can be used by the code generator (part of the compiler) to generate the output code, or by the evaluator (part of the interpreter) to execute it.

What is AST?

AST is an acronym for Abstract Syntax Tree. It's a way to represent code in a format that tools can understand, usually as a tree data structure. We will use the AST format of Esprima, which is a JavaScript parser that outputs an AST.
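
To make the tree shape concrete, here is a minimal sketch that walks an Esprima-style AST and lists the node types it finds. The AST is written by hand for the expression foo == "bar", and the collectTypes helper is a name I made up for illustration; it is not part of Esprima.

```javascript
// AST for the JavaScript expression statement: foo == "bar"
// (Esprima-style node shapes, without "range" and "raw").
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "BinaryExpression",
        operator: "==",
        left: { type: "Identifier", name: "foo" },
        right: { type: "Literal", value: "bar" }
      }
    }
  ]
};

// Hypothetical helper: visit every node depth-first and record its type.
function collectTypes(node, types = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectTypes(child, types));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") {
      types.push(node.type);
    }
    Object.values(node).forEach((child) => collectTypes(child, types));
  }
  return types;
}

console.log(collectTypes(ast).join(" > "));
// Program > ExpressionStatement > BinaryExpression > Identifier > Literal
```

Every tool that works with an AST (code generator, evaluator, linter) is some variation of this kind of recursive walk.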

What is a Parser Generator?

A parser generator, as the name suggests, is a program that generates the source code of a parser for you, based on a grammar (the language specification) written in a specific syntax. In this article, we will use the PEG.js parser generator, which generates JavaScript code that will parse the code of your language and output an AST.

A parser generator is also a compiler, so you can call it a compiler compiler: a compiler that can output a compiler for your language.

JavaScript Code generation

What's cool about the Esprima AST format is that there are tools that generate code from it. An example is escodegen, which takes an Esprima AST as input and outputs JavaScript code. You may think that you can just use strings to generate code, but that solution will not scale. In this tutorial I show only a single if statement, but you will run into a lot of problems once you have more complex code.

Simple PEG.js parser example

PEG.js is a parser generator for Parsing Expression Grammars, written in JavaScript. It takes a grammar written in the PEG language, with inline JavaScript code, and outputs a parser.

Below I will show you how to create a simple PEG.js grammar for an if statement that outputs an AST, which will later be transformed into JavaScript code.

The syntax of PEG.js is not very complicated. Each rule consists of a name, then the expression to match, and an optional block of JavaScript that is executed when the rule matches and whose return value becomes the result of the rule.

Here is a simple arithmetic example from the PEG.js documentation:

{
  function makeInteger(o) {
    return parseInt(o.join(""), 10);
  }
}

start
  = additive

additive
  = left:multiplicative "+" right:additive { return left + right; }
  / multiplicative

multiplicative
  = left:primary "*" right:multiplicative { return left * right; }
  / primary

primary
  = integer
  / "(" additive:additive ")" { return additive; }

integer "integer"
  = digits:[0-9]+ { return makeInteger(digits); }

The parser generated from this grammar can parse and evaluate simple arithmetic expressions, for example 10+2*3, which evaluates to 16. You can test this parser in the PEG.js Online Tool. Note that it doesn't handle spaces between tokens (to keep the grammar simple); in a PEG grammar you need to handle whitespace explicitly.
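
To get a feeling for what the generated parser does, here is a rough hand-written sketch of the same grammar, with one JavaScript function per rule. This is not the code that PEG.js generates; the { value, pos } result shape and the evaluate wrapper are my own assumptions for illustration.

```javascript
// One function per grammar rule. Each takes the input and a position and
// returns { value, pos } on success or null on failure.
// Like the grammar, it does not handle whitespace.
function integer(input, pos) {
  const match = /^[0-9]+/.exec(input.slice(pos));
  if (!match) return null;
  return { value: parseInt(match[0], 10), pos: pos + match[0].length };
}

function primary(input, pos) {
  const int = integer(input, pos);
  if (int) return int;
  if (input[pos] !== "(") return null;
  const inner = additive(input, pos + 1);
  if (!inner || input[inner.pos] !== ")") return null;
  return { value: inner.value, pos: inner.pos + 1 };
}

function multiplicative(input, pos) {
  const left = primary(input, pos);
  if (!left) return null;
  if (input[left.pos] === "*") {
    const right = multiplicative(input, left.pos + 1);
    if (right) return { value: left.value * right.value, pos: right.pos };
  }
  return left; // second alternative: a primary on its own
}

function additive(input, pos) {
  const left = multiplicative(input, pos);
  if (!left) return null;
  if (input[left.pos] === "+") {
    const right = additive(input, left.pos + 1);
    if (right) return { value: left.value + right.value, pos: right.pos };
  }
  return left; // second alternative: a multiplicative on its own
}

// Hypothetical entry point: the whole input must be consumed.
function evaluate(input) {
  const result = additive(input, 0);
  if (!result || result.pos !== input.length) {
    throw new SyntaxError(`Cannot parse "${input}"`);
  }
  return result.value;
}

console.log(evaluate("10+2*3"));   // 16
console.log(evaluate("(10+2)*3")); // 36
```

Calling evaluate("10 + 2") throws a SyntaxError, for the same reason the grammar rejects it: nothing in the rules consumes the spaces.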

But what we need is not to interpret the code and return a single value, but to return an Esprima AST. To see what an Esprima AST looks like, you can open AST Explorer, select Esprima as the parser, and type some JavaScript.

Here is an example of simple code:

if (foo == "bar") {
   10 + 10
   10 * 20
}

The output in JSON format looks like this:

{
  "type": "Program",
  "body": [
    {
      "type": "IfStatement",
      "test": {
        "type": "BinaryExpression",
        "operator": "==",
        "left": {
          "type": "Identifier",
          "name": "foo",
          "range": [
            4,
            7
          ]
        },
        "right": {
          "type": "Literal",
          "value": "bar",
          "raw": "\"bar\"",
          "range": [
            11,
            16
          ]
        },
        "range": [
          4,
          16
        ]
      },
      "consequent": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "BinaryExpression",
              "operator": "+",
              "left": {
                "type": "Literal",
                "value": 10,
                "raw": "10",
                "range": [
                  23,
                  25
                ]
              },
              "right": {
                "type": "Literal",
                "value": 10,
                "raw": "10",
                "range": [
                  28,
                  30
                ]
              },
              "range": [
                23,
                30
              ]
            },
            "range": [
              23,
              30
            ]
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "BinaryExpression",
              "operator": "*",
              "left": {
                "type": "Literal",
                "value": 10,
                "raw": "10",
                "range": [
                  34,
                  36
                ]
              },
              "right": {
                "type": "Literal",
                "value": 20,
                "raw": "20",
                "range": [
                  39,
                  41
                ]
              },
              "range": [
                34,
                41
              ]
            },
            "range": [
              34,
              41
            ]
          }
        ],
        "range": [
          18,
          43
        ]
      },
      "alternate": null,
      "range": [
        0,
        43
      ]
    }
  ],
  "sourceType": "module",
  "range": [
    0,
    43
  ]
}

You don't need to care about "range" and "raw". They are part of the parser output and are not needed to generate code.

Let's break the JSON down into its parts:

If statement

The if statement needs to be in the format:

{
    "type": "IfStatement",
    "test": {
    },
    "consequent": {
    },
    "alternate": null
}

Here "test" is any expression and "consequent" is a statement (in our case a block).

if statement condition

The condition can be any expression, but here we will use a binary expression that compares two things:

{
  "type": "BinaryExpression",
  "operator": "==",
  "left": {},
  "right": {}
}

Variables

A variable reference looks like this:

{
  "type": "Identifier",
  "name": "foo"
}

Literal string

A literal string that is used in our code looks like this:

{
    "type": "Literal",
    "value": "bar"
}

Block with curly braces

The block inside if is created like this:

{
    "type": "BlockStatement",
    "body": [ ]
}

Whole program

And the whole program is created like this:

{
  "type": "Program",
  "body": [ ]
}
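
As a quick recap of the node shapes above, here is a small sketch that builds the whole AST from tiny helper functions. The helper names (identifier, literal, binary and so on) are my own, chosen for illustration; they are not part of Esprima or escodegen.

```javascript
// Hypothetical helpers, one per node type described above.
const identifier = (name) => ({ type: "Identifier", name });
const literal = (value) => ({ type: "Literal", value });
const binary = (operator, left, right) => ({
  type: "BinaryExpression",
  operator,
  left,
  right
});
const statement = (expression) => ({ type: "ExpressionStatement", expression });
const block = (body) => ({ type: "BlockStatement", body });
const ifStatement = (test, consequent) => ({
  type: "IfStatement",
  test,
  consequent,
  alternate: null
});
const program = (body) => ({ type: "Program", body });

// AST for: if (foo == "bar") { 10 + 10; 10 * 20; }
const ast = program([
  ifStatement(
    binary("==", identifier("foo"), literal("bar")),
    block([
      statement(binary("+", literal(10), literal(10))),
      statement(binary("*", literal(10), literal(20)))
    ])
  )
]);

console.log(JSON.stringify(ast, null, 2));
```

The printed JSON has the same structure as the AST Explorer output shown earlier, minus the "range", "raw" and "sourceType" fields.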

PEG Parser for your own language that compiles to JavaScript

For our demo language we will create code that looks similar to Ruby:

if foo == "bar" then
  10 + 10
  10 * 20
end

and we will create an AST from it, which will then be used to generate JavaScript code.

The PEG grammar for if looks like this:

// An empty body returns [], so the block never contains whitespace strings.
if = "if" _ expression:(comparison / expression) _ "then" body:(statements / _ { return []; }) _ "end" {
   return {
     "type": "IfStatement",
     "test": expression,
     "consequent": {
        "type": "BlockStatement",
        "body": body
     },
     "alternate": null
   };
}

We have the "if" token, then an expression that is either a comparison or a plain expression, then the "then" token, and a body that is either a list of statements or just whitespace (an empty body). _ matches optional whitespace, which is ignored.

_ = [ \t\n\r]*

The comparison looks like this:

comparison = _ left:expression _ "==" _ right:expression _ {
   return {
        "type": "BinaryExpression",
        "operator": "==",
        "left": left,
        "right": right
   };
}

The expression looks like this:

expression = expression:(variable / literal) {
   return expression;
}

A variable is built from three rules:

variable = !keywords variable:name {
  return {
    "type": "Identifier",
    "name": variable
  }
}

keywords = "if" / "then" / "end"

name = [A-Z_$a-z][A-Z_a-z0-9]* { return text(); }
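
One thing worth knowing about !keywords: the negative lookahead only checks whether a keyword matches at the current position, so a name that merely starts with a keyword (like iffy or ending) is rejected too. The sketch below shows the same behavior with JavaScript regular expressions, where (?!...) plays the role of ! in PEG. The second regex is my own variant, not from the original grammar; a similar effect in PEG could probably be achieved with something like keywords = ("if" / "then" / "end") ![A-Z_a-z0-9].

```javascript
// Regex analogue of `!keywords name` from the grammar above.
const asWritten = /^(?!if|then|end)[A-Za-z_$][A-Za-z_0-9]*$/;

// Variant that only rejects whole keywords: the keyword must not be
// followed by another identifier character.
const wholeWord =
  /^(?!(?:if|then|end)(?![A-Za-z_0-9]))[A-Za-z_$][A-Za-z_0-9]*$/;

for (const word of ["foo", "if", "iffy", "ending"]) {
  console.log(word, asWritten.test(word), wholeWord.test(word));
}
// foo true true
// if false false
// iffy false true
// ending false true
```

For a small demo language this limitation may be acceptable, but it is something to keep in mind once you start adding more keywords.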

Now let's look at statements:

statements = _ head:(if / expression_statement) _ tail:(!"end" _ (if / expression_statement))* {
    return [head].concat(tail.map(function(element) {
        return element[2];
    })); 
  }

expression_statement = expression:expression {
    return  {
      "type": "ExpressionStatement",
      "expression": expression
    };
}
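
The action in the statements rule may look cryptic, so here is a small sketch of the values it works with. I'm assuming an input with three expression statements; each element of tail is the matched sequence [!"end", _, statement], where the predicate yields undefined, _ yields an array of whitespace characters, and the statement is an AST node. The node helper is a made-up name for illustration.

```javascript
// Hypothetical helper that creates an expression statement for a number.
const node = (value) => ({
  type: "ExpressionStatement",
  expression: { type: "Literal", value }
});

// Values as the `statements` action would receive them.
const head = node(1);
const tail = [
  [undefined, [], node(2)],               // whitespace was already consumed
  [undefined, ["\n", " ", " "], node(3)]  // newline and indentation
];

// Same expression as in the grammar action: keep only index 2 (the node).
const body = [head].concat(tail.map((element) => element[2]));

console.log(body.map((statement) => statement.expression.value).join(", "));
// 1, 2, 3
```

The result is a flat array of statement nodes, which is exactly what the "body" of a BlockStatement expects.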

And the last thing is literals:

literal = value:(string / Integer) {
   return {"type": "Literal", "value": value };
}

// A backslash escapes the next character, so \" does not end the string.
string = "\"" ("\\" . / [^"\\])* "\"" {
  return JSON.parse(text());
}

Integer "integer"
  = _ [0-9]+ { return parseInt(text(), 10); }
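
Here is a short sketch of what the two actions above do with the matched text. In a PEG.js action, text() returns the raw source matched by the rule; the strings below are sample inputs I picked for illustration.

```javascript
// `string` action: text() includes the surrounding quotes and any escape
// sequences, and JSON.parse decodes them into a JavaScript string.
const rawString = '"two\\nlines"'; // source text: "two\nlines"
const decoded = JSON.parse(rawString);
console.log(decoded.split("\n").length); // 2

// `Integer` action: the rule starts with `_`, so text() may include
// leading whitespace, which parseInt skips.
const rawInteger = "  42";
const number = parseInt(rawInteger, 10);
console.log(number); // 42

// Both values end up as the "value" of a Literal node.
const literals = [decoded, number].map((value) => ({ type: "Literal", value }));
console.log(JSON.stringify(literals));
```

Using JSON.parse is a convenient shortcut, but it also means that the strings of your language follow JSON rules, for example only double quotes and JSON escape sequences are accepted.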

Generating JavaScript code

And that is the whole parser that generates the AST. Once we have the Esprima AST, all we have to do is generate the code with escodegen.

The code that generates the AST and creates JavaScript code looks like this:

const ast = parser.parse(code);
const js_code = escodegen.generate(ast);

The parser variable is the name that you give to the parser when you generate it with PEG.js.
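
If you are curious what the code generation step does conceptually, below is a toy stand-in for escodegen.generate. It is only a sketch that handles the node types used in this article; it is not how escodegen is implemented, and the AST is written by hand instead of coming from the parser.

```javascript
// Toy code generator: supports only the node types from this article.
// Limitations: "alternate" is ignored and nested binary expressions get
// no parentheses, which is the kind of problem a real generator solves.
function generate(node, indent = "") {
  switch (node.type) {
    case "Program":
      return node.body.map((child) => generate(child, indent)).join("\n");
    case "IfStatement":
      return `${indent}if (${generate(node.test)}) ${generate(node.consequent, indent)}`;
    case "BlockStatement": {
      const inner = node.body.map((child) => generate(child, indent + "    "));
      return ["{", ...inner, `${indent}}`].join("\n");
    }
    case "ExpressionStatement":
      return `${indent}${generate(node.expression)};`;
    case "BinaryExpression":
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Hand-written AST for the if statement used throughout the article.
const num = (value) => ({ type: "Literal", value });
const ast = {
  type: "Program",
  body: [{
    type: "IfStatement",
    test: {
      type: "BinaryExpression",
      operator: "==",
      left: { type: "Identifier", name: "foo" },
      right: { type: "Literal", value: "bar" }
    },
    consequent: {
      type: "BlockStatement",
      body: [
        { type: "ExpressionStatement",
          expression: { type: "BinaryExpression", operator: "+", left: num(10), right: num(10) } },
        { type: "ExpressionStatement",
          expression: { type: "BinaryExpression", operator: "*", left: num(10), right: num(20) } }
      ]
    },
    alternate: null
  }]
};

const js_code = generate(ast);
console.log(js_code);
// if (foo == "bar") {
//     10 + 10;
//     10 * 20;
// }
```

Even this tiny version shows why a dedicated library is the better choice: every new node type, operator precedence rule, and formatting detail has to be handled somewhere.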

And here is the simple demo that I used while writing the parser. You can play with the grammar and create a different syntax for your own programming language that compiles to JavaScript.

Parser Generator Demo.

This simple application saves your code in localStorage on each change, if it compiles without errors, so you can safely use it to create your own language. But I don't guarantee that you will not lose your work, so you may want to use something more robust.

NOTE: The original PEG.js project is no longer maintained, but there is a maintained fork, Peggy, which is backward compatible with PEG.js, so it will be easy to switch.

If you want something more advanced, you can look at the Gaiman Programming Language Playground. If you enable dev mode, you can edit the grammar and see the output JavaScript. There is also an AST module where you can see the AST of a given piece of JavaScript code. This demo uses Peggy.

Conclusion

In this article we used a parser generator to create a simple compiler from a custom language to JavaScript. As you can see, starting a project like this is not that hard. The techniques explained in this article should allow you to create your own programming language that compiles to JavaScript. This can be a way to create a PoC of a language that you want to design. As far as I know, this is the fastest way to have something working. But you can also use your language as is: create your own DSL (Domain Specific Language), write code in that language, and make JavaScript do the hard work.

If you like this post, you can follow me on Twitter at @jcubic and check my home page.

And here you can find some JavaScript jobs.
