Implement an emulator for a fantasy CPU in JavaScript

codeguppy

Adrian

Posted on August 22, 2020

Implement an emulator for a fantasy CPU in JavaScript

YouTube video

Before we begin our article, I want to announce that I was able to record this in a short youtube video:

You can find it on the "Coding Adventures" channel:

https://www.youtube.com/watch?v=ghwGfaOM00s

Introduction

Emulators are cool pieces of technology that allow users to run a totally different system on top of another one.

Emulators have wide range of applications such as running x86 programs on ARM devices or running ARM Android applications on x86 Windows Desktops and even running your favorite retro game on an emulated computer / console on a Raspberry Pi.

When a full system is emulated, the emulator software needs to handle all the hardware devices of that systems. These may include not only the CPU but also Video System, Input Devices, etc. However, the core concept of an emulator remains emulation of the CPU.

In this article we will explore how a CPU works and how can be emulated in software by implementing a simple machine that will run programs written for a fantasy CPU.

Programs are just a series of bytes

It is well known that CPUs are the core of a machine and their main job is to execute programs. Programs are nothing else but a series of instructions in the computer memory.

At this point you may be tempted to believe that your CPU knows JavaScript. Although a very tempting concept, this is not exactly the case. Imaging changing the CPU if a new version of JavaScript appeared!

The reality is that a CPU understands only a finite set of instructions. These are the instructions that the CPU engineering team designed the CPU to understand.

As a user you can create programs by using these instructions directly or by coding in a higher-level language and then using a compiler / system to translate your program into the set of instructions understood by the CPU.

Regardless how the user created the program, the instructions end up in memory as a series of bytes such as these ones:

11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50
Enter fullscreen mode Exit fullscreen mode

The CPU role is to fetch these instructions from memory and to execute them.

At this point please look again at the series of number above. That is a real program written for our fantasy CPU. To see what it does we need to implement a virtual CPU in software and then we will use it execute the program.

Fantasy CPU high-level specs

The CPU that we’re about to emulate in software is a fantasy one. It does not exist in the real world, but it mimics very closely how a real CPU works. The machine is partially inspired by this article.

The CPU has 4 general purpose number registers: R0, R1, R2, R3 (think of these registers as variables that can store a number). Besides these registers, the CPU can also manipulate a stack with ability to push or pop values on and out of the stack.

The CPU operates by instructions. Some of the instructions don't have operands, while other instructions have several operands (think of operands as arguments).

A series of instructions makes a program. Instructions are encoded in the program as this:

INSTRUCTION1 [OPERAND1] [OPERAND2] [OPERAND3] INSTRUCTION2 [OPERAND1] ...
Enter fullscreen mode Exit fullscreen mode

Each instruction has a unique number associated with it. For simplicity, instruction codes, operands and even addresses are regular numbers. Therefore, no bytes, or any other data types are needed. Everything is a number!

Our program is therefore a series of numbers. Each number occupies a single cell of memory. For example, an instruction with 3 operands will take 4 cells of program memory (1 for the instruction code and 3 for operands).

Instructions

And now let's see the set of the instructions that our CPU accepts. Although all CPUs including our fantasy one executes instructions in binary form, the CPU engineering team usually associates names / mnemonics with the instructions recognized by the CPU.

Using mnemonics makes the task of writing programs much easier for humans. If one writes programs using the instructions mnemonics it is said that he codes in assembly language. It only takes a simple step to transform these programs written in assembly language into binary instructions (e.g. machine language).

Instructions and mnemonics for our fantasy CPU are quite simple and intuitive. Let’s take the time to describe all of the below. Afterall we need to know all the CPU instructions if we want to emulate that CPU into software.

Note: Under each instruction is specified the number used to encode that particular instruction. Also registers R0, R1, R2, R3 are encoded as numbers 0, 1, 2, 3):

Loads the value from regsrc into regdst. E.g. regdst = regsrc

MOVR reg_dst, reg_src
MOVR = 10
Enter fullscreen mode Exit fullscreen mode

Loads the numeric value into register regdst. E.g. regdst = value

MOVV reg_dst, value
MOVV = 11
Enter fullscreen mode Exit fullscreen mode

Adds the value from regsrc to the value of regdst and store the result in reg_dst

ADD reg_dst, reg_src
ADD = 20
Enter fullscreen mode Exit fullscreen mode

Subtracts the value of regsrc from the value of regdst and store the result in reg_dst

SUB reg_dst, reg_src
SUB = 21
Enter fullscreen mode Exit fullscreen mode

Pushes the value of reg_src on the stack

PUSH reg_src
PUSH = 30
Enter fullscreen mode Exit fullscreen mode

Pops the last value from stack and loads it into register reg_dst

POP reg_dst
POP = 31
Enter fullscreen mode Exit fullscreen mode

Jumps the execution to address addr. Similar to a GOTO!

JP addr
JP = 40
Enter fullscreen mode Exit fullscreen mode

Jump to the address addr only if the value from reg1 < reg2 (IF reg1 < reg2 THEN JP addr)

JL reg_1, reg_2, addr
JL = 41
Enter fullscreen mode Exit fullscreen mode

Pushes onto the stack the address of instruction that follows CALL and then jumps to address addr

CALL addr
CALL = 42
Enter fullscreen mode Exit fullscreen mode

Pops from the stack the last number, assumes is an address and jump to that address

RET
RET = 50
Enter fullscreen mode Exit fullscreen mode

Print on the screen the value contained in the register reg

PRINT reg
PRINT = 60
Enter fullscreen mode Exit fullscreen mode

Stops our VM. The virtual CPU doesn't execute instructions once HALT is encountered.

HALT
HALT = 255
Enter fullscreen mode Exit fullscreen mode

Emulating the CPU

Armed with the fantasy CPU specs, we can now move ahead to emulate the CPU in software – in JavaScript.

As presented above, on real machines programs are stored in memory. In our emulator will emulate the memory using a simple array structure. As a matter of fact, we’ll only place a single program in the memory.

let program = [11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
    2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50];
Enter fullscreen mode Exit fullscreen mode

Our virtual CPU will need to fetch instructions one by one from this array and execute them. The CPU will keep track of the instruction that needs to fetch by using a special-purpose register named “PC” (Program Counter).

Fact: PC register is a real register on many physical CPU architectures.

The core of the CPU emulator is just a big “switch” statement that will process each instruction according to the specs.

let pc = 0;
let halted = false;    

function run()
{
    while(!halted)
    {
        runone();
    }
}


function runone()
{
    if (halted)
        return;

    let instr = program[pc];

    switch(instr)
    {
        // handle each instruction according to specs
        // also advance pc to prepare for the next fetch
        // ...
    }
}
Enter fullscreen mode Exit fullscreen mode

That’s all! This is the structure of our glorious CPU emulator. Processing instructions is also a very simple task. Require just careful reading and implementation of the instruction’s specs.

switch(instr)
{
    // movr rdst, rsrc
    case 10:
        pc++;
        var rdst = program[pc++];
        var rsrc = program[pc++];
        regs[rdst] = regs[rsrc];
        break;

    // movv rdst, val
    case 11:
        pc++;
        var rdst = program[pc++];
        var val = program[pc++];
        regs[rdst] = val;
        break;

    ...

}
Enter fullscreen mode Exit fullscreen mode

Take your time, read the specifications, and try to see if you can complete this virtual CPU implementation. You’ll know that you did a great job when you’ll see the results of the execution program.

If you coded carefully, the program should display the first 10 Fibonacci numbers.

You can also consult our version from this playground:

https://codeguppy.com/code.html?simple_vm/vm0_fibonacci

Loading the program

As you saw in the above playground our emulator implementation consists in a simple virtual machine that loads initially the program from an array of bytes and then asks the fantasy CPU to execute it.

vm.load([11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
   2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50]);
Enter fullscreen mode Exit fullscreen mode

Loading like this is fine if your only intent is to load the program we’ve provided. But if you want to design your own programs in machine language and execute them, this may not be a very intuitive or efficient way.

Let me present you a small technique that you can use to load the programs using a human friendly CPU instruction mnemonics. It only takes defining a few constants:

const MOVR = 10;
const MOVV = 11;
const ADD  = 20;
const SUB  = 21;
const PUSH = 30;
const POP  = 31;
const JP   = 40;
const JL   = 41;
const CALL = 42;
const RET  = 50;
const PRINT= 60;
const HALT = 255;

const R0 = 0;
const R1 = 1;
const R2 = 2;
const R3 = 3;

vm.load([
    MOVV, R0, 10,
    CALL, 6,    
    HALT,

    // PrintFibo: (addr = 6)
    PUSH, R0,

    MOVV, R0, 0,
    MOVV, R1, 1,
    MOVV, R3, 1,
    PRINT, R1,

    // continue: (addr = 19)
    MOVR, R2, R0,
    ADD, R2, R1,
    PRINT, R2,

    MOVR, R0, R1,
    MOVR, R1, R2,

    MOVV, R2, 1,
    ADD, R3, R2,

    POP, R2,
    PUSH, R2,

    JL, R3, R2, 19,

    POP, R0,

    RET
]);
Enter fullscreen mode Exit fullscreen mode

With this trick is like you’re programming in assembly language inside JavaScript.

You can find the new program with mnemonics in this playground:

https://codeguppy.com/code.html?simple_vm/vm1_fibonacci

At this point we can end our article. We will take however the opportunity to quickly implement a few small tools that are commonly used when working with machine language code.

Assembler

Perhaps the most important tool when working with machine language programs is an assembler program.

This program lets users write programs as text files, in easier to use assembly language and then the assembler will do the heavy lifting by converting the assembly source code into binary data understood by the CPU.

We will attempt to build a simplified assembler program that does the basic job. It will work like this:

let code = `
// Loads value 10 in R0 
// and calls Fibonacci routine

MOVV R0, 10
CALL 6
HALT
...
`;

let bytes = asm.assemble(code);
Enter fullscreen mode Exit fullscreen mode

If you want to see the code, please check this playground:

https://codeguppy.com/code.html?simple_vm/vm3_assembler

Disassembler

Another useful tool that you can build when working with machine language is a disassembler.

A disassembler takes as input a program in binary format and outputs its listing in a human readable way – in assembly language.

When run on the following program:

let src = asm.disassemble(
    [11,0,10,42,6,255,30,0,11,0,0,11,1,1,11,3,1,60,1,10,2,0,20,
    2,1,60,2,10,0,1,10,1,2,11,2,1,20,3,2,31,2,30,2,41,3,2,19,31,0,50]
);
Enter fullscreen mode Exit fullscreen mode

... our disassembler should output this:

0   11 0 10         MOVV R0, 10
3   42 6            CALL 6
5   255             HALT
6   30 0            PUSH R0
8   11 0 0          MOVV R0, 0
11  11 1 1          MOVV R1, 1
14  11 3 1          MOVV R3, 1
17  60 6            PRINT R1
19  10 2 0          MOVR R2, R0
22  20 2 1          ADD R2, R1
25  60 6            PRINT R2
27  10 0 1          MOVR R0, R1
30  10 1 2          MOVR R1, R2
33  11 2 1          MOVV R2, 1
36  20 3 2          ADD R3, R2
39  31 2 2          POP R2
41  30 2            PUSH R2
43  41 3 2 19       JL R3, R2, 19
47  31 0 2          POP R0
49  50              RET
Enter fullscreen mode Exit fullscreen mode

The listing contains the memory addresses and binary instructions and associated mnemonic for each instruction in our program.

To see a very simplistic implementation of a disassembler, please check this playground:

https://codeguppy.com/code.html?simple_vm/vm2_disassembler

About Fibonacci program

You're probably wondering how we came up with the assembly program to print the Fibonacci numbers? Well, the answer is simple. We wrote first the algorithm in JavaScript and then convert it step by step to assembler:

Of course, once you get more experience with writing in assembly, you can do this task directly in one step! It just requires practice!

Conclusion

I hope you liked this article! Even if there is still a lot of work until you can emulate a full system, I hope that this article provided you with a basic overview of what it take to emulate the core of a system – the CPU.

For a quick reference I put together all programs in this article and a few explanations in the following playground:

https://codeguppy.com/code.html?simple_vm/vm

As a next step I invite you to create additional programs for this fantasy CPU and if needed extend the CPU instruction set with new instructions to support your programs.

Looking forward to your feedback and comments. Please let me know if you want to see more about this subject, perhaps even seeing the implementation of a full emulator. For more recreational coding programs please feel free to browse codeguppy.com

💖 💪 🙅 🚩
codeguppy
Adrian

Posted on August 22, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related