🚀 Building Toy ARM64 Emulator

dotproduct

Aakash Apoorv

Posted on May 30, 2024

Hey everyone! 👋

🤔 Ever wondered what it’s like to get really close to the chip level?
Dive into the world of ARM64 by building your own emulator!

Whether you’re into C++, Python, or JavaScript, I’ve got you covered with this super easy-to-follow post 🕹️.

🔧 What You’ll Learn

  • Get up close and personal with ARM64 architecture.
  • Gain hands-on experience with low-level programming and emulation.
  • Build an emulator in your favorite language: C++, Python, or JavaScript.

💡 Why Build an Emulator?

  • Learn by Doing.
  • Understand the ARM64 architecture.

👨‍💻 Choose Your Language:

  • C++: Perfect for those who love performance and speed.
  • Python: Great if you prefer simplicity and readability.
  • JavaScript: Awesome for web-based emulation and flexibility.

Features

  • Emulates 31 general-purpose registers (x0 to x30).
  • Supports basic ARM64 instructions: ldr, str, add, mul, mov, svc, and b.
  • Handles memory operations.
  • Can print the current state of registers and memory.

Methods

  • constructor(): Initializes the emulator with zeroed registers and empty memory, and sets the program counter (pc) to 0.
  • loadProgram(program): Loads a program into the emulator. The program should be a string of ARM64 assembly instructions.
  • run(): Runs the loaded program.
  • printMemory(): Prints the current state of the memory.
  • printRegisters(): Prints the current state of the registers.
  • initializeMemory(memoryInit): Initializes the emulator's memory with the given key-value pairs.

Supported Instructions

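  • ldr: Loads a register, either with an address (ldr x0, =name) or with the memory value at the address held in another register (ldr x1, [x0]).
  • str: Stores a register's value into memory at the address held in another register (str x5, [x0]).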
  • add: Adds two register values and stores the result in a destination register.
  • mul: Multiplies two register values and stores the result in a destination register.
  • mov: Moves an immediate value into a register.
  • svc: (Not implemented) Placeholder for handling system calls.
  • b: Branches to a labeled instruction.

ARM64 Overview

  • ARM64 (AArch64) is a 64-bit architecture used in modern processors.
  • Provides 31 general-purpose registers (x0-x30), each 64 bits wide.
  • Designed for high performance and energy efficiency.
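
Python integers never overflow, so an emulator that wants to mirror the 64-bit register width has to wrap results itself. Here is a minimal sketch, assuming unsigned wrap-around. The names MASK64, wrap64, and to_signed64 are illustrative and not part of the emulator in this post:

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def wrap64(value):
    """Keep only the low 64 bits, the way a 64-bit register would."""
    return value & MASK64


def to_signed64(value):
    """Reinterpret a wrapped 64-bit value as a signed two's-complement number."""
    value = wrap64(value)
    return value - (1 << 64) if value & (1 << 63) else value


largest = MASK64
print(wrap64(largest + 1))   # 0: the addition wraps around
print(to_signed64(largest))  # -1: all ones is -1 in two's complement
```

The emulator code in this post skips this step to stay small, so very large products in Python will simply keep growing.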

Purpose of the Emulator

  • Simulate ARM64 instruction execution.

Initializing Emulator

  • Constructor initializes registers (x0-x30) to 0.
  • Memory starts empty and the program counter (pc) starts at 0.
  • The instruction list and the label table start empty; loadProgram fills them in later.

Initializing Code

cpp

#include <iostream>
#include <unordered_map>
#include <vector>
#include <string>
#include <sstream>

class ARM64Emulator {
private:
    std::unordered_map<std::string, int> registers;
    std::unordered_map<int, int> memory;
    std::vector<std::string> instructions;
    std::unordered_map<std::string, int> labels;
    int pc;

public:
    ARM64Emulator() : pc(0) {
        for (int i = 0; i < 31; i++) {
            registers["x" + std::to_string(i)] = 0;
        }
    }

};

python

class ARM64Simulator:
    def __init__(self):
        self.registers = {f'x{i}': 0 for i in range(31)}
        self.memory = {}
        self.pc = 0
        self.instructions = []
        self.labels = {}

javascript

class ARM64Emulator {
    constructor() {
        this.registers = {};
        for (let i = 0; i < 31; i++) {
            this.registers[`x${i}`] = 0;
        }
        this.memory = {};
        this.pc = 0;
        this.instructions = [];
        this.labels = {};
    }

}

Loading the Program

  • loadProgram(program): Loads the program into the emulator.
  • Splits the program into instructions and filters out empty lines.
  • Calls parseLabels() to identify labels in the program.

Loading Code

cpp

    void loadProgram(const std::string& program) {
        std::istringstream stream(program);
        std::string line;
        while (std::getline(stream, line)) {
            std::string trimmed = trim(line);
            if (!trimmed.empty()) {
                instructions.push_back(trimmed);
            }
        }
        parseLabels();
    }

    void parseLabels() {
        for (size_t i = 0; i < instructions.size(); i++) {
            const std::string& line = instructions[i];
            size_t colonPos = line.find(':');
            if (colonPos != std::string::npos) {
                std::string label = trim(line.substr(0, colonPos));
                labels[label] = i;
            }
        }
    }

    std::string trim(const std::string& str) {
        size_t first = str.find_first_not_of(" \t");
        size_t last = str.find_last_not_of(" \t");
        return (first == std::string::npos || last == std::string::npos) ? "" : str.substr(first, (last - first + 1));
    }

python

    def load_program(self, program):
        self.instructions = [line.strip() for line in program.split('\n') if line.strip()]
        self.parse_labels()

    def parse_labels(self):
        for i, line in enumerate(self.instructions):
            if ':' in line:
                label = line.split(':')[0].strip()
                self.labels[label] = i

javascript

    loadProgram(program) {
        this.instructions = program.split('\n').map(line => line.trim()).filter(line => line);
        this.parseLabels();
    }

    parseLabels() {
        this.instructions.forEach((line, i) => {
            if (line.includes(':')) {
                const label = line.split(':')[0].trim();
                this.labels[label] = i;
            }
        });
    }
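
To see what the label table looks like after loading, here is a small standalone sketch. It treats only lines ending in ':' as labels, which is slightly stricter than the ':' in line check above and matches what run() skips. The sample program and the free function parse_labels are illustrative:

```python
def parse_labels(instructions):
    """Map each label name to the index of the line that defines it."""
    labels = {}
    for index, line in enumerate(instructions):
        if line.endswith(':'):
            labels[line[:-1].strip()] = index
    return labels


program = """
mov x0, #1
loop:
add x0, x0, x0
b loop
"""

# Same cleanup as load_program: trim each line and drop the empty ones.
instructions = [line.strip() for line in program.split('\n') if line.strip()]
labels = parse_labels(instructions)

print(instructions)  # ['mov x0, #1', 'loop:', 'add x0, x0, x0', 'b loop']
print(labels)        # {'loop': 1}
```

The label points at its own line, not at the instruction after it, which is why the run loop has to skip label lines.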

Running the Program

  • run(): Executes the loaded instructions one by one.
  • Skips label lines (those ending in ':') and calls executeInstruction(line) for every other line, so a label has to sit on its own line.

Running Code

cpp

    void run() {
        while (pc < static_cast<int>(instructions.size())) {
            const std::string& line = instructions[pc];
            if (line.back() != ':') {
                executeInstruction(line);
            }
            pc++;
        }
    }

python

    def run(self):
        while self.pc < len(self.instructions):
            line = self.instructions[self.pc]
            if not line.endswith(':'):
                self.execute_instruction(line)
            self.pc += 1

javascript

    run() {
        while (this.pc < this.instructions.length) {
            const line = this.instructions[this.pc];
            if (!line.endsWith(':')) {
                this.executeInstruction(line);
            }
            this.pc++;
        }
    }
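
Because b is unconditional, any backward branch makes this loop run forever. One possible guard is a step limit. The sketch below is an assumption of mine, not part of the original emulator: run_with_limit and max_steps are hypothetical names, and only b is interpreted so the control flow stays visible.

```python
def run_with_limit(instructions, labels, max_steps=1000):
    """Fetch-execute loop that gives up after max_steps executed lines.

    Only `b` is interpreted; every other instruction is just counted.
    """
    pc = 0
    steps = 0
    while pc < len(instructions):
        if steps >= max_steps:
            raise RuntimeError(f"stopped after {max_steps} steps (possible infinite loop)")
        line = instructions[pc]
        if not line.endswith(':'):
            steps += 1
            parts = line.split()
            if parts[0] == 'b':
                # Minus one because the increment below runs unconditionally.
                pc = labels[parts[1]] - 1
        pc += 1
    return steps


print(run_with_limit(['mov x0, #1', 'mov x1, #2'], {}))  # 2

try:
    run_with_limit(['loop:', 'b loop'], {'loop': 0}, max_steps=10)
except RuntimeError as error:
    print(error)  # stopped after 10 steps (possible infinite loop)
```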

Executing Instructions

  • executeInstruction(line): Parses and executes a single instruction.
  • Supports ldr, str, add, mul, mov, svc, and b instructions.

Executing Code

cpp

    void executeInstruction(const std::string& line) {
        std::istringstream iss(line);
        std::vector<std::string> parts;
        std::string part;
        while (iss >> part) {
            parts.push_back(part);
        }

        const std::string& cmd = parts[0];
        // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'

    }

python

    def execute_instruction(self, line):
        parts = line.split()
        cmd = parts[0]

        # Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'

javascript

    executeInstruction(line) {
        const parts = line.split(/\s+/);
        const cmd = parts[0];

        switch (cmd) {
            // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
        }
    }
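
Splitting on whitespace means operands must be written with a space after each comma; add x3,x1,x2 would arrive as a single operand. A slightly more forgiving tokenizer could split on commas as well. This is a sketch under that assumption, with tokenize as a hypothetical helper name:

```python
import re


def tokenize(line):
    """Split an instruction into mnemonic and operands, ignoring commas and extra spaces."""
    return [token for token in re.split(r'[,\s]+', line.strip()) if token]


print(tokenize('add x3, x1, x2'))  # ['add', 'x3', 'x1', 'x2']
print(tokenize('add x3,x1,x2'))    # ['add', 'x3', 'x1', 'x2']
print(tokenize('ldr  x1,[x0]'))    # ['ldr', 'x1', '[x0]']
```

With tokens cleaned up front, the handlers would no longer need to strip trailing commas themselves. Note that this simple split would break an operand such as [x0, #8], which the emulator does not support anyway.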

LDR and STR Instructions

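  • ldr: Loads a register in one of two forms. ldr x0, =name puts an address into x0 (the Python and JavaScript versions keep the symbolic name as a string, while the C++ version expects a number), and ldr x1, [x0] reads memory at the address held in x0.
  • str: Stores a register's value into memory at the address held in another register, as in str x5, [x0].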

LDR Code

cpp

        if (cmd == "ldr") {
            std::string reg = parts[1].substr(0, parts[1].length() - 1); // remove trailing comma
            std::string value = parts[2];
            if (value[0] == '=') {
                int addr = std::stoi(value.substr(1));
                registers[reg] = addr;
            } else {
                int addr = registers[value.substr(1, value.length() - 2)];
                registers[reg] = memory[addr];
            }
        }

python

        if cmd == 'ldr':
            reg, value = parts[1].strip(','), parts[2]
            if value.startswith('='):
                # "=name" form: keep the symbolic name as the address
                addr = value[1:]
                self.registers[reg] = addr
            else:
                # "[xN]" form: read memory at the address held in xN
                addr = self.registers[value.strip('[]')]
                self.registers[reg] = self.memory.get(addr, 0)

javascript

            case 'ldr': {
                const reg = parts[1].replace(',', '');
                const value = parts[2];
                if (value.startsWith('=')) {
                    const addr = value.substring(1);
                    this.registers[reg] = addr;
                } else {
                    const addr = this.registers[value.replace('[', '').replace(']', '')];
                    this.registers[reg] = this.memory[addr] || 0;
                }
                break;
            }

STR Code

cpp

          else if (cmd == "str") {
            std::string value = parts[1].substr(0, parts[1].length() - 1);
            std::string reg = parts[2].substr(1, parts[2].length() - 2);
            int addr = registers[reg];
            memory[addr] = registers[value];
        } 

python

        elif cmd == 'str':
            value, reg = parts[1].strip(','), parts[2]
            addr = self.registers[reg.strip('[]')]
            self.memory[addr] = self.registers[value]

javascript

            case 'str': {
                const value = parts[1].replace(',', '');
                const reg = parts[2];
                const addr = this.registers[reg.replace('[', '').replace(']', '')];
                this.memory[addr] = this.registers[value];
                break;
            }
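
A load/store round trip makes the two ldr forms easier to follow. The sketch below pulls the Python logic out into free functions (ldr and str_ are illustrative names, and the copy memory slot is made up for the example) and copies one memory value to another location:

```python
def ldr(registers, memory, dest, operand):
    """`ldr dest, =name` loads an address; `ldr dest, [xN]` loads from memory."""
    if operand.startswith('='):
        registers[dest] = operand[1:]
    else:
        address = registers[operand.strip('[]')]
        registers[dest] = memory.get(address, 0)


def str_(registers, memory, source, operand):
    """`str source, [xN]` writes a register to the address held in xN."""
    address = registers[operand.strip('[]')]
    memory[address] = registers[source]


registers = {f'x{i}': 0 for i in range(31)}
memory = {'num1': 5, 'copy': 0}

ldr(registers, memory, 'x0', '=num1')  # x0 = 'num1' (the symbolic address)
ldr(registers, memory, 'x1', '[x0]')   # x1 = memory['num1'] = 5
ldr(registers, memory, 'x0', '=copy')  # x0 = 'copy'
str_(registers, memory, 'x1', '[x0]')  # memory['copy'] = 5
print(registers['x1'], memory)         # 5 {'num1': 5, 'copy': 5}
```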

ADD and MUL Instructions

  • add: Adds values from two registers and stores the result in a destination register.
  • mul: Multiplies values from two registers and stores the result in a destination register.

ADD and MUL Code

cpp

          else if (cmd == "add") {
            std::string dest = parts[1].substr(0, parts[1].length() - 1);
            std::string src1 = parts[2].substr(0, parts[2].length() - 1);
            std::string src2 = parts[3];
            registers[dest] = registers[src1] + registers[src2];
        } else if (cmd == "mul") {
            std::string dest = parts[1].substr(0, parts[1].length() - 1);
            std::string src1 = parts[2].substr(0, parts[2].length() - 1);
            std::string src2 = parts[3];
            registers[dest] = registers[src1] * registers[src2];
        }

python

        elif cmd == 'add':
            dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
            self.registers[dest] = self.registers[src1] + self.registers[src2]
        elif cmd == 'mul':
            dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
            self.registers[dest] = self.registers[src1] * self.registers[src2]

javascript

            case 'add': {
                const dest = parts[1].replace(',', '');
                const src1 = parts[2].replace(',', '');
                const src2 = parts[3];
                this.registers[dest] = this.registers[src1] + this.registers[src2];
                break;
            }
            case 'mul': {
                const dest = parts[1].replace(',', '');
                const src1 = parts[2].replace(',', '');
                const src2 = parts[3];
                this.registers[dest] = this.registers[src1] * this.registers[src2];
                break;
            }
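
The add and mul handlers differ only in the operator, so one way to cut the duplication is a lookup table. This is a sketch of that idea rather than how the emulator above is written; ARITHMETIC and execute_arithmetic are hypothetical names:

```python
import operator

# Mnemonic -> binary function; another three-register instruction would be one more entry.
ARITHMETIC = {'add': operator.add, 'mul': operator.mul}


def execute_arithmetic(registers, parts):
    """Run a three-register instruction such as ['add', 'x3,', 'x1,', 'x2']."""
    cmd = parts[0]
    dest, src1, src2 = (part.strip(',') for part in parts[1:4])
    registers[dest] = ARITHMETIC[cmd](registers[src1], registers[src2])


registers = {f'x{i}': 0 for i in range(31)}
registers['x1'], registers['x2'] = 5, 7
execute_arithmetic(registers, 'add x3, x1, x2'.split())
execute_arithmetic(registers, 'mul x5, x3, x2'.split())
print(registers['x3'], registers['x5'])  # 12 84
```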

MOV and B Instructions

  • mov: Moves an immediate value into a register.
  • b: Branches to a labeled instruction by setting pc to the label's index minus one, because run() increments pc after every instruction.

MOV and B Code

cpp

          else if (cmd == "mov") {
            std::string reg = parts[1].substr(0, parts[1].length() - 1);
            int value = std::stoi(parts[2].substr(1)); // skip the leading '#'
            registers[reg] = value;
        } else if (cmd == "svc") {
            // Placeholder: system calls are not implemented
        } else if (cmd == "b") {
            std::string label = parts[1];
            pc = labels[label] - 1; // run() increments pc afterwards
        } else {
            std::cout << "Unknown instruction: " << cmd << std::endl;
        }

python

        elif cmd == 'mov':
            reg, value = parts[1].strip(','), int(parts[2].strip('#'))
            self.registers[reg] = value
        elif cmd == 'svc':
            pass  # We will handle syscall separately
        elif cmd == 'b':
            label = parts[1]
            self.pc = self.labels[label] - 1
        else:
            print(f"Unknown instruction: {cmd}")

javascript

            case 'mov': {
                const reg = parts[1].replace(',', '');
                const value = parseInt(parts[2].replace('#', ''));
                this.registers[reg] = value;
                break;
            }
            case 'svc': {
                // Handle syscall separately
                break;
            }
            case 'b': {
                const label = parts[1];
                this.pc = this.labels[label] - 1;
                break;
            }
            default: {
                console.log(`Unknown instruction: ${cmd}`);
                break;
            }
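
The mov handlers above accept decimal immediates such as #5. Assembly listings often use hexadecimal as well, so a slightly broader parser might be handy. This is a sketch assuming Python's base-0 integer parsing is acceptable; parse_immediate is a hypothetical helper:

```python
def parse_immediate(token):
    """Parse an immediate such as '#5', '#-3' or '#0x10' into an int."""
    text = token.lstrip('#')
    return int(text, 0)  # base 0 lets Python infer decimal or 0x-prefixed hex


print(parse_immediate('#5'))     # 5
print(parse_immediate('#-3'))    # -3
print(parse_immediate('#0x10'))  # 16
```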

Memory and Register Handling

  • initializeMemory(memoryInit): Initializes memory with given values.
  • printMemory(): Prints the current state of memory.
  • printRegisters(): Prints the current state of registers.

Memory and Register Code

cpp

    void printMemory() {
        std::cout << "Memory:" << std::endl;
        for (const auto& [k, v] : memory) {
            std::cout << k << ": " << v << std::endl;
        }
    }

    void printRegisters() {
        std::cout << "Registers:" << std::endl;
        for (const auto& [k, v] : registers) {
            std::cout << k << ": " << v << std::endl;
        }
    }

    void initializeMemory(const std::unordered_map<std::string, int>& memoryInit) {
        for (const auto& [key, value] : memoryInit) {
            memory[std::stoi(key)] = value;
        }
    }

python

    def print_memory(self):
        print("Memory:")
        for k, v in self.memory.items():
            print(f"{k}: {v}")

    def print_registers(self):
        print("Registers:")
        for k, v in self.registers.items():
            print(f"{k}: {v}")

    def initialize_memory(self, memory_init):
        for var, value in memory_init.items():
            self.memory[var] = value

javascript

    printMemory() {
        console.log("Memory:");
        for (const [k, v] of Object.entries(this.memory)) {
            console.log(`${k}: ${v}`);
        }
    }

    printRegisters() {
        console.log("Registers:");
        for (const [k, v] of Object.entries(this.registers)) {
            console.log(`${k}: ${v}`);
        }
    }

    initializeMemory(memoryInit) {
        this.memory = { ...memoryInit };
    }
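
Printing all 31 registers gets noisy, and in the C++ version std::unordered_map iterates in no particular order. A dump sorted by register number that hides zero values may be easier to read. This is a sketch; format_registers and its skip_zero flag are hypothetical:

```python
def format_registers(registers, skip_zero=True):
    """Return 'xN: value' lines ordered by register number."""
    names = sorted(registers, key=lambda name: int(name[1:]))
    return [
        f"{name}: {registers[name]}"
        for name in names
        if not (skip_zero and registers[name] == 0)
    ]


registers = {f'x{i}': 0 for i in range(31)}
registers['x10'], registers['x2'] = 36, 7
print('\n'.join(format_registers(registers)))
# x2: 7
# x10: 36
```

Sorting on the numeric suffix keeps x10 after x2, which a plain string sort would not.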

Putting It All Together

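  • Define the program to be executed. It computes (num1 + num2) * multiplier and stores the product in result.
  • Initialize memory with values. The C++ driver uses numeric addresses (5, 7, 3, 0), while the Python and JavaScript drivers use symbolic names.
  • Create emulator instance, load program, run, and print results. With the values below, x5 and the result location both end up holding (5 + 7) * 3 = 36.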

Driver Code

cpp

int main() {
    std::string program = 
        "ldr x0, =5\n"
        "ldr x1, [x0]\n"
        "ldr x0, =7\n"
        "ldr x2, [x0]\n"
        "add x3, x1, x2\n"
        "ldr x0, =3\n"
        "ldr x4, [x0]\n"
        "mul x5, x3, x4\n"
        "ldr x0, =0\n"
        "str x5, [x0]\n";

    std::unordered_map<std::string, int> memoryInit = {
        {"5", 5},
        {"7", 7},
        {"3", 3},
        {"0", 0}
    };

    ARM64Emulator emulator;
    emulator.initializeMemory(memoryInit);
    emulator.loadProgram(program);
    emulator.run();
    emulator.printRegisters();
    emulator.printMemory();

    return 0;
}

python

program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""

memory_init = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
}

simulator = ARM64Simulator()
simulator.initialize_memory(memory_init)
simulator.load_program(program)
simulator.run()
simulator.print_registers()
simulator.print_memory()

javascript

const program = `
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
`;

const memoryInit = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
};

const emulator = new ARM64Emulator();
emulator.initializeMemory(memoryInit);
emulator.loadProgram(program);
emulator.run();
emulator.printRegisters();
emulator.printMemory();
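
Since the pieces are spread across several sections, here is a condensed, self-contained Python version you can paste into one file and run. It is assembled from the fragments in this post with the handlers compacted, so treat it as a sketch rather than the exact code in the repository:

```python
class ARM64Simulator:
    def __init__(self):
        self.registers = {f'x{i}': 0 for i in range(31)}
        self.memory = {}
        self.pc = 0
        self.instructions = []
        self.labels = {}

    def initialize_memory(self, memory_init):
        self.memory.update(memory_init)

    def load_program(self, program):
        self.instructions = [l.strip() for l in program.split('\n') if l.strip()]
        self.labels = {l[:-1]: i for i, l in enumerate(self.instructions) if l.endswith(':')}

    def run(self):
        while self.pc < len(self.instructions):
            line = self.instructions[self.pc]
            if not line.endswith(':'):
                self.execute_instruction(line)
            self.pc += 1

    def execute_instruction(self, line):
        parts = line.split()
        cmd, args = parts[0], [p.strip(',') for p in parts[1:]]
        regs = self.registers
        if cmd == 'ldr' and args[1].startswith('='):
            regs[args[0]] = args[1][1:]  # symbolic address
        elif cmd == 'ldr':
            regs[args[0]] = self.memory.get(regs[args[1].strip('[]')], 0)
        elif cmd == 'str':
            self.memory[regs[args[1].strip('[]')]] = regs[args[0]]
        elif cmd == 'add':
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif cmd == 'mul':
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif cmd == 'mov':
            regs[args[0]] = int(args[1].strip('#'))
        elif cmd == 'b':
            self.pc = self.labels[args[0]] - 1
        elif cmd != 'svc':  # svc is accepted but does nothing
            print(f"Unknown instruction: {cmd}")


program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""

simulator = ARM64Simulator()
simulator.initialize_memory({'num1': 5, 'num2': 7, 'multiplier': 3, 'result': 0})
simulator.load_program(program)
simulator.run()
print(simulator.registers['x5'])   # 36
print(simulator.memory['result'])  # 36
```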

Future Work

The journey doesn't end here! Building a simple emulator is just the beginning. You can go further with the following tasks:

  • Implement additional ARM64 instructions to enhance your emulator’s capabilities.
  • Explore conditional instructions, floating-point operations, and vector processing.

GitHub

https://github.com/ToyMath/ToyARM64Emulator
