🚀 Building a Toy ARM64 Emulator
Aakash Apoorv
Posted on May 30, 2024
Hey everyone! 👋
🤔 Ever wondered what it’s like to get really close to the chip level?
Dive into the world of ARM64 by building your own emulator!
Whether you’re into C++, Python, or JavaScript, I’ve got you covered with this super easy-to-follow post 🕹️.
🔧 What You’ll Learn
- Get up close and personal with the ARM64 architecture.
- Gain hands-on experience with low-level programming and emulation.
- Build an emulator in your favorite language: C++, Python, or JavaScript.
💡 Why Build an Emulator?
- Learn by Doing.
- Understand the ARM64 architecture.
👨‍💻 Choose Your Language:
- C++: Perfect for those who love performance and speed.
- Python: Great if you prefer simplicity and readability.
- JavaScript: Awesome for web-based emulation and flexibility.
Features
- Emulates 31 general-purpose registers (x0 to x30).
- Supports basic ARM64 instructions: ldr, str, add, mul, mov, svc, and b.
- Models memory as a simple key-value map for load and store operations.
- Can print the current state of registers and memory.
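Methods (JavaScript names shown; C++ uses the same names, and Python uses snake_case equivalents such as load_program and print_registers)
- constructor(): Initializes the emulator with all registers set to 0, empty memory, and the program counter (pc) at 0.
- loadProgram(program): Loads a program into the emulator. The program is a string of ARM64 assembly with one instruction or label per line.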
- run(): Runs the loaded program.
- printMemory(): Prints the current state of the memory.
- printRegisters(): Prints the current state of the registers.
- initializeMemory(memoryInit): Initializes the emulator's memory with the given key-value pairs.
Supported Instructions
- ldr: Loads an address (ldr xN, =sym) or a value read from memory (ldr xN, [xM]) into a register.
- str: Stores a register's value to memory at the address held in another register (str xN, [xM]).
- add: Adds two register values and stores the result in a destination register.
- mul: Multiplies two register values and stores the result in a destination register.
- mov: Moves an immediate value (written #imm) into a register.
- svc: (Not implemented) Placeholder for handling system calls.
- b: Branches unconditionally to a labeled instruction.
ARM64 Overview
- ARM64 (AArch64) is a 64-bit architecture used in modern processors.
- Provides 31 general-purpose registers (x0 to x30), each 64 bits wide.
- Designed for high performance and energy efficiency.
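Because each register is 64 bits wide, arithmetic on real hardware wraps around modulo 2^64. The emulator in this post stores plain Python integers (unbounded) or C++ int values (commonly 32 bits), so very large results will not match real hardware. Below is a minimal sketch of two helpers an emulator could use to keep values in range; the names to_u64 and to_s64 are hypothetical and not part of the original code.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def to_u64(value: int) -> int:
    """Wrap an arbitrary Python int into the unsigned 64-bit range [0, 2**64)."""
    return value & MASK64


def to_s64(value: int) -> int:
    """Reinterpret the low 64 bits of value as a two's-complement signed integer."""
    value = to_u64(value)
    return value - (1 << 64) if value & (1 << 63) else value


# Adding 1 to the largest unsigned 64-bit value wraps around to 0.
print(to_u64(MASK64 + 1))  # 0
# The all-ones bit pattern reads as -1 when treated as signed.
print(to_s64(MASK64))  # -1
```

An add handler could then store to_u64(a + b) instead of a + b.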
Purpose of the Emulator
- Simulate the execution of a small subset of ARM64 assembly, interpreting instructions as text rather than decoding binary machine code.
Initializing Emulator
- Constructor initializes registers (x0-x30) to 0.
- Memory starts empty and the program counter (pc) starts at 0.
- The instruction list and label table start empty; loadProgram() fills them.
Initializing Code
```cpp
#include <iostream>
#include <unordered_map>
#include <vector>
#include <string>
#include <sstream>

class ARM64Emulator {
private:
    std::unordered_map<std::string, int> registers;
    std::unordered_map<int, int> memory;
    std::vector<std::string> instructions;
    std::unordered_map<std::string, int> labels;
    int pc;

public:
    ARM64Emulator() : pc(0) {
        // Zero the general-purpose registers x0..x30
        for (int i = 0; i < 31; i++) {
            registers["x" + std::to_string(i)] = 0;
        }
    }
    // The methods shown in the following sections go here, inside the class body.
};
```
```python
class ARM64Simulator:
    def __init__(self):
        self.registers = {f'x{i}': 0 for i in range(31)}
        self.memory = {}
        self.pc = 0
        self.instructions = []
        self.labels = {}
```
```javascript
class ARM64Emulator {
    constructor() {
        this.registers = {};
        for (let i = 0; i < 31; i++) {
            this.registers[`x${i}`] = 0;
        }
        this.memory = {};
        this.pc = 0;
        this.instructions = [];
        this.labels = {};
    }
}
```
Loading the Program
- loadProgram(program): Loads the program into the emulator.
- Splits the program into lines, trims surrounding whitespace, and drops empty lines.
- Calls parseLabels(), which records the instruction index of every line containing ':' under the label name before the colon.
Loading Code
```cpp
void loadProgram(const std::string& program) {
    std::istringstream stream(program);
    std::string line;
    // Keep every non-empty line, trimmed, as one instruction (or label)
    while (std::getline(stream, line)) {
        std::string trimmed = trim(line);
        if (!trimmed.empty()) {
            instructions.push_back(trimmed);
        }
    }
    parseLabels();
}

void parseLabels() {
    // Map each label name (the text before ':') to its index in instructions
    for (size_t i = 0; i < instructions.size(); i++) {
        const std::string& line = instructions[i];
        size_t colonPos = line.find(':');
        if (colonPos != std::string::npos) {
            std::string label = trim(line.substr(0, colonPos));
            labels[label] = static_cast<int>(i);
        }
    }
}

std::string trim(const std::string& str) {
    // Also strip '\r' so programs with Windows line endings parse cleanly
    size_t first = str.find_first_not_of(" \t\r");
    size_t last = str.find_last_not_of(" \t\r");
    return (first == std::string::npos || last == std::string::npos) ? "" : str.substr(first, (last - first + 1));
}
```
```python
def load_program(self, program):
    self.instructions = [line.strip() for line in program.split('\n') if line.strip()]
    self.parse_labels()

def parse_labels(self):
    for i, line in enumerate(self.instructions):
        if ':' in line:
            label = line.split(':')[0].strip()
            self.labels[label] = i
```
```javascript
loadProgram(program) {
    this.instructions = program.split('\n').map(line => line.trim()).filter(line => line);
    this.parseLabels();
}

parseLabels() {
    this.instructions.forEach((line, i) => {
        if (line.includes(':')) {
            const label = line.split(':')[0].trim();
            this.labels[label] = i;
        }
    });
}
```
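The parser above records a label for any line containing ':', but run() only skips lines that end with ':', so a line such as loop: add x0, x0, x1 would be executed with 'loop:' as the command and reported as unknown. Here is a hedged sketch of one fix, assuming labels are always written at the start of a line: strip the label off while loading and point it at the index of the next instruction. split_labels is a hypothetical helper, not part of the original code.

```python
def split_labels(program: str):
    """Return (instructions, labels) with label definitions removed from instruction lines.

    Assumption: a label is a name followed by ':' at the start of a line,
    optionally followed by an instruction on the same line.
    """
    instructions = []
    labels = {}
    for raw in program.split("\n"):
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            label, rest = line.split(":", 1)
            # The label points at the index the next instruction will occupy.
            labels[label.strip()] = len(instructions)
            line = rest.strip()
            if not line:
                continue  # the label was on its own line
        instructions.append(line)
    return instructions, labels


instrs, labels = split_labels("""
start:
    mov x0, #1
loop: add x0, x0, x0
    b loop
""")
print(instrs)  # ['mov x0, #1', 'add x0, x0, x0', 'b loop']
print(labels)  # {'start': 0, 'loop': 1}
```

Because label lines no longer occupy an index, the post's b handler (pc = labels[label] - 1, followed by the increment in run()) still lands directly on the labeled instruction.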
Running the Program
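- run(): Executes the loaded instructions one at a time, starting from pc = 0.
- Skips label lines (those ending in ':'), calls executeInstruction(line) for every other line, and then increments pc; branches work by overwriting pc.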
Running Code
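```cpp
void run() {
    // The cast avoids a signed/unsigned comparison; pc is never negative here
    while (static_cast<std::size_t>(pc) < instructions.size()) {
        const std::string& line = instructions[pc];
        if (line.back() != ':') {  // label lines end in ':' and are skipped
            executeInstruction(line);
        }
        pc++;
    }
}
```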
```python
def run(self):
    while self.pc < len(self.instructions):
        line = self.instructions[self.pc]
        if not line.endswith(':'):
            self.execute_instruction(line)
        self.pc += 1
```
```javascript
run() {
    while (this.pc < this.instructions.length) {
        const line = this.instructions[this.pc];
        if (!line.endsWith(':')) {
            this.executeInstruction(line);
        }
        this.pc++;
    }
}
```
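Since b is unconditional, a program such as loop: followed by b loop never terminates, and run() would spin forever. Below is a minimal sketch of a step limit. It assumes a hypothetical execute(line, pc) callback that returns the next pc instead of mutating it, which is a different design from the post's pc - 1 trick; run_with_limit and StepLimitExceeded are likewise hypothetical names.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when a program runs longer than the allowed number of steps."""


def run_with_limit(instructions, execute, max_steps=10_000):
    """Run instructions like the post's run() loop, but stop after max_steps."""
    pc = 0
    steps = 0
    while pc < len(instructions):
        line = instructions[pc]
        if line.endswith(":"):
            pc += 1  # label lines are skipped, as in the original run()
            continue
        steps += 1
        if steps > max_steps:
            raise StepLimitExceeded(f"stopped after {max_steps} steps at pc={pc}")
        pc = execute(line, pc)  # the callback decides where to go next
    return steps


# A two-line infinite loop: 'b loop' always jumps back to the label.
program = ["loop:", "b loop"]
labels = {"loop": 0}


def execute(line, pc):
    parts = line.split()
    if parts[0] == "b":
        return labels[parts[1]]  # jump to the label's index
    return pc + 1


try:
    run_with_limit(program, execute, max_steps=100)
except StepLimitExceeded as exc:
    print(exc)  # stopped after 100 steps at pc=1
```

Returning the next pc from the executor removes the need for the "- 1" adjustment, at the cost of every handler having to return something.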
Executing Instructions
- executeInstruction(line): Parses and executes a single instruction.
- Supports ldr, str, add, mul, mov, svc, and b instructions.
Executing Code
```cpp
void executeInstruction(const std::string& line) {
    // Split the line on whitespace: mnemonic first, then operands
    std::istringstream iss(line);
    std::vector<std::string> parts;
    std::string part;
    while (iss >> part) {
        parts.push_back(part);
    }
    const std::string& cmd = parts[0];
    // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
}
```
```python
def execute_instruction(self, line):
    parts = line.split()
    cmd = parts[0]
    # Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
```
```javascript
executeInstruction(line) {
    const parts = line.split(/\s+/);
    const cmd = parts[0];
    switch (cmd) {
        // Handle 'ldr', 'str', 'add', 'mul', 'mov', 'svc', 'b'
    }
}
```
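As more mnemonics are added, the if/else chain keeps growing. One common alternative, sketched here in Python under the assumption that each handler receives its comma-stripped operands, is a dictionary from mnemonic to handler method. MiniEmulator and its _mov and _add methods are hypothetical and cover only two instructions.

```python
class MiniEmulator:
    """Hypothetical, stripped-down emulator showing table-based dispatch."""

    def __init__(self):
        self.registers = {f"x{i}": 0 for i in range(31)}
        # Map each mnemonic to the bound method that handles it.
        self.handlers = {
            "mov": self._mov,
            "add": self._add,
        }

    def execute_instruction(self, line):
        # Turn commas into spaces, then split into mnemonic and operands.
        cmd, *operands = line.replace(",", " ").split()
        handler = self.handlers.get(cmd)
        if handler is None:
            print(f"Unknown instruction: {cmd}")
            return
        handler(*operands)

    def _mov(self, reg, imm):
        self.registers[reg] = int(imm.lstrip("#"))

    def _add(self, dest, src1, src2):
        self.registers[dest] = self.registers[src1] + self.registers[src2]


emu = MiniEmulator()
emu.execute_instruction("mov x1, #5")
emu.execute_instruction("mov x2, #7")
emu.execute_instruction("add x3, x1, x2")
print(emu.registers["x3"])  # 12
```

With this layout, adding an instruction means writing one method and one dictionary entry, and unknown mnemonics are still reported the same way as in the original.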
LDR and STR Instructions
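- ldr: ldr xN, =sym puts the address sym itself into xN; ldr xN, [xM] reads memory at the address held in xM and puts that value into xN.
- str: str xN, [xM] writes the value of xN to memory at the address held in xM.
- In this toy, an address is simply a memory key: an integer in the C++ version and a symbol such as num1 in the Python and JavaScript versions.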
LDR Code
```cpp
if (cmd == "ldr") {
    std::string reg = parts[1].substr(0, parts[1].length() - 1); // remove trailing comma
    std::string value = parts[2];
    if (value[0] == '=') {
        // ldr xN, =addr: load the address itself
        int addr = std::stoi(value.substr(1));
        registers[reg] = addr;
    } else {
        // ldr xN, [xM]: strip the brackets and load the value stored at xM's address
        int addr = registers[value.substr(1, value.length() - 2)];
        registers[reg] = memory[addr];
    }
}
```
```python
if cmd == 'ldr':
    reg, value = parts[1].strip(','), parts[2]
    if value.startswith('='):
        # With '=num1' the register holds the symbol 'num1', which later serves as the memory key
        addr = value[1:]
        self.registers[reg] = addr
    else:
        addr = self.registers[value.strip('[]')]
        self.registers[reg] = self.memory.get(addr, 0)
```
```javascript
case 'ldr': {
    const reg = parts[1].replace(',', '');
    const value = parts[2];
    if (value.startsWith('=')) {
        const addr = value.substring(1);
        this.registers[reg] = addr;
    } else {
        const addr = this.registers[value.replace('[', '').replace(']', '')];
        this.registers[reg] = this.memory[addr] || 0;
    }
    break;
}
```
STR Code
```cpp
else if (cmd == "str") {
    std::string value = parts[1].substr(0, parts[1].length() - 1);  // source register, comma removed
    std::string reg = parts[2].substr(1, parts[2].length() - 2);     // address register, brackets removed
    int addr = registers[reg];
    memory[addr] = registers[value];
}
```
```python
elif cmd == 'str':
    value, reg = parts[1].strip(','), parts[2]
    addr = self.registers[reg.strip('[]')]
    self.memory[addr] = self.registers[value]
```
```javascript
case 'str': {
    const value = parts[1].replace(',', '');
    const reg = parts[2];
    const addr = this.registers[reg.replace('[', '').replace(']', '')];
    this.memory[addr] = this.registers[value];
    break;
}
```
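The LDR and STR handlers strip commas and brackets by hand, so a typo such as [x0 quietly produces a bad register name. Here is a hedged sketch of a regex-based operand classifier covering only the three forms used in this post (register, =symbol, [register]); the grammar and the parse_operand name are assumptions, not full ARM64 syntax.

```python
import re

# Operand forms used by the toy ldr/str handlers (assumed grammar):
#   x0..x30   a register
#   =name     an address loaded by 'ldr reg, =name'
#   [xN]      memory at the address held in register xN
REGISTER = re.compile(r"x([0-9]|[12][0-9]|30)")
LITERAL = re.compile(r"=(\w+)")
MEMORY = re.compile(r"\[\s*(x(?:[0-9]|[12][0-9]|30))\s*\]")


def parse_operand(text):
    """Classify one operand, returning a (kind, value) tuple or raising ValueError."""
    text = text.strip().rstrip(",")
    if REGISTER.fullmatch(text):
        return ("reg", text)
    if m := LITERAL.fullmatch(text):
        return ("literal", m.group(1))
    if m := MEMORY.fullmatch(text):
        return ("mem", m.group(1))
    raise ValueError(f"unsupported operand: {text!r}")


for op in ["x1,", "=num1", "[x0]"]:
    print(parse_operand(op))
# ('reg', 'x1')
# ('literal', 'num1')
# ('mem', 'x0')
```

A malformed operand then fails loudly at parse time instead of surfacing later as a missing register or memory key. The := operator requires Python 3.8 or newer.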
ADD and MUL Instructions
- add: Adds values from two registers and stores the result in a destination register.
- mul: Multiplies values from two registers and stores the result in a destination register.
ADD and MUL Code
```cpp
else if (cmd == "add") {
    std::string dest = parts[1].substr(0, parts[1].length() - 1);
    std::string src1 = parts[2].substr(0, parts[2].length() - 1);
    std::string src2 = parts[3];
    registers[dest] = registers[src1] + registers[src2];
} else if (cmd == "mul") {
    std::string dest = parts[1].substr(0, parts[1].length() - 1);
    std::string src1 = parts[2].substr(0, parts[2].length() - 1);
    std::string src2 = parts[3];
    registers[dest] = registers[src1] * registers[src2];
}
```
```python
elif cmd == 'add':
    dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
    self.registers[dest] = self.registers[src1] + self.registers[src2]
elif cmd == 'mul':
    dest, src1, src2 = parts[1].strip(','), parts[2].strip(','), parts[3]
    self.registers[dest] = self.registers[src1] * self.registers[src2]
```
```javascript
case 'add': {
    const dest = parts[1].replace(',', '');
    const src1 = parts[2].replace(',', '');
    const src2 = parts[3];
    this.registers[dest] = this.registers[src1] + this.registers[src2];
    break;
}
case 'mul': {
    const dest = parts[1].replace(',', '');
    const src1 = parts[2].replace(',', '');
    const src2 = parts[3];
    this.registers[dest] = this.registers[src1] * this.registers[src2];
    break;
}
```
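Real AArch64 add also accepts an immediate as its last operand (add x2, x1, #3), while mul takes registers only, and both results wrap to 64 bits. Below is a minimal sketch of both behaviors; read_source and exec_arith are hypothetical helpers, and the 12-bit immediate range of the real add encoding is not checked.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def read_source(registers, operand):
    """Return an operand's value: '#imm' is an immediate, anything else a register name."""
    if operand.startswith("#"):
        return int(operand[1:], 0)  # base 0 accepts '#12' as well as '#0xc'
    return registers[operand]


def exec_arith(registers, line):
    """Execute 'add' or 'mul', wrapping the result to 64 bits like the hardware."""
    cmd, dest, src1, src2 = line.replace(",", " ").split()
    a = registers[src1]
    if cmd == "add":
        result = a + read_source(registers, src2)
    elif cmd == "mul":
        result = a * registers[src2]  # mul has no immediate form
    else:
        raise ValueError(f"not an arithmetic instruction: {cmd}")
    registers[dest] = result & MASK64


regs = {f"x{i}": 0 for i in range(31)}
regs["x1"] = 12
exec_arith(regs, "add x2, x1, #3")
exec_arith(regs, "mul x3, x2, x1")
print(regs["x2"], regs["x3"])  # 15 180
```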
MOV and B Instructions
- mov: Moves an immediate value into a register.
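- b: Branches to a labeled instruction by setting pc to one less than the label's index, because run() increments pc after every instruction.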
MOV and B Code
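```cpp
else if (cmd == "mov") {
    std::string reg = parts[1].substr(0, parts[1].length() - 1);
    int value = std::stoi(parts[2].substr(1));  // skip the leading '#'
    registers[reg] = value;
} else if (cmd == "svc") {
    // Placeholder: system calls are not implemented
} else if (cmd == "b") {
    std::string label = parts[1];
    // at() throws std::out_of_range for an undefined label instead of jumping to 0
    pc = labels.at(label) - 1;  // run() increments pc after this instruction
} else {
    std::cout << "Unknown instruction: " << cmd << std::endl;
}
```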
```python
elif cmd == 'mov':
    reg, value = parts[1].strip(','), int(parts[2].strip('#'))
    self.registers[reg] = value
elif cmd == 'svc':
    pass  # We will handle syscall separately
elif cmd == 'b':
    label = parts[1]
    self.pc = self.labels[label] - 1
else:
    print(f"Unknown instruction: {cmd}")
```
```javascript
case 'mov': {
    const reg = parts[1].replace(',', '');
    const value = parseInt(parts[2].replace('#', ''));
    this.registers[reg] = value;
    break;
}
case 'svc': {
    // Handle syscall separately
    break;
}
case 'b': {
    const label = parts[1];
    this.pc = this.labels[label] - 1;
    break;
}
default: {
    console.log(`Unknown instruction: ${cmd}`);
    break;
}
```
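The Python mov handler above parses the immediate with int(...), which rejects hex such as #0x2a, and none of the versions supports mov x1, x0, a register-to-register copy that is valid ARM64 syntax. Here is a small sketch handling both, assuming that Python's int(text, 0) prefix detection is acceptable for immediates; exec_mov is hypothetical, and the immediate range limits of real mov encodings are not checked.

```python
def exec_mov(registers, line):
    """Handle 'mov xD, #imm' (decimal or 0x hex) and 'mov xD, xS'."""
    _, dest, src = line.replace(",", " ").split()
    if src.startswith("#"):
        registers[dest] = int(src[1:], 0)  # base 0 reads '42' and '0x2a'
    else:
        registers[dest] = registers[src]  # register-to-register copy


regs = {f"x{i}": 0 for i in range(31)}
exec_mov(regs, "mov x0, #0x2a")
exec_mov(regs, "mov x1, x0")
print(regs["x0"], regs["x1"])  # 42 42
```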
Memory and Register Handling
- initializeMemory(memoryInit): Initializes memory with given values.
- printMemory(): Prints the current state of memory.
- printRegisters(): Prints the current state of registers.
Memory and Register Code
```cpp
void printMemory() {
    std::cout << "Memory:" << std::endl;
    for (const auto& [k, v] : memory) {  // structured bindings need C++17
        std::cout << k << ": " << v << std::endl;
    }
}

void printRegisters() {
    std::cout << "Registers:" << std::endl;
    // unordered_map iteration order is unspecified, so walk x0..x30 explicitly
    for (int i = 0; i < 31; i++) {
        const std::string name = "x" + std::to_string(i);
        std::cout << name << ": " << registers[name] << std::endl;
    }
}

void initializeMemory(const std::unordered_map<std::string, int>& memoryInit) {
    for (const auto& [key, value] : memoryInit) {
        memory[std::stoi(key)] = value;  // keys are numeric strings such as "5"
    }
}
```
```python
def print_memory(self):
    print("Memory:")
    for k, v in self.memory.items():
        print(f"{k}: {v}")

def print_registers(self):
    print("Registers:")
    for k, v in self.registers.items():
        print(f"{k}: {v}")

def initialize_memory(self, memory_init):
    for var, value in memory_init.items():
        self.memory[var] = value
```
```javascript
printMemory() {
    console.log("Memory:");
    for (const [k, v] of Object.entries(this.memory)) {
        console.log(`${k}: ${v}`);
    }
}

printRegisters() {
    console.log("Registers:");
    for (const [k, v] of Object.entries(this.registers)) {
        console.log(`${k}: ${v}`);
    }
}

initializeMemory(memoryInit) {
    this.memory = { ...memoryInit };
}
```
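The memory here maps each key to a whole value. On real AArch64, memory is byte-addressable and a 64-bit ldr into an x register reads eight consecutive bytes, which Linux and most other systems arrange little-endian. Below is a minimal sketch of that model, assuming a fixed-size bytearray as the backing store; ByteMemory, load64 and store64 are hypothetical names.

```python
MASK64 = (1 << 64) - 1


class ByteMemory:
    """Hypothetical byte-addressable memory with 64-bit little-endian loads and stores."""

    def __init__(self, size=256):
        self.data = bytearray(size)  # every byte starts at 0

    def _check(self, addr):
        # Reject accesses that run past the end (slice assignment would grow the array).
        if addr < 0 or addr + 8 > len(self.data):
            raise IndexError(f"8-byte access at {addr} is out of range")

    def store64(self, addr, value):
        self._check(addr)
        # Keep the low 64 bits and write them least-significant byte first.
        self.data[addr:addr + 8] = (value & MASK64).to_bytes(8, "little")

    def load64(self, addr):
        self._check(addr)
        return int.from_bytes(self.data[addr:addr + 8], "little")


mem = ByteMemory()
mem.store64(16, 36)
print(mem.load64(16))         # 36
print(list(mem.data[16:24]))  # [36, 0, 0, 0, 0, 0, 0, 0]
```

Adopting this model would mean replacing the symbolic addresses used by the Python and JavaScript drivers (num1, result) with numeric offsets, as the C++ version already does.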
Putting It All Together
- Define the program to be executed.
- Initialize memory with values.
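- Create an emulator instance, load the program, run it, and print the results.
- Expected outcome: x1 = 5, x2 = 7, x3 = 5 + 7 = 12, x4 = 3, x5 = 12 * 3 = 36, and the result location (address 0 in the C++ version) holds 36.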
Driver Code
```cpp
int main() {
    // Numeric addresses: each value is stored at an address equal to itself
    std::string program =
        "ldr x0, =5\n"
        "ldr x1, [x0]\n"
        "ldr x0, =7\n"
        "ldr x2, [x0]\n"
        "add x3, x1, x2\n"
        "ldr x0, =3\n"
        "ldr x4, [x0]\n"
        "mul x5, x3, x4\n"
        "ldr x0, =0\n"
        "str x5, [x0]\n";
    std::unordered_map<std::string, int> memoryInit = {
        {"5", 5},
        {"7", 7},
        {"3", 3},
        {"0", 0}
    };
    ARM64Emulator emulator;
    emulator.initializeMemory(memoryInit);
    emulator.loadProgram(program);
    emulator.run();
    emulator.printRegisters();
    emulator.printMemory();
    return 0;
}
```
```python
program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""
memory_init = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
}
simulator = ARM64Simulator()
simulator.initialize_memory(memory_init)
simulator.load_program(program)
simulator.run()
simulator.print_registers()
simulator.print_memory()
```
```javascript
const program = `
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
`;
const memoryInit = {
    'num1': 5,
    'num2': 7,
    'multiplier': 3,
    'result': 0
};
const emulator = new ARM64Emulator();
emulator.initializeMemory(memoryInit);
emulator.loadProgram(program);
emulator.run();
emulator.printRegisters();
emulator.printMemory();
```
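The Python pieces above are shown as separate fragments. To check the expected result in one go, here is one way to assemble them into a single runnable file. The logic follows the post's Python fragments, condensed slightly (add and mul share a branch), and the driver runs the same program. It should leave x3 = 12, x5 = 36 and memory['result'] = 36.

```python
class ARM64Simulator:
    """The post's Python fragments assembled into one class (lightly condensed)."""

    def __init__(self):
        self.registers = {f"x{i}": 0 for i in range(31)}
        self.memory = {}
        self.pc = 0
        self.instructions = []
        self.labels = {}

    def initialize_memory(self, memory_init):
        for var, value in memory_init.items():
            self.memory[var] = value

    def load_program(self, program):
        self.instructions = [ln.strip() for ln in program.split("\n") if ln.strip()]
        for i, line in enumerate(self.instructions):
            if ":" in line:
                self.labels[line.split(":")[0].strip()] = i

    def run(self):
        while self.pc < len(self.instructions):
            line = self.instructions[self.pc]
            if not line.endswith(":"):
                self.execute_instruction(line)
            self.pc += 1

    def execute_instruction(self, line):
        parts = line.split()
        cmd = parts[0]
        if cmd == "ldr":
            reg, value = parts[1].strip(","), parts[2]
            if value.startswith("="):
                self.registers[reg] = value[1:]  # the symbol itself acts as the address
            else:
                addr = self.registers[value.strip("[]")]
                self.registers[reg] = self.memory.get(addr, 0)
        elif cmd == "str":
            value, reg = parts[1].strip(","), parts[2]
            self.memory[self.registers[reg.strip("[]")]] = self.registers[value]
        elif cmd in ("add", "mul"):
            dest, src1, src2 = parts[1].strip(","), parts[2].strip(","), parts[3]
            a, b = self.registers[src1], self.registers[src2]
            self.registers[dest] = a + b if cmd == "add" else a * b
        elif cmd == "mov":
            self.registers[parts[1].strip(",")] = int(parts[2].strip("#"))
        elif cmd == "svc":
            pass  # placeholder, as in the post
        elif cmd == "b":
            self.pc = self.labels[parts[1]] - 1  # run() adds 1 afterwards
        else:
            print(f"Unknown instruction: {cmd}")


program = """
ldr x0, =num1
ldr x1, [x0]
ldr x0, =num2
ldr x2, [x0]
add x3, x1, x2
ldr x0, =multiplier
ldr x4, [x0]
mul x5, x3, x4
ldr x0, =result
str x5, [x0]
"""
sim = ARM64Simulator()
sim.initialize_memory({"num1": 5, "num2": 7, "multiplier": 3, "result": 0})
sim.load_program(program)
sim.run()
print(sim.registers["x3"], sim.registers["x5"], sim.memory["result"])  # 12 36 36
```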
Future Work
The journey doesn't end here! Building a simple emulator is just the beginning. You can explore more advanced features with the following tasks:
- Implement additional ARM64 instructions to enhance your emulator’s capabilities.
- Explore conditional instructions, floating-point operations, and vector processing.
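As a first step toward conditional instructions, AArch64's cbz xN, label branches when xN is zero and cbnz branches when it is not. Below is a minimal sketch of a standalone interpreter that adds them (plus a sub that takes an immediate) while keeping the post's pc = label - 1 convention. run_conditional is hypothetical and supports only the handful of instructions shown.

```python
def run_conditional(program, registers):
    """Interpret mov/add/sub/b/cbz/cbnz using the post's label convention.

    Assumptions: labels sit on their own line ending in ':', and 'sub'
    only accepts an immediate as its last operand.
    """
    lines = [ln.strip() for ln in program.split("\n") if ln.strip()]
    labels = {ln[:-1]: i for i, ln in enumerate(lines) if ln.endswith(":")}
    pc = 0
    while pc < len(lines):
        line = lines[pc]
        if not line.endswith(":"):
            cmd, *ops = line.replace(",", " ").split()
            if cmd == "mov":
                registers[ops[0]] = int(ops[1].lstrip("#"))
            elif cmd == "add":
                registers[ops[0]] = registers[ops[1]] + registers[ops[2]]
            elif cmd == "sub":
                registers[ops[0]] = registers[ops[1]] - int(ops[2].lstrip("#"))
            elif cmd == "b":
                pc = labels[ops[0]] - 1
            elif cmd in ("cbz", "cbnz"):
                # cbz branches when the register is zero, cbnz when it is not.
                if (registers[ops[0]] == 0) == (cmd == "cbz"):
                    pc = labels[ops[1]] - 1
            else:
                print(f"Unknown instruction: {cmd}")
        pc += 1  # a taken branch set pc to label - 1, so this lands on the label
    return registers


# Sum 3 + 2 + 1 by counting x0 down to zero.
program = """
    mov x0, #3
    mov x1, #0
loop:
    add x1, x1, x0
    sub x0, x0, #1
    cbnz x0, loop
"""
regs = run_conditional(program, {f"x{i}": 0 for i in range(31)})
print(regs["x1"])  # 6
```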