Background
I've always loved learning about Computer Science and using what I've learned to develop tools that enabled people's creativity.
I took a university class on compilers in 2018 and loved every grueling minute of it. Upon finishing the course, I realized that the peak of creative tooling and Computer Science is compiler and language design.
So, in the summer of 2019, I started to outline a project that would go beyond what was taught in my compilers class: developing tools for building compilers.
The Motivation
Writing a compiler can be taxing and intimidating. The goal was to create an ecosystem that not only simplifies and accelerates compiler development, but also lowers its barrier to entry.
Compilation by Configuration
GCT (Generic Compiler Toolchain) is a generalized system. A program is compiled by feeding in language specification files along with the source program. Language specification files are simple, compact, and written as plaintext. No fancy tools are needed to create or modify a language's specification; any text editor will do.
GCT uses the language specification files to generate a tokenizer and parser. Python plugin scripts are used as AST visitors to construct semantics and output code.
Upon completing compilation, an HTML report is generated containing content collected during the compilation process. This report can contain anything from tabular data to fully interactive models of the tokenizer or AST.
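To make the "plugin as AST visitor" idea a little more concrete, here is a minimal Python sketch of the visitor pattern. It is not GCT's actual plugin API: the `Node`, `Visitor`, and `StackCodeEmitter` classes, the node kinds `int` and `add`, and the stack-machine output format are all hypothetical names invented for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a kind, an optional value, and child nodes."""
    kind: str
    value: str = ""
    children: list[Node] = field(default_factory=list)


class Visitor:
    """Dispatches each node to a method named visit_<kind>."""

    def visit(self, node: Node):
        handler = getattr(self, "visit_" + node.kind, self.generic_visit)
        return handler(node)

    def generic_visit(self, node: Node):
        raise NotImplementedError(f"no handler for node kind {node.kind!r}")


class StackCodeEmitter(Visitor):
    """Emits instructions for an imaginary stack machine (illustration only)."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def visit_int(self, node: Node) -> None:
        self.lines.append(f"push {node.value}")

    def visit_add(self, node: Node) -> None:
        # Emit code for the operands first, then the operation itself.
        for child in node.children:
            self.visit(child)
        self.lines.append("add")


# Usage: walk the tree for the expression "1 + 2".
tree = Node("add", children=[Node("int", "1"), Node("int", "2")])
emitter = StackCodeEmitter()
emitter.visit(tree)
print("\n".join(emitter.lines))
```

A real back-end plugin would presumably handle many more node kinds and emit something like LLVM IR instead, but the dispatch-by-node-kind structure would be similar.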
Showcase Language
To demonstrate the capabilities of GCT, I developed a set of configuration files and plugins that can compile a simple high-level programming language to an executable file.
int Fib(int a) {
    if (a <= 1) {
        return a;
    }
    return Fib(a - 1) + Fib(a - 2);
}
void main() {
    print_int(Fib(12));
}
I've always been interested in exploring LLVM IR, and figured this would be a great opportunity to do so by using it in the back-end of the compiler.
The Code
The full source code and implementation of the showcase language is available on the GitHub page.
Generic Compiler Toolchain
GC-Toolchain is a compiler development toolchain that is responsible for front-end validation and analysis, and offers a framework for back-end code generation. A comprehensive compilation report can be generated on each run to give detailed information about the mechanisms involved in the compilation process.
Building and Using GCT
Please visit the Wiki for build and usage instructions.
Lexical Analysis
Configuration Files
Token specification is given through a configuration file. The configuration file is made up of sections, each of which begins with a section declaration line. The section declaration line begins with the # symbol, immediately followed by an identifier (for example, #type). Some section types require additional information, which is usually given either in the remaining whitespace-separated identifiers on the declaration line, or in the section body, which consists of the lines that follow the section declaration line.
#type identifiers
body-line-1
body-line-2
body-line-3
body-line-N
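As a rough illustration of the section layout described above, the following Python sketch splits a configuration file into sections. It is not GCT's own reader: `Section` and `parse_sections` are hypothetical names, and the handling of blank lines and of body lines that appear before any declaration is an assumption, since the text above does not specify either case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Section:
    type: str                                      # identifier fused to '#'
    args: list[str] = field(default_factory=list)  # trailing identifiers
    body: list[str] = field(default_factory=list)  # lines after the declaration


def parse_sections(text: str) -> list[Section]:
    """Split configuration text into sections (hypothetical helper)."""
    sections: list[Section] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith("#"):
            # The identifier must sit directly next to the '#' symbol.
            if len(line) < 2 or line[1].isspace():
                raise ValueError(f"missing section type in {raw!r}")
            parts = line[1:].split()
            sections.append(Section(type=parts[0], args=parts[1:]))
        elif sections:
            sections[-1].body.append(line)
        else:
            # Assumption: a body line must belong to some section.
            raise ValueError("body line before any section declaration")
    return sections


# Usage with the placeholder section shown above.
example = "#type identifiers\nbody-line-1\nbody-line-2\n"
for section in parse_sections(example):
    print(section.type, section.args, section.body)
```

Keeping the declaration arguments and the body as plain lists of strings leaves their interpretation to whichever section type consumes them.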
For lexical…
The Internals
For more information on how GCT handles each phase of compilation, you can take a look at the documentation pages here:
Lexical Analysis
Syntactic Analysis
Semantic Analysis and Code Generation
For general usage information, you can reference this page.