To VBA and beyond - building a RESTful backend using plain Microsoft Excel macros
Michael Neu
Posted on October 7, 2017
When my coworkers and I discussed backend technologies for an upcoming project, someone jokingly mentioned Excel as people widely misused it as a terrible database replacement. Although we settled for .NET, the idea of using Excel as a backend fascinated me. Since I just recently finished my bachelor's thesis and had some spare time, I thought I'd give it a shot and see how far I'd get.
This article consists of three main parts, an introduction to webserver internals, getting Excel to answer http requests and adding some special sauce to make it a RESTful backend. If you can't wait to read through the source, you can check out the repository webxcel, which I will refer to throughout this article.
NEW: Webxcel also supports PHP - see Excel'ing at PHP.
Webservers
When using express or ASP.NET, we usually think about http routes and request bodies, but we never really care about how requests are handled inside the framework. Deep down, every http request consists of a TCP connection, where the client and server exchange various messages and eventually close the connection. A very basic request might look like this:
POST /api/cities HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Content-Type: application/json
Content-Length: 29
{
"name": "Springfield"
}
This post request sends a JSON payload to /api/cities
, and we'd usually expect the server to create a city named "Springfield". If our backend is a simple express server, it might look something like this:
const express = require("express"),
app = express();
function handler(req, res) {
// create city
res.status(201)
.send("woo hoo!"); //
}
app.post("/api/cities", handler);
app.listen(8080);
Before executing handler
, express parses the request, extracts the request method and requested resource to determine what action to perform. It also parses the request headers and - depending on the Content-Type
- the request body, too. This looks somewhat similar to the following piece of pseudo-js:
// ...
const requestText = socket.readToEnd(),
request = parseRequest(requestText),
contentType = request.headers.get("Content-Type");
if (canParseRequestBody(contentType)) {
request.body = parseRequestBody(body, contentType);
}
const handler = registeredHandlers.findHandler(request.method, request.url),
response = new HttpResponse();;
if (handler) {
handler(request, response);
}
sendResponse(response);
// ...
The express example above also showed adding arbitary text to the response using res.send()
and setting the response's status using res.status()
. As you can see in the pseudo-js, after calling the request handling function, the underlaying framework will convert the response object to an HTTP response similar to the following and send it back to the client:
HTTP/1.1 201 Created
X-Powered-By: Express
Content-Length: 8
woo hoo!
In our client, we'd then evaluate the status code, headers and response text after parsing everything again.
Hello, this is Excel
Now, how do we make Excel answer our requests? Microsoft Office comes with a really cool toolbelt called macros, which people now hate because of ransomware. You could argue that macros are obsolete by now and coding them in Visual Basic isn't cool when you could use any modern programming language instead, but the concept behind them is pretty neat in fact.
VBA macros
Macros were originally conceived to save the user from repeating the same task over and over again. For this, users could record macros, which would then repeat what they did earlier. Internally the macro host would create a "script" of what's happening and interpret it later. These scripts happen to be generated in Visual Basic, or - to be more specific - Visual Basic for Applications (VBA). To interact with the application, macro hosts "inject" functionality into the VBA interpreter, like the Range
function in Excel, which can be used to access a collection of cells. As VBA is extremely easy to learn if you don't have a programming background, users quickly adopted and combinded injected functions and classes to Sub
s and Function
s to e.g. automatically generate cell values based on more complex calculations.
Crafting Excel files for version control
A major problem when developing macros is version control. Office files are usually zip containers, so adding them in binary would prevent any sane way of diffing. Extracting and re-zipping the project would be the way to go, if macros weren't binary encoded in a separate container using a special format called OLE. There are some OLE macro extractors like decalage2's oletools out there, but strangely I didn't find any library to create these containers the easy way.
Instead of reading the specification and creating our own library, we may try something else first: we can control Excel (or any other Office application) using .NET and let Excel do all the hard work for us. Using this approach, we need our macros' code in plain-text files, and an importer, which starts Excel and imports our macros. We'll be using PowerShell for this, which comes with full access to the .NET framework and because we don't have to compile these scripts.
In PowerShell, we can create an Excel instance, which we can then use to create workbooks and import our macros. You can take a look at the build script build.ps1 in the repository.
Escaping interop hell - using Windows Sockets in VBA
After we got our version control problems out of our way, we can get straight to the core of creating our server. As mentioned earlier, building a webserver requires handling TCP connections. The bad news is, VBA doesn't come with a TCP implementation by default and I can't really think of a reason why it should. But don't worry, Microsoft thought of somebody needing questionable features, so they baked C interop into VBA as well.
Everybody who's done C interop from a high-level language like C# knows the pain of AccessViolationException
when incorrectly marshalling parameters. In VBA it's basically the same, except that both the debugger and the IDE aren't really meant to develop interop-heavy applications, and thus debugging isn't as easy as you might be used to.
The "easiest" way of getting a TCP server running using only interop and no external libraries (like a C# library which implements the HTTP server already - that'd be too easy for us), is to use Windows Sockets (winsocks). If you haven't used winsocks yet, this is what it basically looks like in C++:
// we can skip all variable declarations as they're not that important here
// setup winsocks
WSAStartup(mode, &wsa);
// create a server socket, this is similar to bsd sockets
server = socket(AF_INET, SOCK_STREAM, 0);
// bind the server socket to an address and port, which it'll listen to later
addr.sin_port = htons(8080);
result = bind(server, &addr, sizeof(sockaddr_in));
// usually we'd check the result and handle errors, but that's not important here
// start listening on the server socket and allow queuing up `backlog` clients
result = listen(server, backlog);
// check result, see above
// get the first client socket in the backlog queue
client = accept(server, &clientAddr, sizeof(sockaddr));
// at this point, the connection is active and we can send/receive data
send(client, message, messageLength, flags);
// cleanup after we're done
closesocket(client);
closesocket(server);
WSACleanup();
The first good news is: we can translate this straight to VBA by importing all required methods in a module and simply call all of them in the right order to get a TCP server up and running.
Since we have a working TCP connection by now, we can continue our server development by parsing incoming HTTP requests. As shown above, requests consist of a protocol line, the headers and a request body. Parsing the protocol line is probably the easiest, we can split it into three parts: the request method, the resource and http version.
Before actually splitting the line, we should make sure we're dealing with an http request. To do so, we can use VBA's text comparison feature Like
, which checks if some text matches a very simple pattern, similar to regular expressions. By evaluating
' prevent comparison errors if clients send lower case requests
Dim upperLine As String
upperLine = UCase(line)
' this is somewhat similar to
' /.* HTTP\/1\.1/.test(upperLine)
' in js
If Not upperLine Like "* HTTP/1.1" Then
' we're concentrating on http 1, since version 2 is a bit more complex to implement
Err.Raise StatusCode.ErrorHttpRequestInvalidFormat
End If
' now we know the request is an http request and can continue parsing it
we can make sure to only process http requests and extract the request method and resource. Splitting the headers is a piece of cake as well, we just need to use Split(line, ":", 2)
on each header line, where 2
represents the maximum count of parts the split function should return, and we're set. To keep it simple, we're not going to parse the request body for now. Since we want our server to return a very simple response, we're just going to echo the request.
Similar to express, we'll handle requests using response objects. Our response class contains headers, a status and a body. Using this class, we can create a simple echo server by reading all incoming text, parsing the request and sending our response:
Dim server As TcpServer
Set server = New TcpServer
' listen for incoming connections on port 8080
server.BindTo 8080
' accept an incoming connection ...
Dim client As TcpClient
Set client = server.AcceptTcpClient()
' ... and receive the request text
Dim requestText As String
requestText = client.ReceiveString()
Dim request As HttpRequest
Set request = New HttpRequest
request.Parse requestText
Dim response As HttpResponse
Set response = New HttpResponse
' send "200 OK" and the body
response.StatusCode = 200
response.Body = "Called " & request.Url & " with this body:" & vbCrLf & vbCrLf & request.Body
Dim responseText As String
responseText = response.ToString()
' actually send the response back to the client
client.SendString responseText
' and do some cleanup
client.Dispose
server.Dispose
Webservers like nginx and apache can be configured to send the server version, so we're going to do the same with Excel. In our response class, we're using a ToString
method to convert our object to a string containing all response information. When examining our server with the axios node.js module, we'll receive a very satisfying response:
> const axios = require("axios");
> axios.post("http://localhost:8080", "it works").then(response => {
... console.log(response.status, response.statusText);
... console.log(response.headers);
... console.log(response.data);
... });
// outputs
200 'Nobody Needs This Anyway'
{ 'content-length': '39',
connection: 'close',
server: 'Microsoft Excel/16.0' }
Called /hello with this body:
it works
As you can see, we're also adding a header Connection: close
to our response, but most servers usually send Connection: keep-alive
. This is due to the http specification, which allows reusing the current socket for future requests, and webservers use this to gain some extra performance. Since our webserver isn't going to be as fast as any other server anyway, we might as well skip this and close the sockets, which is easier than keeping connections open, too.
It's blocking the gui, how do I stop it now?
We got an echo server working, great! But it only works for one request and we'll have to restart the macro everytime a client requests something, so let's put it in a loop and we're good to go.
Well, not exactly. If we execute a while (true) {Â }
style loop in a macro, we'll see a lot of white and a "Microsoft Excel (not responding)" kind of titlebar. This is due to how Excel handles macro execution. As you might guess, macros are executed on the main thread, so whatever we'll do, our server will prevent us from accessing Excel or even stopping the macro.
In our macro, however, we can do as much as we want, e.g. implement a kill-switch. Our kill-switch will be a file, which we'll create when the server starts and which the server will monitor. If the file gets deleted, the server stops, easy as that.
But we're not done here just yet. Calling accept
to get a client socket also blocks the macro execution until a client connects to our server. Searching for "winsocks accept timeout" takes us onto another C++ adventure: porting the FD_SET
and FD_ZERO
macros to VBA to use the select
method, which in turn gives us the count of available client sockets.
After we successfully ported these C++ macros to VBA, we can pass a timeval
object to select
and check if there's a client before blocking with accept
. Adding this to our server, we're finally able to do as many requests as we wish, plus we can stop the server using our kill-switch. Awesome!
Update 2017-10-09: As Michiel van der Blonk pointed out, calling DoEvents
from VBA will pause the macro execution until Excel finished processing its event queue. Adding this to our server loop allows us to access Excel while the server is running.
Creating a modular server architecture
If we want to get on the same level as express or any real-world webserver, we must to be able to configure http routes. VBA doesn't contain any form of inline functions, but it does contain basic inheritance. We can use this feature to create an abstract IWebController
base class, which we then subclass for our specific controllers.
Besides actually handling requests, each controller should also contain a method like MatchesUrl
, which the server can use to find the appropriate controller for a request. Encapsulating this in a WebControllerCollection
and adding such a collection to our server, we're now able to add any business logic to our server (like a FileSystemWebController
to serve static files).
Getting some REST
The title of this article promised a RESTful backend in Excel, and to this point we only got a basic http server. Since Excel is basically a set of tables, we might use this to our advance and read/modify the table's data using an IWebController
subclass.
CREATE TABLE
When we create a table in any real-world relational database, we're using something like this:
CREATE TABLE cities (
id INT PRIMARY KEY,
name VARCHAR(200),
fk_states VARCHAR(200)
);
In Excel, we can use worksheets as tables and the current workbook as our database. But how do we create columns? We can't change the column headings from "A", "B", "C", ... to anything else, so our best bet is to use the first line for our columns. Defining the primary key in a column needs to be easy as well. The easiest way to show that something is important, is to make it red or bold. Excel supports many different red tones, so marking our primary keys bold is probably the best idea:
Reading all entities from a table is easy as well, we just need to iterate over all rows until the primary key column is empty. Inserting works the same way, except we're looking for the first empty primary key cell to insert our record.
For any create or update REST action we'll need to parse the request body. Since most frontend frameworks use JSON exclusively, we'll need a new JsonParser
, which emits JsonObject
s and JsonArray
s. To keep it simple, we're using a hand-written recursive-descent top-down parser, which counts braces/brackets and then recursively calls the appropriate parse method.
Now that we have tables with primary keys and a JSON parser, we can go ahead and create REST endpoints in a WorkbookWebController
. To not iterate over all our tables on every request, we can add a route prefix like /workbook
(e.g. /workbook/cities
). In the ProcessRequest
method of our controller, we can then analyze which sheet was requested and which REST method we should perform. This yields a basic REST backend, which can return all entries in a table, and return, update or delete a single entity.
Let's call it WRM
Doing basic REST stuff is not good enough though: real web frameworks like ASP.NET map relationships of entities. If we'd have above schema in a database used by an ASP.NET app, the underlying persistence framework would resolve all foreign keys (e.g. fk_states
) and map these to their actual entities. We can create something similar in Excel, using not an object relationship mapper, but rather a worksheet relationship mapper (WRM).
In our WRM, we can read all table entries, but before returning the data to the client or inserting it to our tables, we're iterating over all columns and try to resolve each column starting with "fk_". Everytime we find such a column, we'll get the matching entity of the foreign table and use it instead of the raw value. Once everything is resolved, we might get something like this when accessing /workbook/cities
from the above schema:
[
{
"id": "1",
"city": "Seattle",
"states": {
"short_name": "WA",
"full_name": "Washington"
}
},
{
"id": "2",
"city": "Springfield",
"states": null
}
]
Putting it all together
We can now combine the contents of this article and build highly complex database schemas in Excel, which we can access using REST methods. As quickly noted before, our Excel server also supports serving static files, so it makes it an ideal platform to prototype our future web applications - at least if you're prototyping on Windows (maybe macOS support will come one day).
To showcase webxcel's ease-of-use, the repository contains a React todo app with an Excel backend in the example folder.
Edit on 2018-03-27 23:00
Fixed VBA syntax highlighting.
Edit on 2019-07-31 12:30
Added link to FastCGI post.
Posted on October 7, 2017
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
October 7, 2017