Handling Binary Data – Building an HTTP Server from scratch
Sebastien Filion
Posted on July 14, 2021
In the last post of the BTS: HTTP Server series,
I wrote a barebones HTTP server that can handle requests and respond appropriately.
I think I covered the basics, but that server is limited in what it can do.
It can only handle text-based Requests and Responses... That means no images or other media exchange.
And if the Request or the Response is larger than a KB, I'm out of luck. Again, not great for media...
This article is a transcript of a YouTube video I made.
Oh, hey there...
That's my challenge for today: refactor my server to handle arbitrarily sized Requests and avoid treating everything as
text...
If I want to handle large requests, the first thing I can do is read the stream in chunks, 1KB at a time,
until there's nothing left to read.
Once I have all of my chunks, I can concatenate them into one Typed Array. And voilà, an arbitrarily sized Request!
const concat = (...chunks) => {
  // allocate one Typed Array large enough to hold every chunk...
  const zs = new Uint8Array(chunks.reduce((z, ys) => z + ys.byteLength, 0));
  // ...then copy each chunk in at its offset
  chunks.reduce((i, xs) => zs.set(xs, i) || i + xs.byteLength, 0);

  return zs;
};

const chunks = [];
let n;
do {
  const xs = new Uint8Array(1024);
  n = await r.read(xs); // `r` is the connection; `read` resolves with the number of bytes read
  chunks.push(xs.subarray(0, n));
} while (n === 1024); // a short read means there's nothing left
const request = concat(...chunks);
The second challenge is to figure out how much of the data stream is the Request line and the Headers versus the body...
I want to avoid reading too far into the body, since it might be binary data.
I know that the body starts after the first empty line of the Request.
So I could, technically, search for the first empty line; then I'd know that the rest is the body and only parse the first part.
So I wrote this function that tries to find a sequence within an array. It first finds the first occurrence of
a byte, and then tests the following bytes until there's a match.
In this case, I want to find two consecutive CRLF sequences. So I try to find the first CR, then check that it is followed by LF, CR
and LF... And I repeat this until I find the empty line.
export const findIndexOfSequence = (xs, ys) => {
  let i = xs.indexOf(ys[0]); // start at the first occurrence of the first byte
  let z = false;
  while (i >= 0 && i < xs.byteLength) {
    let j = 0;
    while (j < ys.byteLength) {
      if (xs[j + i] !== ys[j]) break; // mismatch, try the next candidate
      j++;
    }
    if (j === ys.byteLength) { // every byte matched
      z = true;
      break;
    }
    i++;
  }

  return z ? i : null;
};
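For example, locating the empty line that separates the Headers from the body (my own example; it assumes encode and decode wrap TextEncoder and TextDecoder, as they do in the repository's utilities):

const xs = encode("GET / HTTP/1.1\r\nHost: localhost\r\n\r\nHello");
const i = findIndexOfSequence(xs, encode("\r\n\r\n")); // index of the first CR of the empty line
decode(xs.subarray(i + 4)); // "Hello" -- the body starts 4 bytes further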
You will find the code for this post here: https://github.com/i-y-land/HTTP/tree/episode/03
The problem with this approach is that I have to traverse the whole request, and the request might not even
have a body; in that case I'd have wasted my time.
Instead, I will read the bytes one line at a time, finding the nearest CRLF, and parse the lines in order.
From the first line, I will extract the method and the path.
Whenever I find an empty line, I will assume the body is next and stop.
The remaining lines, I will parse as headers.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/utilities.js#L208
export const readLine = (xs) => xs.subarray(0, xs.indexOf(LF) + 1);

export const decodeRequest = (xs) => {
  const headers = {};
  let body, method, path;
  const n = xs.byteLength;
  let i = 0;
  let seekedPassedHeader = false;
  while (i < n) {
    if (seekedPassedHeader) {
      body = xs.subarray(i, n); // everything after the empty line is the body
      i = n;
      continue;
    }
    const ys = readLine(xs.subarray(i, n));
    if (i === 0) {
      // the Request line must contain ` HTTP/`; otherwise bail out
      if (!findIndexOfSequence(ys, encode(" HTTP/"))) break;
      [method, path] = decode(ys).split(" ");
    } else if (
      ys.byteLength === 2 &&
      ys[0] === CR &&
      ys[1] === LF &&
      xs[i] === CR &&
      xs[i + 1] === LF
    ) {
      seekedPassedHeader = true; // found the empty line
    } else if (ys.byteLength === 0) break;
    else {
      // `key: value` -- keys are normalized to lower case
      const [key, value] = decode(
        ys.subarray(0, ys.indexOf(CR) || ys.indexOf(LF)),
      ).split(/(?<=^[A-Za-z-]+)\s*:\s*/);
      headers[key.toLowerCase()] = value;
    }
    i += ys.byteLength;
  }

  return { body, headers, method, path };
};
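To make the output concrete, here's a quick sanity check of decodeRequest (my own example; again assuming encode and decode wrap TextEncoder and TextDecoder):

const request = decodeRequest(
  encode("POST /message.txt HTTP/1.1\r\nContent-Length: 5\r\n\r\nhello"),
);
request.method; // "POST"
request.path; // "/message.txt"
request.headers; // { "content-length": "5" }
decode(request.body); // "hello"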
On the other hand, the function to encode the Response is far simpler; I can pretty much use the function I already made
and just encode the result. The biggest difference is that I have to be aware that the body might not
be text and should be kept as a Typed Array. I can encode the headers and then concatenate the result with the body.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/utilities.js#L248
export const stringifyHeaders = (headers = {}) =>
  Object.entries(headers)
    .reduce(
      (hs, [key, value]) => `${hs}\r\n${normalizeHeaderKey(key)}: ${value}`,
      "",
    );

export const encodeResponse = (response) =>
  concat(
    encode(
      `HTTP/1.1 ${statusCodes[response.statusCode]}${
        stringifyHeaders(response.headers)
      }\r\n\r\n`,
    ),
    response.body || new Uint8Array(0),
  );
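As a quick sanity check (my own example; it assumes statusCodes[204] maps to "204 No Content" and normalizeHeaderKey capitalizes the key, as in the repository's utilities):

decode(encodeResponse({ headers: { "content-length": 0 }, statusCode: 204 }));
// "HTTP/1.1 204 No Content\r\nContent-Length: 0\r\n\r\n"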
From there, I have enough to write a simple server using the serve function I've implemented previously.
I can decode the request... then encode the response.
...
serve(
  Deno.listen({ port }),
  (xs) => {
    const request = decodeRequest(xs);
    if (request.method === "GET" && request.path === "/") {
      return encodeResponse({ statusCode: 204 });
    }
  },
).catch((e) => console.error(e));
I could respond to every request with a file. That's a good start toward a static file server.
...
if (request.method === "GET" && request.path === "/") {
  const file = await Deno.readFile(`${Deno.cwd()}/image.png`); // read the file
  return encodeResponse({
    body: file,
    headers: {
      "content-length": file.byteLength,
      "content-type": "image/png",
    },
    statusCode: 200,
  });
}
I can start my server and open a browser to visualize the image.
With a bit more effort, I can serve any file within a given directory.
I would attempt to access the file and cross-reference the MIME type from a curated list using the extension (a sketch of that list follows).
If the system can't find the file, I will return 404 Not Found.
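The curated list itself is just a plain object keyed by extension; a minimal sketch could look like this (the entries are illustrative; note the values are arrays, which is why the lookup below ends with a join):

const mimeTypes = {
  ".html": ["text/html"],
  ".png": ["image/png"],
  ".txt": ["text/plain"],
};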
const sourcePath =
  (await Deno.permissions.query({ name: "env", variable: "SOURCE_PATH" }))
      .state === "granted" && Deno.env.get("SOURCE_PATH") ||
  `${Deno.cwd()}/library/assets_test`;
...
if (request.method === "GET") {
  try {
    const file = await Deno.readFile(sourcePath + request.path); // read the file
    return encodeResponse({
      body: file,
      headers: {
        "content-length": file.byteLength,
        "content-type": mimeTypes[
          request.path.match(/(?<extension>\.[a-z0-9]+$)/)?.groups?.extension
            ?.toLowerCase()
        ]?.join(",") || "text/plain",
      },
      statusCode: 200,
    });
  } catch (e) {
    if (e instanceof Deno.errors.NotFound) { // if the file is not found
      return encodeResponse({
        body: new Uint8Array(0),
        headers: {
          "content-length": 0,
        },
        statusCode: 404,
      });
    }
    throw e;
  }
}
With a broadly similar approach, I can receive any file.
const targetPath =
  (await Deno.permissions.query({ name: "env", variable: "TARGET_PATH" }))
      .state === "granted" && Deno.env.get("TARGET_PATH") ||
  `${Deno.cwd()}/`;
...
if (request.method === "GET") { ... }
else if (request.method === "POST") {
  await Deno.writeFile(targetPath + request.path, request.body); // write the file
  return encodeResponse({ statusCode: 204 });
}
Now, you can guess from the position of your scrollbar that things can't be that simple...
I see two problems with my current approach.
I have to load a whole file into memory before I can offload it to the file system, which can become a bottleneck at
scale.
Another surprising issue is with file uploads...
When uploading a file, some clients, for example curl, will make the request in two steps... The first request tests
the terrain: it states that the client wants to upload a file of a certain type and length (via the Expect:
100-continue header) and requires that the server reply with 100 Continue before sending the file.
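On the wire, the two-step exchange looks roughly like this (abridged, with illustrative header values):

POST /image.png HTTP/1.1     <- the client sends the Request line and Headers only
Host: localhost:8080
Content-Type: image/png
Content-Length: 52428
Expect: 100-continue

HTTP/1.1 100 Continue        <- the server tells the client to go ahead

<52428 bytes of binary data> <- only then does the client send the body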
Because of this behaviour, I need to retain access to the connection, the writable resource.
So I will have to refactor the serve function: instead of accepting a function that takes a Typed Array as an
argument, it will accept a function that takes the connection.
This could also be a positive change that will facilitate implementing powerful middlewares later on...
export const serve = async (listener, f) => {
  for await (const connection of listener) {
    await f(connection);
  }
};
There are two ways my server can handle file uploads.
One possibility is that the client posts the file directly; in that case, I have the option to read the headers and
refuse the request if it's too large. The other possibility is that the client expects me to reply first.
In both cases, I will read the first chunk and then start creating the file with the data already processed. Then I
want to read one chunk at a time from the connection and systematically write each one to the file. This way, I never
hold more than 1KB in memory at a time... I do this until I can't read a whole 1KB; that tells me that the file has
been completely copied over.
export const copy = async (r, w) => {
  const xs = new Uint8Array(1024);
  let n;
  let i = 0;
  do {
    n = await r.read(xs); // read up to 1KB from the source...
    await w.write(xs.subarray(0, n)); // ...and write it straight to the target
    i += n;
  } while (n === 1024); // a short read means we're done

  return i; // total number of bytes copied
};
...
let xs = new Uint8Array(1024);
const n = await Deno.read(connection.rid, xs); // read the first chunk
const request = decodeRequest(xs.subarray(0, n)); // decode the Request line and Headers
const { fileName } = request.path.match(
  /.*?\/(?<fileName>(?:[^%]|%[0-9A-Fa-f]{2})+\.[A-Za-z0-9]+?)$/,
)?.groups || {};
...
const file = await Deno.open(`${targetPath}/${fileName}`, {
  create: true,
  write: true,
});

if (request.headers.expect === "100-continue") {
  // write the `100 Continue` response
  await Deno.write(connection.rid, encodeResponse({ statusCode: 100 }));
  const ys = new Uint8Array(1024);
  const n = await Deno.read(connection.rid, ys); // read the follow-up
  xs = ys.subarray(0, n);
}

const i = findIndexOfSequence(xs, CRLF); // find the beginning of the body
if (i > 0) {
  await Deno.write(file.rid, xs.subarray(i + 4)); // write the possible file chunk
  if (xs.byteLength === 1024) {
    await copy(connection, file); // copy subsequent chunks
  }
}

await connection.write(
  encodeResponse({ statusCode: 204 }), // terminate the exchange
);
...
From there, I can rework the part that responds with a file.
Similarly to the two-step request for receiving a file, a client may opt to request the headers for a given file
with the HEAD method.
Because I want to support this feature, I can first get information from the requested file, then start writing
the headers, and only if the request's method is GET, not HEAD, will I copy the file to the connection.
...
try {
  const { size } = await Deno.stat(`${sourcePath}/${fileName}`);
  await connection.write(
    encodeResponse({
      headers: {
        "Content-Type": mimeTypes[
          fileName.match(/(?<extension>\.[a-z0-9]+$)/)?.groups?.extension
            ?.toLowerCase()
        ]?.join(",") || "text/plain",
        "Content-Length": size,
      },
      statusCode: 200,
    }),
  );
  if (request.method === "GET") {
    const file = await Deno.open(`${sourcePath}/${fileName}`);
    await copy(file, connection); // stream the file to the connection, 1KB at a time
  }
} catch (e) {
  if (e instanceof Deno.errors.NotFound) {
    return Deno.write(
      connection.rid,
      encodeResponse({
        headers: {
          "Content-Length": 0,
        },
        statusCode: 404,
      }),
    );
  }
  throw e;
}
...
Wow. At this point, I have to be either very confident in my programming skills or sadistic...
I need to implement a slew of integration tests before going any further.
I created four static files for this purpose: a short text file (less than a KB), a longer text file, an image and
some music...
Then I wrote a higher-order function that initializes the server before calling the test function.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/integration_test.js#L6
const withServer = (port, f) =>
  async () => {
    const p = Deno.run({ // spawn the server as a subprocess
      cmd: [
        "deno",
        "run",
        "--allow-all",
        `${Deno.cwd()}/cli.js`,
        String(port),
      ],
      env: { LOG_LEVEL: "ERROR", NO_COLOR: "1" },
      stdout: "null",
    });
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait to be sure
    try {
      await f(p); // call the test function, passing the process
    } finally {
      Deno.close(p.rid);
    }
  };
With that, I generate a bunch of tests to download and upload files; this ensures that my code is working as expected.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/integration_test.js#L58
[...]
  .forEach(
    ({ headers = {}, method = "GET", path, title, f }) => {
      Deno.test(
        `Integration: ${title}`,
        withServer(
          8080,
          async () => {
            const response = await fetch(`http://localhost:8080${path}`, {
              headers,
              method,
            });
            await f(response);
          },
        ),
      );
    },
  );
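Each entry of that array pairs a request with an assertion; a hypothetical entry could look like this (the file name and expectations here are made up, the real list lives in the repository):

import { assertEquals } from "https://deno.land/std/testing/asserts.ts";

const tests = [
  {
    title: "serves a small text file",
    method: "GET",
    path: "/hoge.txt",
    f: async (response) => {
      assertEquals(response.status, 200);
      assertEquals(response.headers.get("content-type"), "text/plain");
      await response.text(); // drain the body so the resource is released
    },
  },
];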
When I got to that point, I realized that my serve function was starting to be very... long.
I knew I needed to refactor it into two functions, receiveStaticFile and sendStaticFile.
But, because I need to be able to check the Request line to route to the right function, and I can only read the request
once...
I knew I was in trouble.
I need something that can keep part of the data in memory while retaining access to the raw connection...
...
if (method === "POST") {
  return receiveStaticFile(?, { targetPath });
} else if (method === "GET" || method === "HEAD") {
  return sendStaticFile(?, { sourcePath });
}
...
I could have decoded the request, shoved the connection in there, and called it a day...
But it didn't feel right aaaand I guess I love making my life harder.
const request = decodeRequest(connection);
request.connection = connection;
...
if (method === "POST") {
  return receiveStaticFile(request, { targetPath });
} else if (method === "GET" || method === "HEAD") {
  return sendStaticFile(request, { sourcePath });
}
...
The solution I came up with was to write a buffer. It holds only a KB in memory at a time, shifting the bytes
each time a new chunk is read. The advantage is that I can move the cursor back to the beginning of the buffer
and read back the parts that I need.
Best of all, the buffer has the same methods as the connection, so the two can be used interchangeably.
I won't go into the details because it's a bit dry, but if you want to check out the code, it's currently on GitHub;
a simplified sketch of the idea follows.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/utilities.js#L11
export const factorizeBuffer = (r, mk = 1024, ml = 1024) => { ... }
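To give a rough idea of the mechanics, here is a much smaller sketch of my own; the repository's factorizeBuffer is more involved, with a sliding window and a movable cursor:

// Simplified sketch: wrap a reader so the last chunk can be read again.
const makePeekable = (r) => {
  let buffered = null; // holds the last peeked chunk until it is consumed
  return {
    // replay the buffered chunk if there is one, otherwise defer to the reader
    read: async (xs) => {
      if (buffered) {
        const n = Math.min(buffered.byteLength, xs.byteLength);
        xs.set(buffered.subarray(0, n));
        buffered = null;
        return n;
      }
      return r.read(xs);
    },
    // read a chunk, but keep a copy so the next `read` sees it again
    peek: async (xs) => {
      const n = await r.read(xs);
      buffered = n === null ? null : xs.slice(0, n);
      return n;
    },
  };
};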
With this new toy, I can read a chunk from the connection, route the request, move the cursor back to the beginning and
pass the buffer to the handler function like nothing happened.
The peek function has a signature similar to read; the difference is that it moves the cursor back, reads a chunk
from the buffer in memory, and then moves the cursor back again.
serve(
  Deno.listen({ port }),
  async (connection) => {
    const r = factorizeBuffer(connection);
    const xs = new Uint8Array(1024);
    const reader = r.getReader();
    await reader.peek(xs); // inspect the first chunk without consuming it
    const [method] = decode(readLine(xs)).split(" ");

    if (method !== "GET" && method !== "POST" && method !== "HEAD") {
      return connection.write(
        encodeResponse({ statusCode: 400 }),
      );
    }

    if (method === "POST") {
      return receiveStaticFile(r, { targetPath });
    } else {
      return sendStaticFile(r, { sourcePath });
    }
  },
);
To finish this like a boss, I finalize the receiveStaticFile
(https://github.com/i-y-land/HTTP/blob/episode/03/library/server.js#L15) and sendStaticFile
(https://github.com/i-y-land/HTTP/blob/episode/03/library/server.js#L71) functions, taking care of all
the edge cases.
Finally, I run all the integration tests to confirm that I did a good job. And uuugh. Sleeeep.
This one turned out to be a lot more full of surprises than I was prepared for.
When I realized that some clients send files in two steps, it really threw a wrench into my plans...
But it turned out to be an amazing learning opportunity.
I really hope that you are learning as much as I am.
On the bright side, this forced me to put together all the tools that I know I will need for the next post.
Next, I want to look into streaming in more details and build some middlewares, starting with a logger.
From there, I am sure that I can tackle building a nice little router which will wrap this up pretty nicely.
All of the code is available on GitHub; if you have a question, do not hesitate to ask...
Oh, speaking of that, I launched a Discord server, if you want to join.
You will find the code for this episode here: https://github.com/i-y-land/HTTP/tree/episode/03
You can join the I-Y community on Discord: https://discord.gg/eQfhqybmSc
At any rate, if this article was useful to you, hit the like button, leave a comment to let me know, or best of all,
follow if you haven't already!
Ok bye now...