Handling Binary Data — Building a HTTP Server from scratch
In the last post of the BTS: HTTP Server series,
I wrote a barebones HTTP server that can handle requests and respond appropriately.
I think I covered the basics, but that server is limited in what it can do.
It can only handle text-based Requests and Responses... That means no image or other media exchange.
And then, if the Request or the Response is larger than a KB, I'm out of luck. Again, not great for media...
This article is a transcript of a YouTube video I made.
Oh, hey there...
That's my challenge for today, refactor my server to handle arbitrarily sized Requests and avoid treating everything as
text...
If I want to be able to handle large requests, the first thing I can do is to read the stream in chunks, 1KB at a time
until there's nothing left to read.
Once I have all of my chunks, I can concatenate them together into one Typed Array. And voila, arbitrarily sized Request!
const concat = (...chunks) => {
const zs = new Uint8Array(chunks.reduce((z, ys) => z + ys.byteLength, 0));
chunks.reduce((i, xs) => zs.set(xs, i) || i + xs.byteLength, 0);
return zs;
};
const chunks = [];
let n;
do {
const xs = new Uint8Array(1024);
n = await r.read(xs);
chunks.push(xs.subarray(0, n));
} while (n === 1024);
const request = concat(...chunks);
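To try this loop without opening a socket, here is a minimal sketch that runs it against an in-memory stand-in for the connection. `fakeReader` and `readRequest` are hypothetical names I made up for the illustration; the only assumption is that the reader exposes `read(buffer)`, which resolves to the number of bytes written or `null` when nothing is left. Keep in mind that stopping on a short chunk is a simplification: a real connection might hand back fewer than 1024 bytes before the request is complete.

```javascript
// Concatenate any number of Uint8Array chunks into one (same idea as `concat` above).
const concatChunks = (...chunks) => {
  const zs = new Uint8Array(chunks.reduce((z, ys) => z + ys.byteLength, 0));
  let offset = 0;
  for (const xs of chunks) {
    zs.set(xs, offset);
    offset += xs.byteLength;
  }
  return zs;
};

// Hypothetical stand-in for a connection: serves `data` through `read(buffer)`.
const fakeReader = (data) => {
  let cursor = 0;
  return {
    read: async (xs) => {
      if (cursor >= data.byteLength) return null; // nothing left to read
      const slice = data.subarray(cursor, cursor + xs.byteLength);
      xs.set(slice);
      cursor += slice.byteLength;
      return slice.byteLength;
    },
  };
};

// The same loop as above, wrapped in a function so it can be called.
const readRequest = async (r) => {
  const chunks = [];
  let n;
  do {
    const xs = new Uint8Array(1024);
    n = await r.read(xs);
    chunks.push(xs.subarray(0, n ?? 0)); // `n` is null at the end of the data
  } while (n === 1024);
  return concatChunks(...chunks);
};

// Usage: 2500 bytes arrive as chunks of 1024, 1024 and 452 bytes.
const payload = Uint8Array.from({ length: 2500 }, (_, i) => i % 256);
readRequest(fakeReader(payload)).then((request) => {
  console.log(request.byteLength);
});
```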
The second challenge is to figure out how much of the data stream is the Request line and the Headers versus the body...
I want to avoid reading too far into the body, since it might be binary data.
I know that the body starts after the first empty line of the Request.
So technically, I could search for the first empty line; the rest is the body, and I only need to parse the first part.
So I wrote this function that tries to find a sequence within the array. It first finds the first occurrence of
the sequence's first byte, and then I can just test the following bytes until I have a match.
In our case, I want to find two consecutive CRLF sequences. So I find the first CR, then check if it is followed by LF, CR
and LF... And I repeat this until I find the empty line.
export const findIndexOfSequence = (xs, ys) => {
let i = xs.indexOf(ys[0]);
let z = false;
while (i >= 0 && i < xs.byteLength) {
let j = 0;
while (j < ys.byteLength) {
if (xs[j + i] !== ys[j]) break;
j++;
}
if (j === ys.byteLength) {
z = true;
break;
}
i++;
}
return z ? i : null;
};
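As a quick illustration of how the function can split a request, here is a small usage sketch. The function is repeated (slightly condensed and without the `export`) so the snippet runs on its own; the sample request and the `emptyLine` constant are made up for the example.

```javascript
// Condensed copy of the function above so this snippet is self-contained.
const findIndexOfSequence = (xs, ys) => {
  let i = xs.indexOf(ys[0]);
  while (i >= 0 && i < xs.byteLength) {
    let j = 0;
    while (j < ys.byteLength && xs[i + j] === ys[j]) j++;
    if (j === ys.byteLength) return i; // full match starting at i
    i++;
  }
  return null; // sequence not found
};

const rawRequest = new TextEncoder().encode(
  "POST /hello.txt HTTP/1.1\r\nContent-Length: 5\r\n\r\nhello",
);
const emptyLine = Uint8Array.of(13, 10, 13, 10); // CR LF CR LF

// Everything before the empty line is the head; everything after is the body.
const headEnd = findIndexOfSequence(rawRequest, emptyLine);
const head = rawRequest.subarray(0, headEnd);
const body = rawRequest.subarray(headEnd + emptyLine.byteLength);

console.log(new TextDecoder().decode(head));
console.log(body.byteLength); // 5 bytes: "hello"
```

Only the head gets decoded as text here; the body stays a Typed Array, which is the whole point when it might be binary.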
🐙 You will find the code for this post here: https://github.com/i-y-land/HTTP/tree/episode/03
The problem with this approach is that I have to traverse the whole request, and it might turn out that the request doesn't
have a body, in which case I wasted my time.
Instead, I will read the bytes one line at a time, finding the nearest CRLF, and parse the lines in order.
On the first line, I will extract the method and the path.
Whenever I find an empty line, I will assume the body is next and stop.
For the remaining lines, I will parse them as headers.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/utilities.js#L208
export const readLine = (xs) => xs.subarray(0, xs.indexOf(LF) + 1);
export const decodeRequest = (xs) => {
const headers = {};
let body, method, path;
const n = xs.byteLength;
let i = 0;
let seekedPassedHeader = false;
while (i < n) {
if (seekedPassedHeader) {
body = xs.subarray(i, n);
i = n;
continue;
}
const ys = readLine(xs.subarray(i, n));
if (i === 0) {
if (!findIndexOfSequence(ys, encode(" HTTP/"))) break;
[method, path] = decode(ys).split(" ");
} else if (
ys.byteLength === 2 &&
ys[0] === CR &&
ys[1] === LF &&
xs[i] === CR &&
xs[i + 1] === LF
) {
seekedPassedHeader = true;
} else if (ys.byteLength === 0) break;
else {
const [key, value] = decode(
ys.subarray(0, ys.indexOf(CR) || ys.indexOf(LF)),
).split(/(?<=^[A-Za-z-]+)\s*:\s*/);
headers[key.toLowerCase()] = value;
}
i += ys.byteLength;
}
return { body, headers, method, path };
};
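The snippet above leans on a few helpers that aren't shown here (`CR`, `LF`, `encode`, `decode`). Below is a minimal sketch of what they might look like, assuming they simply wrap `TextEncoder` and `TextDecoder`; the definitions in the repository may differ. `splitHead` is a hypothetical helper that only demonstrates the line-by-line walk, it is not a replacement for `decodeRequest`.

```javascript
// Assumed helper definitions; the repository's versions may differ.
const CR = 13; // "\r"
const LF = 10; // "\n"
const encoder = new TextEncoder();
const decoder = new TextDecoder();
const encode = (text) => encoder.encode(text);
const decode = (xs) => decoder.decode(xs);

// Same as `readLine` above: everything up to and including the next LF.
const readLine = (xs) => xs.subarray(0, xs.indexOf(LF) + 1);

// Hypothetical helper: collect the head's lines and report where the body starts.
const splitHead = (xs) => {
  const lines = [];
  let i = 0;
  while (i < xs.byteLength) {
    const ys = readLine(xs.subarray(i));
    if (ys.byteLength === 0) break; // no LF left: the head is incomplete
    i += ys.byteLength;
    if (ys.byteLength === 2 && ys[0] === CR) break; // bare CRLF: the empty line
    lines.push(decode(ys).trimEnd());
  }
  return { lines, bodyOffset: i };
};

const raw = encode(
  "GET /image.png HTTP/1.1\r\nHost: localhost\r\nAccept: */*\r\n\r\n",
);
const { lines, bodyOffset } = splitHead(raw);
console.log(lines);
console.log(bodyOffset === raw.byteLength); // no body follows the empty line
```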
On the other hand, the function to encode the Response is absurdly simple: I can pretty much use the function I already made
and just encode the result. The biggest difference is that I have to be aware that the body might not
be text and should be kept as a Typed Array. I can encode the headers and then concatenate the result with the body.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/utilities.js#L248
export const stringifyHeaders = (headers = {}) =>
Object.entries(headers)
.reduce(
(hs, [key, value]) => `${hs}\r\n${normalizeHeaderKey(key)}: ${value}`,
"",
);
export const encodeResponse = (response) =>
concat(
encode(
`HTTP/1.1 ${statusCodes[response.statusCode]}${
stringifyHeaders(response.headers)
}\r\n\r\n`,
),
response.body || new Uint8Array(0),
);
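To see that a binary body really survives the trip, here is a self-contained sketch of the same two functions. `statusCodes` and `normalizeHeaderKey` are hypothetical stand-ins for the helpers used above (I am assuming the status table maps a code to its full status text, and that header keys get capitalized); the actual definitions live in the repository.

```javascript
// Hypothetical stand-ins for the helpers the snippet above relies on.
const statusCodes = { 200: "200 OK", 204: "204 No Content", 404: "404 Not Found" };
const normalizeHeaderKey = (key) =>
  key
    .split("-")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1).toLowerCase())
    .join("-");
const encode = (text) => new TextEncoder().encode(text);
const concat = (...chunks) => {
  const zs = new Uint8Array(chunks.reduce((z, ys) => z + ys.byteLength, 0));
  let offset = 0;
  for (const xs of chunks) {
    zs.set(xs, offset);
    offset += xs.byteLength;
  }
  return zs;
};

// Same shape as the two functions above.
const stringifyHeaders = (headers = {}) =>
  Object.entries(headers).reduce(
    (hs, [key, value]) => `${hs}\r\n${normalizeHeaderKey(key)}: ${value}`,
    "",
  );
const encodeResponse = (response) =>
  concat(
    encode(
      `HTTP/1.1 ${statusCodes[response.statusCode]}${
        stringifyHeaders(response.headers)
      }\r\n\r\n`,
    ),
    response.body || new Uint8Array(0),
  );

// Usage: these bytes are not valid UTF-8, so they must never be decoded as text.
const body = Uint8Array.of(0x89, 0x50, 0x4e, 0x47, 0xff, 0x00);
const response = encodeResponse({
  body,
  headers: { "content-length": body.byteLength, "content-type": "image/png" },
  statusCode: 200,
});
const headLength = response.byteLength - body.byteLength;
console.log(JSON.stringify(new TextDecoder().decode(response.subarray(0, headLength))));
console.log(response.subarray(headLength)); // the six body bytes, untouched
```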
From there, I have enough to write a simple server using the serve function I've implemented previously.
I can decode the request... then encode the response.
...
serve(
Deno.listen({ port }),
(xs) => {
const request = decodeRequest(xs);
if (request.method === "GET" && request.path === "/") {
return encodeResponse({ statusCode: 204 })
}
}
).catch((e) => console.error(e));
I could respond to every request with a file. That is a good start to a static file server.
...
if (request.method === "GET" && request.path === "/") {
const file = await Deno.readFile(`${Deno.cwd()}/image.png`); // read the file (the handler must be async)
return encodeResponse({
body: file,
headers: {
"content-length": file.byteLength,
"content-type": "image/png"
},
statusCode: 200
});
}
I can start my server and open a browser to visualize the image.
With a bit more effort, I can serve any file within a given directory.
I would attempt to access the file and cross-reference the MIME type from a curated list using the extension.
If the system can't find the file, I will return 404 Not Found.
const sourcePath =
(await Deno.permissions.query({ name: "env", variable: "SOURCE_PATH" }))
.state === "granted" && Deno.env.get("SOURCE_PATH") ||
`${Deno.cwd()}/library/assets_test`;
...
if (request.method === "GET") {
try {
const file = await Deno.readFile(sourcePath + request.path); // read the file
return encodeResponse({
body: file,
headers: {
"content-length": file.byteLength,
["content-type"]: mimeTypes[
request.path.match(/(?<extension>\.[a-z0-9]+$)/)?.groups?.extension
.toLowerCase()
]?.join(",") || "text/plain", // fall back when the extension is unknown
},
statusCode: 200
});
} catch (e) {
if (e instanceof Deno.errors.NotFound) { // if the file is not found
return encodeResponse({
body: new Uint8Array(0),
headers: {
["Content-Length"]: 0,
},
statusCode: 404,
});
}
throw e;
}
}
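The MIME type lookup is the part most likely to blow up on an unexpected path, so here it is pulled out into a small sketch. The `mimeTypes` table below is a hypothetical excerpt (I am assuming each extension maps to a list of types, since the code above joins them), and `lookupContentType` is a name I made up for the illustration.

```javascript
// Hypothetical excerpt of a curated table: extension -> list of MIME types.
const mimeTypes = {
  ".html": ["text/html"],
  ".jpg": ["image/jpeg"],
  ".mp3": ["audio/mpeg"],
  ".png": ["image/png"],
  ".txt": ["text/plain"],
};

// Resolve the content type from the path's extension, with a fallback for
// paths that have no extension or an extension missing from the table.
const lookupContentType = (path, fallback = "text/plain") => {
  const extension = path
    .match(/(?<extension>\.[A-Za-z0-9]+)$/)
    ?.groups?.extension.toLowerCase();
  return mimeTypes[extension]?.join(",") || fallback;
};

console.log(lookupContentType("/image.PNG"));
console.log(lookupContentType("/music.mp3"));
console.log(lookupContentType("/README"));
```

The optional chaining before `join` matters: without it, an unknown extension would throw a TypeError instead of falling back.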
With a broadly similar approach, I can receive any file.
const targetPath =
(await Deno.permissions.query({ name: "env", variable: "TARGET_PATH" }))
.state === "granted" && Deno.env.get("TARGET_PATH") ||
`${Deno.cwd()}/`;
...
if (request.method === "GET") { ... }
else if (request.method === "POST") {
await Deno.writeFile(targetPath + request.path, request.body); // write the file
return encodeResponse({ statusCode: 204 });
}
Now, you can guess if you look at the position of your scrollbar that things can't be that simple...
I see two problems with my current approach.
First, I have to load whole files into memory before I can offload them to the File System, which can become a bottleneck at
scale.
Another surprising issue is with file uploads...
When uploading a file, some clients, curl for example, will make the request in two steps... The first step is
testing the terrain: the client sends only the headers, stating that it wants to upload a file of a certain type and length
along with an Expect: 100-continue header, and it waits for the server to reply with 100 Continue before sending the file.
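To make the handshake a bit more concrete, here is a minimal sketch of the check on the server side. `expectsContinue` is a hypothetical helper, and the headers object is made up to look like what `decodeRequest` produces (lower-cased keys).

```javascript
const encode = (text) => new TextEncoder().encode(text);

// Hypothetical helper: true when the client asked for a go-ahead before sending the body.
const expectsContinue = (headers) =>
  (headers.expect || "").toLowerCase() === "100-continue";

// The interim response is just a status line followed by an empty line.
const continueResponse = encode("HTTP/1.1 100 Continue\r\n\r\n");

// Headers as `decodeRequest` would produce them for the first step of an upload.
const headers = {
  "content-length": "1048576",
  "content-type": "image/png",
  expect: "100-continue",
};

if (expectsContinue(headers)) {
  // On a real connection these bytes would be written back before reading the body.
  console.log(new TextDecoder().decode(continueResponse).trim());
}
```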
Because of this behaviour I need to retain access to the connection, the writable resource.
So I think I will have to refactor the serve function from accepting a function that takes a Typed Array as an
argument, to a function that takes the connection.
This could also be a positive change that would facilitate implementing powerful middleware later on...
export const serve = async (listener, f) => {
for await (const connection of listener) {
await f(connection);
}
};
There are two ways that my server can handle file uploads.
One possibility is that the client tries to post the file directly; in that case
I have the option to read the headers and refuse the request if it's too large. The other possibility is that the
client expects me to reply first.
In both cases, I will read the first chunk and then start creating the file with the data processed so far. Then I
read one chunk at a time from the connection and systematically write each one to the file. This way, I never hold
more than 1KB in memory at a time... I do this until I can't read a whole 1KB, which tells me that the file has been
completely copied over.
export const copy = async (r, w) => {
const xs = new Uint8Array(1024);
let n;
let i = 0;
do {
n = await r.read(xs);
await w.write(xs.subarray(0, n));
i += n;
} while (n === 1024);
return i;
};
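One caveat with stopping on the first short chunk: a reader is allowed to hand back fewer bytes than the buffer can hold even when more data is coming. When the expected size is known up front (for instance from the Content-Length header), a variant like the sketch below can keep going until that many bytes were copied. `copyN`, `slowReader` and `collector` are hypothetical names made up for this illustration, and the sketch assumes the writer accepts the whole chunk in a single `write` call.

```javascript
// Hypothetical variant of `copy`: stops after `length` bytes or when the reader ends.
const copyN = async (r, w, length, chunkSize = 1024) => {
  const xs = new Uint8Array(chunkSize);
  let copied = 0;
  while (copied < length) {
    const want = Math.min(chunkSize, length - copied);
    const n = await r.read(xs.subarray(0, want));
    if (n === null) break; // the reader ended early
    await w.write(xs.subarray(0, n));
    copied += n;
  }
  return copied;
};

// In-memory reader that never hands out more than `max` bytes per call.
const slowReader = (data, max = 300) => {
  let cursor = 0;
  return {
    read: async (xs) => {
      if (cursor >= data.byteLength) return null;
      const size = Math.min(max, xs.byteLength);
      const slice = data.subarray(cursor, cursor + size);
      xs.set(slice);
      cursor += slice.byteLength;
      return slice.byteLength;
    },
  };
};

// In-memory writer that keeps a copy of every chunk it receives.
const collector = () => {
  const chunks = [];
  return {
    chunks,
    write: async (xs) => {
      chunks.push(xs.slice());
      return xs.byteLength;
    },
  };
};

// Usage: 2000 bytes trickle in 300 bytes at a time and are still copied in full.
const data = Uint8Array.from({ length: 2000 }, (_, i) => i % 251);
const sink = collector();
copyN(slowReader(data), sink, data.byteLength).then((n) => {
  console.log(n, sink.chunks.length);
});
```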
...
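let xs = new Uint8Array(1024);
const n = await Deno.read(connection.rid, xs);
xs = xs.subarray(0, n); // keep only the bytes that were actually read
const request = decodeRequest(xs); // decode so that `path` and `headers` are available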
const { fileName } = request.path.match(
/.*?\/(?<fileName>(?:[^%]|%[0-9A-Fa-f]{2})+\.[A-Za-z0-9]+?)$/,
)?.groups || {};
...
const file = await Deno.open(`${targetPath}/${fileName}`, {
create: true,
write: true,
});
if (request.headers.expect === "100-continue") {
// write the `100 Continue` response
await Deno.write(connection.rid, encodeResponse({ statusCode: 100 }));
const ys = new Uint8Array(1024);
const n = await Deno.read(connection.rid, ys); // read the follow-up
xs = ys.subarray(0, n);
}
const i = findIndexOfSequence(xs, CRLF); // find the beginning of the body
if (i > 0) {
await Deno.write(file.rid, xs.subarray(i + 4)); // write possible file chunk
if (xs.byteLength === 1024) {
await copy(connection, file); // copy subsequent chunks
}
}
await connection.write(
encodeResponse({ statusCode: 204 }), // terminate the exchange
);
...
From there, I can rework the part that responds with a file.
Similarly to the two-step request for receiving a file, a client may opt to request the headers for a given file
with the HEAD method.
Because I want to support this feature, I can first get information from the requested file, then I can start writing
the headers, and only if the request's method is GET (not HEAD) will I copy the file to the connection.
...
try {
const { size } = await Deno.stat(`${sourcePath}/${fileName}`);
await connection.write(
encodeResponse({
headers: {
["Content-Type"]: mimeTypes[
fileName.match(/(?<extension>\.[a-z0-9]+$)/)?.groups?.extension
.toLowerCase()
]?.join(",") || "text/plain", // fall back when the extension is unknown
["Content-Length"]: size,
},
statusCode: 200,
}),
);
if (request.method === "GET") {
const file = await Deno.open(`${sourcePath}/${fileName}`);
await copy(file, connection);
}
} catch (e) {
if (e instanceof Deno.errors.NotFound) { // if the file is not found
await Deno.write(
connection.rid,
encodeResponse({
headers: {
["Content-Length"]: 0,
},
statusCode: 404,
}),
);
} else {
throw e; // only rethrow errors that were not handled above
}
}
...
Wow. At this point I have to be either very confident with my programming skills or sadistic...
I need to implement a slew of integration tests before going any further.
I created four static files for this purpose: a short text file (less than a KB), a longer text file, an image and
some music...
I also wrote a higher-order function that will initialize the server before calling the test function.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/integration_test.js#L6
const withServer = (port, f) =>
async () => {
const p = await Deno.run({ // initialize the server
cmd: [
"deno",
"run",
"--allow-all",
`${Deno.cwd()}/cli.js`,
String(port),
],
env: { LOG_LEVEL: "ERROR", "NO_COLOR": "1" },
stdout: "null",
});
await new Promise((resolve) => setTimeout(resolve, 1000)); // wait to be sure
try {
await f(p); // call the test function passing the process
} finally {
Deno.close(p.rid);
}
};
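Waiting a fixed second works, but it is either too long or, on a slow machine, not long enough. One alternative is to poll until the server answers. Here is a minimal, runtime-agnostic sketch of that idea; `waitUntil` is a hypothetical helper, and the probe is faked so the snippet can run anywhere. In the tests, the probe could be an attempt to connect to the port, which is an assumption on my part and not something the repository does.

```javascript
// Hypothetical helper: retry `probe` until it resolves truthy or attempts run out.
const waitUntil = async (probe, { attempts = 20, delay = 50 } = {}) => {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await probe()) return true;
    } catch (_) {
      // e.g. connection refused while the server is still booting
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  return false;
};

// Usage with a fake probe that fails twice before succeeding.
let calls = 0;
const fakeProbe = async () => {
  calls++;
  if (calls < 3) throw new Error("connection refused");
  return true;
};

waitUntil(fakeProbe, { delay: 10 }).then((ready) => {
  console.log(ready, calls);
});
```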
With that, I generate a bunch of tests to download and upload files; this ensures that my code is working as expected.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/integration_test.js#L58
[...]
.forEach(
({ headers = {}, method = "GET", path, title, f }) => {
Deno.test(
`Integration: ${title}`,
withServer(
8080,
async () => {
const response = await fetch(`http://localhost:8080${path}`, {
headers,
method,
});
await f(response);
},
),
);
},
);
When I got to that point, I realized that my serve function was starting to be very... long.
I knew I needed to refactor it into two functions, receiveStaticFile and sendStaticFile.
But, because I need to be able to check the Request line to route to the right function, and I can only read the request
once...
I knew that I was in trouble.
I need something that can keep part of the data in memory while retaining access to the raw connection...
...
if (method === "POST") {
return receiveStaticFile(?, { targetPath });
} else if (method === "GET" || method === "HEAD") {
return sendStaticFile(?, { sourcePath });
}
...
I could have decoded the request, shoved the connection in there and called it a day...
But it didn't feel right aaaand I guess I love making my life harder.
const request = decodeRequest(connection);
request.connection = connection;
...
if (method === "POST") {
return receiveStaticFile(request, { targetPath });
} else if (method === "GET" || method === "HEAD") {
return sendStaticFile(request, { sourcePath });
}
...
The solution I came up with was to write a buffer. It would hold in memory only a KB at a time, shifting the bytes
each time I read a new chunk. The advantage of that is I can move the cursor back to the beginning of the buffer
and read-back parts that I need.
Best of all, the buffer has the same methods as the connection; so the two could be used interchangeably.
I won't go into the details because it's a bit dry, but if you want to check out the code, it's currently on GitHub.
// https://github.com/i-y-land/HTTP/blob/episode/03/library/utilities.js#L11
export const factorizeBuffer = (r, mk = 1024, ml = 1024) => { ... }
With this new toy I can read a chunk from the connection, route the request, move the cursor back to the beginning and
pass the buffer to the handler function like nothing happened.
The peek function specifically has a similar signature to read; the difference is that it will move the cursor
back, read a chunk from the buffer in memory and then finally move the cursor back again.
serve(
Deno.listen({ port }),
async (connection) => {
const r = factorizeBuffer(connection);
const xs = new Uint8Array(1024);
const reader = r.getReader();
await reader.peek(xs);
const [method] = decode(readLine(xs)).split(" ");
if (method !== "GET" && method !== "POST" && method !== "HEAD") {
return connection.write(
encodeResponse({ statusCode: 400 }),
);
}
if (method === "POST") {
return receiveStaticFile(r, { targetPath });
} else {
return sendStaticFile(r, { sourcePath });
}
}
)
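If the peek idea feels abstract, here is a much-simplified sketch of it. This is not the factorizeBuffer implementation from the repository (there is no shifting 1KB window and no getReader step); `peekable` and `fakeConnection` are hypothetical names, and the sketch only shows the core trick: whatever peek pulls from the connection is held back so that the next read serves it again.

```javascript
// Hypothetical stand-in for a connection, serving `data` through `read(buffer)`.
const fakeConnection = (data) => {
  let cursor = 0;
  return {
    read: async (xs) => {
      if (cursor >= data.byteLength) return null;
      const slice = data.subarray(cursor, cursor + xs.byteLength);
      xs.set(slice);
      cursor += slice.byteLength;
      return slice.byteLength;
    },
  };
};

// Simplified sketch: remembers what `peek` pulled so `read` can serve it again.
const peekable = (r) => {
  let pending = new Uint8Array(0); // bytes pulled from `r` but not consumed yet
  return {
    peek: async (xs) => {
      if (pending.byteLength === 0) {
        const ys = new Uint8Array(xs.byteLength);
        const m = await r.read(ys);
        if (m === null) return null;
        pending = ys.subarray(0, m);
      }
      const n = Math.min(xs.byteLength, pending.byteLength);
      xs.set(pending.subarray(0, n)); // copy without consuming
      return n;
    },
    read: async (xs) => {
      if (pending.byteLength === 0) return r.read(xs); // nothing held back
      const n = Math.min(xs.byteLength, pending.byteLength);
      xs.set(pending.subarray(0, n));
      pending = pending.subarray(n); // consume what was handed out
      return n;
    },
  };
};

// Usage: peek at the method to route, then let the handler read from the start.
const route = async () => {
  const raw = new TextEncoder().encode("POST /hello.txt HTTP/1.1\r\n\r\nhello");
  const r = peekable(fakeConnection(raw));
  const xs = new Uint8Array(16);
  const n = await r.peek(xs);
  const [method] = new TextDecoder().decode(xs.subarray(0, n)).split(" ");
  const ys = new Uint8Array(1024);
  const m = await r.read(ys); // same bytes again, as if nothing happened
  return { method, replayed: new TextDecoder().decode(ys.subarray(0, m)) };
};

route().then((result) => console.log(result));
```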
To finish this, like a boss, I finalize the receiveStaticFile
(https://github.com/i-y-land/HTTP/blob/episode/03/library/server.js#L15) and sendStaticFile
(https://github.com/i-y-land/HTTP/blob/episode/03/library/server.js#L71) functions, taking care of all
the edge cases.
Finally, I run all the integration tests to confirm that I did a good job. And uuugh. Sleeeep.
This one turned out to be a lot more full of surprises than I was prepared for.
When I realized that some clients send files in two steps, it really threw a wrench in my plans...
But it turned out to be an amazing learning opportunity.
I really hope that you are learning as much as I am.
On the bright side, this forced me to put together all the tools that I know I will need for the next post.
Next, I want to look into streaming in more detail and build some middleware, starting with a logger.
From there, I am sure that I can tackle building a nice little router which will wrap this up pretty nicely.
All of the code is available on GitHub; if you have a question, do not hesitate to ask...
Oh speaking of that, I launched a Discord server, if you want to join.
🐙 You will find the code for this episode here: https://github.com/i-y-land/HTTP/tree/episode/03
💬 You can join the I-Y community on Discord: https://discord.gg/eQfhqybmSc
At any rate, if this article was useful to you, hit the like button, leave a comment to let me know or best of all,
follow if you haven't already!
Ok bye now...