19
Let's develop a QR Code Generator, part I: basic concepts
Recently I was developing a little puzzle web game, and I thought that it would be cool if people could share the puzzles using something commonly shareable like a QR Code. After all, Chromium-based browsers support the Barcode Detection API since v83 came out in May 2020, so if it's there why not using it?
Unfortunately, it's an API to just read a QR Code (which, admittedly, it's the hardest part), not to generate it, so we have to do it ourselves. How hard could it be?!
Oh boy.
Let's dive into a series of fairly advanced mathematical concepts and a long sequence of rules ranging from "almost reasonable" to seemingly "totally ludicrous", only to create a bunch of monochromatic pixels.
For the sake of the articles in this series, we'll concentrate in just QR code generation, leaving detection aside. We'll also start studying the simpler case of smaller codes with 8-bit characters.
QR codes contain data, that's fair to say. The type of data can be decided, but of course it determines the maximum amount of information that can be stored:
It shouldn't surprise that kanji is one of the main symbol sets, since QR codes have been developed by Denso Wave, a Japanese automation company.
There actually are other encoding modes in more recent versions, but as mentioned before we'll focus on 8-bit bytes for now. And in the end, a QR code is a series of bits - so if you want to encode your information as you wish, you can.
Moreover, QR codes can also switch to a different encoding mode in the middle of its data, but we won't consider this case for now.
QR codes are always square, but their sizes vary. The size is determined by the unusual term "version", so that version 1 is 21×21 pixels large, while version 40 (the largest) is 177×177 pixels. A QR code 1 version larger is 4 pixels wider and taller, so the size is (17 + version * 4) pixels.
Also, we shouldn't call them pixels, but rather "modules" (again, unusual but maybe something has been lost in translation from Japanese).
Since larger QR codes are more difficult to decode (and computationally more expensive), the goal is to use the smallest possible "version" for the amount of data we want to store.
Larger QR codes split their data in multiple blocks (up to 81).
Every QR code contain error correction "modules" - and no, we can't remove them to maximize the available space. But we can choose among 4 levels of error correction:
Level | Letter | Data recovery |
---|---|---|
Low | L | ~7% |
Medium | M | ~15% |
Quartile | Q | ~25% |
High | H | ~30% |
A higher error correction can be abused to create QR codes partially covered by logos and pictures, but that can be still read thanks to error correction.

Many of us can recognize what's possibly a QR code with a glance, and it's because of some common characteristics:
About the last one, the point of being easy to recognize is exactly why they have been designed like that: let's remember that QR (= Quick Response) codes have been developed to be used by industrial automated machines.
These patterns are:




Moreover, in larger QR codes (from version 7 and up) a couple of areas are reserved for format data.
Given the version, the encoding mode and the error correction level, the capacity of a QR code is determined. The available space that's not occupied by fixed patterns or reserved areas is divided in groups of 8 modules called "codewords": think of them as classic 8-bit bytes.
Therefore, the total available codewords is fixed for every version: 26 for version 1, 44 for version 2 and so on, up to 3706 for version 40.
For each version, the codewords reserved for error correction, is also determined, and can be found in tables like this one.
Without much further ado, let's begin to build some small QR code, with ISO-8859-1 byte encoding!
Yes, QR codes use ISO-8859-1 (also known as Latin-1) to encode their byte strings. Today UTF-8 is more common, but a while ago it wasn't.
The basic issue here is that, while UTF-8 can encompass millions of characters (or "code points"), Latin-1 has only 255 symbols. That's it. No emojis, no other alphabets. If you want to check if a string is valid for Latin-1, the check is simple:
const LATIN1_RE = /^[\x00-\xff]*$/;
function isLatin1(string) {
return LATIN1_RE.test(string);
}
If some characters are outside ISO-8859-1, well... you either discard them, or use ECI mode. Also, some readers automatically recognize if UTF-8 is used instead, but it might not be a reliable choice for public QR codes.
Stay in touch for the next part: encoding the data!
19