100 Languages Speedrun: Episode 39: SQLite

SQLite is likely the world's most popular database. It can be embedded in apps, and so many apps use it.

For example if you use Google Chrome, then you can open your history with sqlite3 and see which URL you visited most often:

$  sqlite3 ~/'Library/Application Support/Google/Chrome/Default/History'
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
sqlite> .headers on
sqlite> select url, title from urls order by visit_count desc limit 1;
url|title
https://twitter.com/|Home / Twitter

At least if Google Chrome is not currently running. If it's running SQLite will tell you it's locked instead, as by design it only lets one application use each database file at a time, so copy that file elsewhere to view or modify it.

Anyway, this episode is not about using SQLite for its intended purpose. We're going to have some fun.

SQL

Contrary to very common misconception, SQL is not a language, no more than "Lisp" or "Shell" are a language. Every database has its own "SQL" and they're vastly incompatible for even the simplest tasks. Over years there's been some standards, and some attempts at bringing them closer together, but it's really all failed, and every database has its own "SQL" language.

If you try to run SQL for SQLite on PostgreSQL or MySQL or whichever other system, you'd have about as much luck as trying to run Clojure Lisp code in Racket Lisp, or PowerShell shell code on ZSH shell.

Hello, World!

SQLite wasn't really designed to run SQL scripts without database from #!, so we need to write a small header to trick it into our special mode. Here's Hello, World!:

#!/bin/bash
tail -n +5 "$0" | sqlite3
exit $?

select "Hello, World!";

Which we can run with command line as expected:

$ ./hello.sql
Hello, World!

The "$0" in shell means means current file name - and we need to put it in "" in case path contains spaces, as that breaks shell scripts. tail -n +5 means print everything from a file from line 5 onwards. | sqlite3 means start SQLite and pipe that stuff into its input.

There are of course many other ways to run it. The more reasonable way would be to simply save that SQL to a fire and do sqlite3 <file.sql:

$ cat hello2.sql
select "Hello, World!";
$ sqlite3 <hello2.sql
Hello, World!

Or echo it from within shell - this is not really recommended for anything nontrivial, as we need to deal with shell quoting and evaluation issues:

$ echo 'select "Hello, World!";' | sqlite3
Hello, World!
$ sqlite3 <<<'select "Hello, World!";'
Hello, World!

Going on, I won't be listing the tail header in the examples, just the SQL part.

Loops

The first problem we run into is looping. We have no database, and we'd like to print some integers, and that's surprisingly hard in any shared SQL.

Some databases save us here, for example this works in PostgreSQL returning all numbers from 1 to 10:

select * from generate_series(1, 10);

In SQLite we can list all the numbers, but that's of course not reasonable beyond just a few:

select 1 union
select 2 union
select 3 union
select 4 union
select 5 union
select 6 union
select 7 union
select 8 union
select 9 union
select 10;

The best way to do so seems to be "recursive Common Table Expressions" (recursive CTEs). SQL doesn't allow recursion in general, but it's possible to do so in some limited cases.

with recursive generate_series(value) as (
  select 1
  union all select value+1
  from generate_series
  where value + 1 <= 10
)
select value from generate_series;

FizzBuzz

Once we have the hard problem of looping solved, FizzBuzz itself is very easy:

#!/bin/bash
tail -n +5 "$0" | sqlite3
exit $?

with recursive generate_series(value) as (
  select 1
  union all select value+1
  from generate_series
  where value + 1 <= 100
)
select
  case
    when value % 15 = 0 then 'FizzBuzz'
    when value % 5 = 0 then 'Buzz'
    when value % 3 = 0 then 'Fizz'
    else value
  end
from generate_series;

Fibonacci

We can use a recursive CTE again, with all the extra variables we want. We just only select the ones we care about afterwards (a), and ignore all the rest (b, i):

#!/bin/bash
tail -n +5 "$0" | sqlite3
exit $?

with recursive fib(a, b, i) as (
  select 1, 1, 1
  union all
  select b, a + b, i + 1
  from fib
  where i + 1 <= 100
)
select a from fib;

SQLite uses floats so at some point it will lose precision:

$ ./fib.sql
1
1
2
3
5
8
13
21
34
55
..
5.16807088548583e+19
8.36211434898484e+19
1.35301852344707e+20
2.18922995834555e+20
3.54224848179262e+20

CSV

SQL can import and export to CSV, but let's pretend it doesn't, and we need to process some CSV data.

I'll actually use temporary tables for this one, so I can do this step by step - but it all works as a monster expression.

I asked GitHub Copilot to generate some data of people and their favorite programming languages:

create table csv(data);
insert into csv values('first name,last name,favorite language
Alice,Smith,Ruby
Bob,Smith,Python
Charlie,Brown,JavaScript
Daniel,Clark,PHP
Ed,Jones,C
Frank,Smith,HTML
Gary,Johnson,CSS
Heather,Williams,JavaScript
Ivan,Smith,Ruby
Jack,Jones,JavaScript
John,Brown,PHP
Karen,Clark,JavaScript
Larry,Jones,HTML
Mike,Williams,CSS
Nathan,Smith,Ruby
Oscar,Brown,JavaScript
Peter,Clark,PHP
Quinn,Jones,JavaScript
Ralph,Smith,HTML
Sally,Williams,CSS
Tom,Smith,Ruby
Will,Jones,JavaScript
Xavier,Brown,PHP
Yvonne,Clark,JavaScript
Zachary,Jones,HTML
');

Let's say we want to get a result where each line is programming language, and then list of everyone who likes it.

First, let's split it into lines:

create table lines(lineno, line);
insert into lines
with recursive split(lineno, line, str) as (
  select 0, null, data from csv
  union all
  select
    lineno + 1,
    substr(str, 0, instr(str, char(10))),
    substr(str, instr(str, char(10))+1)
    from split where str != ''
) select lineno, line from split where line is not null;

This code recursively turns "A\nB\nC\n" into "A" and "B\nC\n" - until result is empty. Then it throws out those strings.

SQLite doesn't have any way to escape \n, so we need to do char(10). It's also important that this input ends with \n, otherwise the expression will loop forever. We could add some more checks to deal with it, but let's just deal with the happy path.

Data looks like this (first line is SQL header which you can get with .headers on, SQL output is separated by | by default):

lineno|line
1|first name,last name,favorite language
2|Alice,Smith,Ruby
3|Bob,Smith,Python
4|Charlie,Brown,JavaScript
5|Daniel,Clark,PHP
...

Now let's do this again for each line. As lines don't end with , we need to (line||',') for our loop to work (|| is string concatenation, not logical or):

create table cells(lineno, colno, cell);
insert into cells
with recursive split(lineno, colno, cell, str) as (
  select lineno, 0, null, (line||',') from lines
  union all
  select
    lineno,
    colno + 1,
    substr(str, 0, instr(str, ',')),
    substr(str, instr(str, ',')+1)
    from split where str != ''
) select lineno, colno, cell from split where cell is not null;

Data looks like this:

lineno|colno|cell
1|1|first name
1|2|last name
1|3|favorite language
2|1|Alice
2|2|Smith
2|3|Ruby
3|1|Bob
3|2|Smith
...

Now we could work with those column numbers, but I'd much prefer to transform it so we can see keys and values.

create table cellvals(lineno, k, v);
insert into cellvals
select c.lineno, h.cell, c.cell from cells h
inner join cells c
on c.colno = h.colno and h.lineno = 1 and c.lineno != 1;

Data looks like this now:

lineno|k|v
2|first name|Alice
2|last name|Smith
2|favorite language|Ruby
3|first name|Bob
3|last name|Smith
3|favorite language|Python
...

Now let's turn this into something SQL friendly:

create table preferences(full_name, language);
insert into preferences
select fn.v||' '||ln.v, fl.v
from cellvals fn
inner join cellvals ln
on fn.k='first name' and ln.k='last name' and fn.lineno = ln.lineno
inner join cellvals fl
on fl.k='favorite language' and fn.lineno = fl.lineno;

Data looks like this now:

full_name|language
Alice Smith|Ruby
Bob Smith|Python
Charlie Brown|JavaScript
...

And finally we run the query we wanted:

select language || ',' || group_concat(full_name, ',') from preferences group by language;

After all this:

$ ./csv.sql
C,Ed Jones
CSS,Gary Johnson,Mike Williams,Sally Williams
HTML,Frank Smith,Larry Jones,Ralph Smith,Zachary Jones
JavaScript,Charlie Brown,Heather Williams,Jack Jones,Karen Clark,Oscar Brown,Quinn Jones,Will Jones,Yvonne Clark
PHP,Daniel Clark,John Brown,Peter Clark,Xavier Brown
Python,Bob Smith
Ruby,Alice Smith,Ivan Smith,Nathan Smith,Tom Smith

The same as monster expression

Of course it would be more fun to do this without any temporary tables:

with
csv(data) as (
  select 'first name,last name,favorite language
Alice,Smith,Ruby
Bob,Smith,Python
Charlie,Brown,JavaScript
Daniel,Clark,PHP
Ed,Jones,C
Frank,Smith,HTML
Gary,Johnson,CSS
Heather,Williams,JavaScript
Ivan,Smith,Ruby
Jack,Jones,JavaScript
John,Brown,PHP
Karen,Clark,JavaScript
Larry,Jones,HTML
Mike,Williams,CSS
Nathan,Smith,Ruby
Oscar,Brown,JavaScript
Peter,Clark,PHP
Quinn,Jones,JavaScript
Ralph,Smith,HTML
Sally,Williams,CSS
Tom,Smith,Ruby
Will,Jones,JavaScript
Xavier,Brown,PHP
Yvonne,Clark,JavaScript
Zachary,Jones,HTML
'
),
lines(lineno, line) as (
  with recursive split(lineno, line, str) as (
    select 0, null, data from csv
    union all
    select
      lineno + 1,
      substr(str, 0, instr(str, char(10))),
      substr(str, instr(str, char(10))+1)
      from split where str != ''
  ) select lineno, line from split where line is not null
),
cells(lineno, colno, cell) as (
  with recursive split(lineno, colno, cell, str) as (
    select lineno, 0, null, (line||',') from lines
    union all
    select
      lineno,
      colno + 1,
      substr(str, 0, instr(str, ',')),
      substr(str, instr(str, ',')+1)
      from split where str != ''
  ) select lineno, colno, cell from split where cell is not null
),
cellvals(lineno, k, v) as (
  select c.lineno, h.cell, c.cell from cells h
  inner join cells c
  on c.colno = h.colno and h.lineno = 1 and c.lineno != 1
),
preferences(full_name, language) as (
  select fn.v||' '||ln.v, fl.v
  from cellvals fn
  inner join cellvals ln
  on fn.k='first name' and ln.k='last name' and fn.lineno = ln.lineno
  inner join cellvals fl
  on fl.k='favorite language' and fn.lineno = fl.lineno
)
select language || ',' || group_concat(full_name, ',') from preferences group by language;

Should you use SQLite?

Double Yes!

It's totally great for its intended purpose of having tiny databases embedded in your apps. It's far less hassle than setting up an SQL or NoSQL database server, is expressive enough, performant enough, and everyone knows basic SQL (even if of a different kind) so learning curve is very low.

As for doing programming in SQL, also yes. Unlike let's say CSS where "writing games in pure CSS" is a skill nearly orthogonal to regular use of making websites look pretty, with SQL all those silly exercises can help you when you write real queries.

Code

17