SQL query builders

My latest attitude towards the database has been to respect it more, by taking advantage of more of the powerful features it has available. That means taking advantage of features specific to my go-to database, PostgreSQL, and locking myself in. I have a pending blog post about my thoughts on that, but for now I wanted to make a quick note on something that seems so simple and obvious, yet I'd never considered before.

Quite often in our application we'll want to customise our WHERE clauses for filtering results. A common example of this is implementing filters for a table of results. To do this, we might loop over a list of filters that come to us from the client (e.g., query parameters in a GET request), and include all and only the filter values which have a value set. If we have one filter requested, then we should add a 'WHERE' clause, and if we have more than one, we should join all the options by 'AND'. It's a small thing to do, but in building up our SQL out of joining strings together, we need to be a little careful to construct it right.

Alternatively, when we're using an ORM or query builder (for Go, I like to use squirrel), adding the correct 'WHERE' and 'AND' terms is handled automatically. A small convenience that we could do ourselves, but then we'll find ourselves building our own in-house micro-query-builder.

The nice thing about not using a query builder or ORM is that you can work directly with the SQL, a great advantage when you're familiar with it and know just what you want to write. You don't run into situations where you think, "I know what the SQL query needs to be, but I don't know how to get this ORM/Query Builder to write that SQL". The problem is you end up composing your SQL queries using your own equivalent of a querybuilder anyway, and then the details of your queries are split all over the place. You can't easily look in one place, and see the query as a whole to understand it. Instead, it's scattered in piecemeal strings throughout your function or wider code.

I've been playing around with Rust some more, and looking at sqlx as an option that lets me stay close to the SQL, with some additional compile time checks. sqlx sends queries to the database at compile time to check that they are valid, which sounds great -- compile time checks for validity save time.

This, of course, won't work when you're constructing your clauses from parts and therefore don't have the whole query written anywhere at compile time that can be checked against the database. For example (written in Go), you might have:

package main

import (
    "fmt"
    "strings"
)

var filters = []struct {
    Name  string
    Value interface{}
}{
    {
        Name:  "name",
        Value: "Homer",
    },
    {
        Name:  "city",
        Value: "Springfield",
    },
}

func main() {
    query := "SELECT * FROM person"

    var where []string
    var args []interface{}
    index := 0

    // Make sure 'filters' are not provided by the client, or are checked against a list of allowed values, so that you don't give an opening for a SQL injection attack:
    for _, filter := range filters {
        index++

        where = append(where, fmt.Sprintf("%s = $%d", filter.Name, index))
        args = append(args, filter.Value)
    }

    if len(where) > 0 {
        query = query + " WHERE " + strings.Join(where, " AND ")
    }

    fmt.Println(query)

    // db.Query(query, args...)
    _ = args
}

We've built our own home brewed SQL building solution, and this example doesn't even deal with ranges -- for example, finding everyone with a date of birth between two dates (or just after a particular date). Those have a different syntax that also needs to be dealt with, complicating our 'quick' homebrewed query builder. Note also that the full query is scattered across parts throughout this function. The query it generates In this case would be:

SELECT * FROM person WHERE name = $1 AND city = $2

But it's very hard to see that at a glance, and it can't be checked in advance. How do we get compile time checks for these runtime constructed queries? I've been thinking about options where we inspect the database to understand what tables and fields there are, and inside Rust we use a macro where we can use simple tags inside our query string that the compiler uses to check are valid values. We could use this to check that we're at least using correct table and field names.

However, I saw another solution for this in an issue for the sqlx project. And that solution is just to have the query itself handle the optional cases! Instead of including them only if they're set, you always list them in the query inside your code, so you don't need to do any kind of runtime building up of the conditions. If the filter is not provided then the query won't filter on that value. Simple, but effective, and should still let sqlx check the query at compile time! And the query is fully expressed, so it's easy to read all in one place, and therefore easy to understand.

Here's an updated version of the above using this method:

package main

import (
    "database/sql"
)

type filter struct {
    Name      sql.NullString
    City      sql.NullString
    DOBBefore sql.NullTime
    DOBAfter  sql.NullTime
}

func main() {
    var f filter
    f.Name = sql.NullString{String: "Homer", Valid: true}
    f.City = sql.NullString{String: "Springfield", Valid: true}

    query := `
SELECT * FROM person
WHERE 
    ($1 IS NULL OR name = $1) AND
    ($2 IS NULL OR city = $2) AND
    ($3 IS NULL OR dob < $3) AND
    ($4 IS NULL OR dob > $4)
`

    rows, err := db.Query(query, f.Name, f.City, f.DOBBefore, f.DOBAfter)

    [...]
}

This is both shorter and much clearer about what's going on. Morever, that query can be checked at compile time with the database, and is easily portable. It even handles date range filters without needing to enhance our quickly built in-house query builder to consider such filters. Also, since we've hard-coded our filter options, we don't run the risk of accidentally allowing any SQL injection attacks because we forgot to properly check our filter names.

It's such a simple and obvious way to handle these things, but it never occurred to me before. I'll be interested to see if this kind of solution can work in most situations, or if it will have common limitations I haven't yet considered.

28