A look at PostgreSQL migrations in Node

Data migrations are a common operation for any application with a persistence layer. In this post we will look at how to set a simple data migration on a PostgreSQL database in a Node back-end.

There are already existing libraries that provide the needed level of abstraction to perform data migrations with the above stack, such as node-pg-migrate, postgres-migrations or db-migrate. If you need to run migrations in production, I would encourage using any of those instead of coming up with your own solution.

Yet, let's look at what it takes to build such a library!

Keeping track of migrations

At its core, migrations are just SQL files being executed at most once in a certain order. It is primordial that migrations are only ever run once, and exactly once. To achieve that, we need to keep track of which migrations have already been run when triggering a migration.

As we are already using a persistence layer, one straightforward approach is to use that same persistence layer to keep track of the migrations that have been run. In our case, we can create a migrations table in PostgreSQL, which will be updated on every migration run ... and, of course, we will be setting that up using a migration!

00000001-init-migrations.sql

-- Migrations
-- This table will keep a list of migrations that have been run on this database.
--
CREATE TABLE IF NOT EXISTS migrations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  file TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

Some migrations might need to run on certain assumptions (for example, that a table exists). For those cases, we enforce an order for migrations to run. As such, we prefix all migration files with 8 digits.

Writing migrations

As we ideally want to get a working end-to-end system from scratch, the next logical step is to create the needed tables for our application. For example:

00000002-init-basic-schema.sql

-- Users
-- This table will hold the users' information.
CREATE TABLE IF NOT EXISTS users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Settings
-- This table holds all users' settings
CREATE TABLE IF NOT EXISTS settings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  lang TEXT,
  currency TEXT,
  user_id UUID REFERENCES users (id),
  created_at TIMESTAMP DEFAULT NOW()
);

...

With this setup, migrations are just SQL files doing any sort of operation like creating tables, inserting data, deleting columns, adding indexes etc...

00000003-add-test-data.sql

CREATE TABLE IF NOT EXISTS test (
  name TEXT
);

INSERT INTO test (name) VALUES ('bonjour'), ('hola'), ('nihao');

You might have noticed that we are not supporting "down" migrations. This is by design, as "down" migrations can be thought of as "up" migrations negating a previous migration. In that sense, they are just simply migrations too.

Running migrations

Now for the trickiest part of this exercise, let's see how to run those migration files!

For this section, we will assume we have implemented a similar getClient() method as described in https://node-postgres.com/guides/project-structure using pg.

The first thing we want to do, is check if there are any outstanding migrations to be run, and if so, read the content of those migrations. To do so, we will introduce a utility function:

async function getOutstandingMigrations(migrations = []) {
  const files = await promisify(fs.readdir)(__dirname);
  const sql = await Promise.all(
    files
      .filter((file) => file.split(".")[1] === "sql")
      .filter((file) => !migrations.includes(file))
      .map(async (file) => ({
        file,
        query: await promisify(fs.readFile)(`${__dirname}/${file}`, {
          encoding: "utf-8",
        }),
      }))
  );

  return sql;
}

In a nutshell, what this function does is read all files in the current directory and filter out files that don't contain SQL and previously ran migrations. Finally, it reads the content of those files. Note that we are using promisified a few fs utility functions to increase the efficiency of this function.

We can now use that function to get all outstanding migrations (i.e. migrations that haven't run yet against the current database) in our migrate() function like follows:

async function migrate() {
  ...

  // Check previous migrations
  let existingMigrations = [];
  try {
    let result = await client.query("SELECT * FROM migrations");
    existingMigrations = result.rows.map(r => r.file)
  } catch {
    console.warn("First migration");
  }

  // Get outstanding migrations
  const outstandingMigrations = await getOutstandingMigrations(
    existingMigrations
  );

  ...
}

Now that we have a list of outstanding migrations, we want to run migrations sequentially in transactions. The order here is important as some migrations might depend on artifacts created in previous migrations. Running each migration in a transaction helps rollback a specific migration if there are any issues.

async function migrate() {
  const client = await getClient();

  ...
  try {
    // Start transaction
    await client.query("BEGIN");

    // Run each migration sequentially in a transaction
    for (let migration of outstandingMigrations) {
      // Run the migration
      await client.query(migration.query.toString());
      // Keep track of the migration
      await client.query("INSERT INTO migrations (file) VALUES ($1)", [
        migration.file,
      ]);
    }

    // All good, we can commit the transaction
    await client.query("COMMIT");
  } catch (err) {
    // Oops, something went wrong, rollback!
    await client.query("ROLLBACK");
  } finally {
    // Don't forget to release the client!
    client.release();
  }
}