Exploring the Monorepo #4: Adding Docker + CI

We left off last time with a successful monorepo, and there was much rejoicing. By compiling Typescript to Javascript we side-stepped all the complexity of wiring disparate Typescript projects together, but that approach also introduces important downsides we need to minimize or solve. The purpose of this article is to take a step back before diving into solutions for those downsides.

Why? Because the sample-project so far has some fundamental limitations, and if we don't address them now we risk converging on solutions that won't work in the real world. Our sample-project only runs code locally, so it has nothing for packaging the apps up so they can run on a server, and no support for a continuous-integration pipeline to automate anything. If we don't address those requirements we could end up with a monorepo-solution that looks nice as an example but doesn't really work for anyone, because we need more from our code than just running it locally.

What to Consider?

We need to:

  • Package web and api apps into a format that can be put on a server. I'll choose Docker for this as it's basically the de-facto standard these days, and it's easy to run the images locally to verify they'd work if placed into a Docker-compatible environment.
  • Run a CI pipeline that builds and tests the code, and packages up the apps. I'll choose GitHub Actions for this because, well, honestly all the solutions are about the same 🤷‍♀️. The principles we align on will transfer just fine to whatever CI solution you prefer.

As always we're dealing with our pretend-product "webby", which we'll now extend with these two new concerns. Let's get to it!

ℹ️ BTW, if you just want to jump to the end-result you can browse the final branch result via VSCode on GitHub1s.com

Packaging With Docker

Docker is a curious case: it's simple to get started with, but to really nail its various details it can get very complex.

The challenge with Docker is making it build fast and lean, so it doesn't waste time and disk space building and installing the same things over and over. Maybe for our sample-product a small bit of waste will look benign, but scaled up those inefficiencies become very real problems, so we'll want really optimal solutions.

Before we dive into Docker we have some decisions to make though:

  • Should we test as we build the Docker image? Or do we build the image and then run the tests inside it?
  • What should we do with libraries? We only need apps packaged because only apps run on a server, but then how do we test those libraries?

We'll keep it simple for now, and we can revisit this later if it turns out to be a bad idea: Right now we'll test as we build, because that way if the image builds we know the code works (by the definition of its tests, at least!). And we'll also test libraries using Docker; even though they won't produce a runnable image, it's simpler to run all our testing the same way.

Running libraries through Docker will also make the CI pipeline simpler, because it'll just use Docker for everything.

To get started we'll pick up from the previous article where we adopted the use of pnpm, and our repository was configured to build its projects to Javascript. Let's first add basic Docker to apps/web:

$ cd apps/web
$ cat Dockerfile
FROM node:16-alpine
RUN npm --global install pnpm
WORKDIR /root/monorepo
COPY ../.. . 
# ↑ Copy the whole repository and let pnpm filter what to run
RUN pnpm install --filter "@mono/web..."
RUN pnpm build --filter "@mono/web..."
RUN pnpm test --if-present --filter "@mono/web"
$ docker build . -t web
 => [4/6] COPY ../.. .                                                                                                                                 0.8s
 => ERROR [5/6] RUN pnpm install --filter "@mono/web..."
2.9s
------                                                                                                                                                      
 > [5/6] RUN pnpm install:                                                                                                                                  
#8 1.985 Progress: resolved 1, reused 0, downloaded 0, added 0
#8 2.441  ERROR  In : No matching version found for @mono/types@* inside the workspace

Whoops, that's not going to work: Docker can only see files within its context, and by default the context is the folder the Dockerfile is in. The COPY ../.. . step can't reach outside that context (it behaves like COPY . .), so pnpm install fails because libs/types doesn't exist inside the Docker image.

So… how do we solve that? Should we move the Dockerfile to the repository root? No, that's not acceptable: each project should be self-sufficient, so it should also contain its packaging-file. The Dockerfile must stay where it is.

The simplest solution I've found is one I learnt from @henrikklarup. It's perhaps not the easiest at first glance, but it fully decouples all this Docker-context stuff: we're going to give Docker a custom context by piping a tarball of files into it via stdin. Let's try it out:

$ cat Dockerfile
FROM node:16-alpine
RUN npm --global install pnpm
WORKDIR /root/monorepo
COPY . .
RUN pnpm install --filter "@mono/web..."
RUN pnpm build --filter "@mono/web..."
RUN pnpm test --if-present --filter "@mono/web"
WORKDIR /root/monorepo/apps/web
$ tar --exclude='node_modules' --exclude='dist' --exclude='.git' -cf - ../.. | docker build -f apps/web/Dockerfile - -t web
$ docker run --rm -it -p3000:3000 web
running on port 3000

Hey, that worked! That's a mouthful of a tar command though; let's break it down:

  • We --exclude the folders "node_modules", "dist", and ".git" because they take up a lot of space that Docker shouldn't have to process.
  • -cf - ../.. are tar-arguments to create (-c) a tarball, from repository root (../..), and send it to stdout (f -).
  • | pipes the tarball to Docker
  • docker build -f <path> instructs Docker where to find the Dockerfile (because the context is now relative to the repository root we have to tell it which file to build), and the - lets Docker read context from stdin.

Does this solution look weird or complex? I've gotten so used to it I don't notice anymore, but I think it's a great decoupling that lets us generate the perfect context without getting limited by Docker… e.g. we could now replace the "tar" command with some tool that generates a perfectly optimized tarball. We don't really need to get that optimized right now though, but it's nice to know we can!
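For instance, here's a sketch of one such replacement, using git archive so the context contains only git-tracked files. This assumes the build doesn't need any untracked files, and it keeps the same image tag as before:

```shell
# Build the Docker context from git-tracked files only, so node_modules,
# dist, and other untracked clutter never reach Docker at all.
# (Run from apps/web, like the tar variant above.)
git -C ../.. archive --format=tar HEAD | docker build -f apps/web/Dockerfile - -t web
```

Because git archive only includes committed files, the excludes come for free via .gitignore, at the cost of uncommitted changes not making it into the image.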

ℹ️ BTW, there's lots we could optimize: We include a lot of superfluous files, we should install dependencies before copying source-code, we should remove dev-dependencies after testing, and more! But it quickly gets messy setting all that up manually, so I hope by leaving it unoptimized here we can dive into more tool/script-assisted optimizations in later articles.

It's the same work to add Docker to apps/api and the libraries, so no need to show that here, but you can explore the final result if you'd like.
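One detail worth making explicit: the CI section below invokes these builds through a "docker:build" package-script, so it's handy to wrap the tar-pipe in each project's package.json. A sketch for apps/web, assuming the script runs with the project folder as its working directory (which is how pnpm runs scripts):

```shell
$ cat apps/web/package.json
{
  "scripts": {
    "docker:build": "tar --exclude='node_modules' --exclude='dist' --exclude='.git' -cf - ../.. | docker build -f apps/web/Dockerfile - -t web"
  }
}
```

With a script like this in place, `pnpm run --filter "@mono/web" docker:build` builds the image the same way we just did by hand.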

Pipelining

For CI pipelines there is a simple golden principle to follow: CI should be nothing more than the gluing together of easy-to-run-locally scripts, because it's dangerously difficult to maintain a CI pipeline full of sophisticated logic and/or complex webs of rules. Inevitably some complexity leaks in to enable parallelization, but let's tackle it one step at a time.

Let's start with a very simple CI pipeline:

$ cd ../..
$ cat .github/workflows/ci.yml
name: CI

on:
  push:
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: pnpm/action-setup@v2
        with:
          version: 6.9.1
      - run: pnpm run -r --if-present --parallel docker:build

With these steps we run all available "docker:build" scripts and we get a green CI:
[screenshot: green checkmark]

The good news is this is certainly simple, and it's easy to follow what the CI does by running the same scripts locally. The bad news is it runs very slow:

  • Every CI run builds each package on the same CI node, and those nodes aren't very powerful. So though it technically runs in parallel, we really should let GitHub Actions parallelize the work for us.
  • Each package gets its dependencies (re-)installed from scratch, and building and testing runs even if nothing has changed in that package.
  • All those Docker operations run without any sort of Docker-layer caching from previous runs.

It's real bad.

As we did with Docker, let's optimize this a bit without getting totally lost in the weeds:

$ cat .github/workflows/ci.yml
jobs:
  build:
    strategy:
      matrix:
        package: ["@mono/api", "@mono/web", "@mono/analytics", "@mono/logging", "@mono/types"]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: pnpm/action-setup@v2
        with:
          version: 6.9.1
      - run: pnpm run --filter ${{ matrix.package }} docker:build

So with this our packages run in a matrix, which lets GitHub Actions run all of it in parallel:
[screenshot: the matrix's build jobs running in parallel]

It's annoying to manually maintain that list of packages though, how about we try one more optimization to see if we can generate that list dynamically?

$ cat package.json
{
  "scripts": {
    "list-packages": "echo [$(pnpm -s m ls --depth -1 | tr \" \" \"\n\" | grep -o \"@.*@\" | rev | cut -c 2- | rev | sed -e 's/\\(.*\\)/\"\\1\"/' | paste -sd, - )]"
  }
}
$ cat .github/workflows/ci.yml
jobs:
  packages:
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - id: set-matrix
        run: echo "::set-output name=matrix::{\"package\":$(pnpm -s list-packages)}"
  build:
    needs: packages
    strategy:
      matrix: ${{ fromJson(needs.packages.outputs.matrix) }}
    steps:
      - run: pnpm run --filter ${{ matrix.package }} docker:build

Now CI runs a "packages" job first that dynamically calculates the matrix.package variable, which then gets consumed in the build jobs. Wonderful!

That list-packages script is a bit of a terrifying shell-oneliner though; I think it's best we don't get into its full details right now, or we could be here all day. But if you'd like to see it explained, or if you see a better way to do it, please leave a comment.
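That said, for the curious, here's roughly what the pipeline does, fed with made-up sample lines instead of real pnpm output (the names and versions are just illustrative):

```shell
# Pretend output of `pnpm -s m ls --depth -1` (made-up names/versions):
sample='@mono/web@0.1.0
@mono/api@0.1.0'

# tr: one token per line; grep: keep "@scope/name@" from "@scope/name@version";
# rev|cut|rev: drop the trailing "@"; sed: quote each name; paste: join with commas.
names=$(echo "$sample" | tr " " "\n" | grep -o "@.*@" | rev | cut -c 2- | rev \
  | sed -e 's/\(.*\)/"\1"/' | paste -sd, -)
echo "[$names]"
# → ["@mono/web","@mono/api"]
```

The end result is a JSON array of package names, which is exactly the shape the matrix expects.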

The Bad

We introduced Docker and a CI pipeline, but also identified some issues we should be aware of:

  • We should only build what has changed, so untouched projects should be totally skipped.
  • Docker should use a persisted cache so if only source-code in e.g. apps/web has changed it shouldn't have to also reinstall its dependencies.
  • The custom Docker context should only include the files needed to build, and it should be easy (or fully automated) to control what files to exclude/include.
  • App-images should be pruned to only contain Javascript and production-dependencies, so the image we run on a server is as tiny and optimal as possible.

These issues are in addition to what we identified at the end of Attempt 3 - Build the source, and I'd like to spend future articles discovering monorepo-tools that can solve these issues.
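To make that last point concrete, here's a rough multi-stage sketch of where such pruning could happen. It's full of assumptions: the `--prod` re-install as a pruning step and the `dist/index.js` entry point are guesses for illustration, not this repository's actual setup:

```shell
$ cat Dockerfile
# Stage 1: install, build, and test with dev-dependencies available
FROM node:16-alpine AS build
RUN npm --global install pnpm
WORKDIR /root/monorepo
COPY . .
RUN pnpm install --filter "@mono/web..."
RUN pnpm build --filter "@mono/web..."
RUN pnpm test --if-present --filter "@mono/web"
# Drop dev-dependencies now that building and testing are done (assumption:
# a --prod install over the existing node_modules prunes dev-dependencies)
RUN pnpm install --prod --filter "@mono/web..."

# Stage 2: start from a clean base and copy over only the pruned result
FROM node:16-alpine
WORKDIR /root/monorepo
COPY --from=build /root/monorepo .
WORKDIR /root/monorepo/apps/web
CMD ["node", "dist/index.js"]
```

Note this sketch still copies sources into the final image; a truly minimal image would copy only the built output and production node_modules, which is exactly the kind of fiddly work we'd like tooling to automate.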

Can you think of other issues or considerations we need to keep in mind? Leave a comment below with your thoughts or feedback.
