How to backup Hashicorp Vault with Raft storage on Kubernetes

Context
Our team is experimenting with Hashicorp Vault as our new credentials management solution. Thanks to the offical Vault Helm Chart, we are able to get an almost production-ready vault cluster running on our Kubernetes cluster with minimal effort.
Architecture
Our 5-node vault cluster is highly available by using the provided Integrated Storage Raft backend. The vault cluster is run as a Kubernetes StatefulSet and each node has its own data storage. Each data storage is powered by a Block Storage on IBM Cloud via PersistentVolumeClaim.
The Problem
Unfortunately, the open source vault does not provide an out-of-the-box automated backup solution. It is only offer in Vault Enterprise. Apprently, our team doesn't have a deep pocket to pay for the license fee.
That said, the backup feature is still accessible from cli and HTTP API, just not automated. We utilize the snapshot save from vault cli to perform automated backup using a CronJob running along with the vault Kubernetes deployment. The cronjob will periodically take a snapshot of the vault cluster and upload to our S3 storage.
How?
Prerequisite
  • You have a working vault cluster
  • You have sufficient access to the cluster
  • You have a working S3 storage instance
  • Setup Policy and Authentication
    Create a minimal policy for our snapshot agent to perform the backup job.
    echo '
    path "sys/storage/raft/snapshot" {
       capabilities = ["read"]
    }' | vault policy write snapshot -

    The approle auth method allows machines or apps to authenticate with Vault-defined roles.

    vault auth enable approle
    vault write auth/approle/role/snapshot-agent token_ttl=2h token_policies=snapshot
    vault read auth/approle/role/snapshot-agent/role-id -format=json | jq -r .data.role_id
    vault write -f auth/approle/role/snapshot-agent/secret-id -format=json | jq -r .data.secret_id
    Prepare Secrets
    Let's save all our sensitive information as Secrets. We will use them later.
    apiVersion: v1
    kind: Secret
    metadata:
      name: vault-snapshot-agent-token
    type: Opaque
    data:
      # we use gotmpl here
      # you can replace them with base64-encoded value
      VAULT_APPROLE_ROLE_ID: {{ .Values.approle.secretId | b64enc | quote }}
      VAULT_APPROLE_SECRET_ID: {{ .Values.approle.secretId | b64enc | quote }}
    apiVersion: v1
    kind: Secret
    metadata:
      name: vault-snapshot-s3
    type: Opaque
    data:
      # we use gotmpl here
      # you can replace them with base64-encoded value
      AWS_ACCESS_KEY_ID: {{ .Values.backup.accessKeyId | b64enc | quote }}
      AWS_SECRET_ACCESS_KEY: {{ .Values.backup.secretAccesskey | b64enc | quote }}
      AWS_DEFAULT_REGION: {{ .Values.backup.region | b64enc | quote }}
    The CronJob
    Let's create the CronJob that actually does the work.
    We configure VAULT_ADDR environment variable to http://vault-active.vault.svc.cluster.local:8200. Using vault-active Service can make sure the snapshot request is made against the leader node, assuming you have enabled Service Registration, which is the default. The exact url may vary depending on your vault helm chart deployment release name and targer namespace, learn more.

    I may have over-engineered the cronjob by using multiple containers to perform a simple backup and upload task. The intention is to avoid building custom images and I don't want to maintain yet another image.

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: vault-snapshot-cronjob
    spec:
      schedule: "@every 12h"
      jobTemplate:
        spec:
          template:
            spec:
              volumes:
              - name: share
                emptyDir: {}
              containers:
              - name: snapshot
                image: vault:1.7.2
                imagePullPolicy: IfNotPresent
                command:
                - /bin/sh
                args:
                - -ec
                # The offical vault docker image actually doesn't come with `jq`. You can 
                # - install it during runtime (not a good idea and your security team may not like it)
                # - ship `jq` static binary in a standalone image and mount it using a shared volume from `initContainers`
                # - build your custom `vault` image
                - |
                  curl -sS https://webinstall.dev/jq | sh
                  export VAULT_TOKEN=$(vault write auth/approle/login role_id=$VAULT_APPROLE_ROLE_ID secret_id=$VAULT_APPROLE_SECRET_ID -format=json | /jq/jq -r .auth.client_token);
                  vault operator raft snapshot save /share/vault-raft.snap; 
                envFrom:
                - secretRef:
                    name: vault-snapshot-agent-token
                env:
                - name: VAULT_ADDR
                  valut: http://vault-active.vault.svc.cluster.local:8200
                volumeMounts:
                - mountPath: /share
                  name: share
              - name: upload
                image: amazon/aws-cli:2.2.14
                imagePullPolicy: IfNotPresent
                command:
                - /bin/sh
                args:
                - -ec
                # the script wait untill the snapshot file is available
                # then upload to s3
                # for folks using non-aws S3 like IBM Cloud Object Storage service, add a `--endpoint-url` option
                # run `aws --endpoint-url <https://your_s3_endpoint> s3 cp ...`
                # change the s3://<path> to your desired location
                - |
                  until [ -f /share/vault-raft.snap ]; do sleep 5; done;
                  aws s3 cp /share/vault-raft.snap s3://vault/vault_raft_$(date +"%Y%m%d_%H%M%S").snap;
                envFrom:
                - secretRef:
                    name: vault-snapshot-s3
                volumeMounts:
                - mountPath: /share
                  name: share
              restartPolicy: OnFailure
    Wrapping Up
    Now you have all resources needed to automate vault backup for Raft backend. You can either just run kubectl apply -f * or build your own Helm Chart and distribute on your private chart repository.

    28

    This website collects cookies to deliver better user experience

    How to backup Hashicorp Vault with Raft storage on Kubernetes