Level Up Your dbt Schema Tests in 4 Easy Steps

You may be a data engineer or data quality engineer who uses the dynamic ETL tool dbt from Fishtown Analytics. If so, it's possible that you've incorporated a few automated schema tests into your dbt project. dbt's built-in testing framework is a powerful yet light way to validate data transformations, sources, and the data itself.

As you likely know, you can do even more with dbt and its related tools. Packages in dbt are a nice way to add useful and more complex functionality to your data transformation project with what are essentially dependencies.

dbt_utils

In this case, we'll focus on the dbt_utils package, which is an aptly named utilities package officially made and maintained by the same folks who develop dbt. I'll show you how to add the package if you don't already know how, and then share some examples of dbt_utils automated schema tests I find useful.

Note: One great advantage of dbt_utils schema tests is that they can test at the model or column level.

4 Easy Steps to Install

Let's do this!

1. Add Packages.yml to your root dbt folder

Create a new file at the root level in your dbt project, and name it packages.yml.

2. Add dbt_utils to the packages

Get the latest package code from this official page, then copy and paste it into your packages.yml file.

Today the code is:

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.7.0

Save your packages.yml file. It should look something like this:
dbt_utils in packages.yml

3. Run dbt deps locally

Open your Command Line or Terminal wherever you control your dbt project and send commands. Run the following command to install the dbt_utils package on your machine so you can use it.

dbt deps

You should see something like this:
dbt deps being installed on a PC
If so, success! The package is installed and you can go about using your new tests.

4. Use Your New Tests

Much like dbt's built-in testing, dbt_utils automated schema tests should be placed under a column or model being tested under the "tests" configuration. That's essentially how they are integrated!

dbt_utils Schema Tests

Now you can use a whole host of new dbt automated schema tests and even more in the dbt_utils package. For now I'll show you a couple of my favorite dbt_utils schema tests.

Recency - Model Level Test

Recency is a configurable schema test that checks to see if the timestamp column referenced was changed within the timeframe. I.E. you can see if the model loads every day, hour, week, etc.

models:
  - name: model_name
    tests:
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1

Equal_Rowcount - Model Level Test

I find equal_rowcount to be a fantastic test for checking to make sure nothing changed in my sources during ingest or transformation. If I expect two models to contain the same number of rows, it's an easy reference to both from the target model.

models:
  - name: model_name
    tests:
      - dbt_utils.equal_rowcount:
          compare_model: ref('other_table_name')

Thanks for Reading!

I hope this was useful for you, and that you are excited by the prospect of adding more cool functions to your dbt project like I am! If you'd like to share more of your experience with dbt or data engineering, please feel free to comment or get in touch with me at any time. I have a dbtTestExamples repository on GitHub that is open for anyone to try their hand at dbt tests using Google BigQuery.

91