100 Languages Speedrun: Episode 32: Gherkin

Gherkin (or Cucumber, or a lot of vegetable names for different forks and variants of it) is a language that is used to describe test scenarios. It started in Ruby, and nowadays there are official and unofficial versions that support for a lot of different programming languages.

The idea is that actual application will be written in a real language, with all the complicated technical things, but test scenarios don't need any of that technical detail, so writing them in a special language like Gherkin enables non-technical (like the customer paying for the work, or end user) or semi-technical (like a business analyst, a web designer, or a domain expert) people to read and understand tests, and possibly even contribute to them. I'll get to how realistic this is eventually.

Ruby with RSpec makes testing very easy and readable already, so Ruby needs tools like Gherkin the least of all languages. It actually makes far more sense to use it in languages where testing DSLs are awkward and full of boilerplate. So let's do it all in Python (using behave package).

Feature File

Let's pip3 install behave and create this feature file feature/strings.feature:

Feature: String Functions
  Scenario: ASCII name
      Given name is "Lech"
       Then its length is 4
        And its uppercase is "LECH"
        And its lowercase is "lech"

  Scenario: Unicode name
      Given name is "Wałęsa"
       Then its length is 6
        And its uppercase is "WAŁĘSA"
        And its lowercase is "wałęsa"

  Scenario: Empty string
      Given name is ""
       Then its length is 0
        And its uppercase is ""
        And its lowercase is ""

Feature and Scenario are purely descriptive labels. The rest are "steps" and we need to implement them.

Step Definitions

The "steps" definitions is where all the technical details will be. Again, the whole idea is that the feature file like the one above is the one you can sit together with a non-technical or semi-technical person, and either write them together, or at least show it to them and hope they understand the scenarios.

If we run behave it will helpfully tell us about all the step definitions we did not provide.

The feature files are the same for all programming language, but of course step definitions are language specific. Depending on implementation, they are either regular expressions or some more convenient form that handles type conversion for us automatically. I'll use regular expression version here:

from behave import *

use_step_matcher("re")

@given('name is "(.*?)"')
def step_impl(context, name):
  context.name = name

@then('its length is (\d+)')
def step_impl(context, num):
  assert len(context.name) == int(num)

@then('its uppercase is "(.*?)"')
def step_impl(context, s):
  assert context.name.upper() == s

@then('its lowercase is "(.*?)"')
def step_impl(context, s):
  assert context.name.lower() == s

We can run it with behave. -T option skips printing timings, which are most of the time completely unnecessary:

$ behave -T
Feature: String Functions # features/strings.feature:1

  Scenario: ASCII name          # features/strings.feature:2
    Given name is "Lech"        # features/steps/strings.py:5
    Then its length is 4        # features/steps/strings.py:9
    And its uppercase is "LECH" # features/steps/strings.py:13
    And its lowercase is "lech" # features/steps/strings.py:17

  Scenario: Unicode name          # features/strings.feature:8
    Given name is "Wałęsa"        # features/steps/strings.py:5
    Then its length is 6          # features/steps/strings.py:9
    And its uppercase is "WAŁĘSA" # features/steps/strings.py:13
    And its lowercase is "wałęsa" # features/steps/strings.py:17

  Scenario: Empty string    # features/strings.feature:14
    Given name is ""        # features/steps/strings.py:5
    Then its length is 0    # features/steps/strings.py:9
    And its uppercase is "" # features/steps/strings.py:13
    And its lowercase is "" # features/steps/strings.py:17

1 feature passed, 0 failed, 0 skipped
3 scenarios passed, 0 failed, 0 skipped
12 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.001s

Reuse Feature files in Ruby

Let's do something interesting with the feature file. Let's just reuse it in a completely different language. Of course we'll need to rewrite step definitions completely, but let's start by copying features/strings.feature over to our Ruby implementation with no changes.

Let's run this (and in terminal it's all nicely color-coded):

$ cucumber
Feature: String Functions

  Scenario: ASCII name          # features/strings.feature:2
    Given name is "Lech"        # features/strings.feature:3
    Then its length is 4        # features/strings.feature:4
    And its uppercase is "LECH" # features/strings.feature:5
    And its lowercase is "lech" # features/strings.feature:6

  Scenario: Unicode name          # features/strings.feature:8
    Given name is "Wałęsa"        # features/strings.feature:9
    Then its length is 6          # features/strings.feature:10
    And its uppercase is "WAŁĘSA" # features/strings.feature:11
    And its lowercase is "wałęsa" # features/strings.feature:12

  Scenario: Empty string    # features/strings.feature:14
    Given name is ""        # features/strings.feature:15
    Then its length is 0    # features/strings.feature:16
    And its uppercase is "" # features/strings.feature:17
    And its lowercase is "" # features/strings.feature:18

3 scenarios (3 undefined)
12 steps (12 undefined)
0m0.026s

You can implement step definitions for undefined steps with these snippets:

Given('name is {string}') do |string|
  pending # Write code here that turns the phrase above into concrete actions
end

Then('its length is {int}') do |int|
# Then('its length is {float}') do |float|
  pending # Write code here that turns the phrase above into concrete actions
end

Then('its uppercase is {string}') do |string|
  pending # Write code here that turns the phrase above into concrete actions
end

Then('its lowercase is {string}') do |string|
  pending # Write code here that turns the phrase above into concrete actions
end

Oh that's convenient! behave also has similar output, but it's much less smart and lists 12 steps instead of figuring out it's really just 4 things.

So let's literally copy and paste this to features/step_definitions/strings.rb, and just fill in the gaps:

Given('name is {string}') do |string|
  @name = string
end

Then('its length is {int}') do |int|
  expect(@name.length).to eq(int)
end

Then('its uppercase is {string}') do |string|
  expect(@name.upcase).to eq(string)
end

Then('its lowercase is {string}') do |string|
  expect(@name.downcase).to eq(string)
end

And then it works just fine:

$ cucumber
Feature: String Functions

  Scenario: ASCII name          # features/strings.feature:2
    Given name is "Lech"        # features/step_definitions/strings.rb:1
    Then its length is 4        # features/step_definitions/strings.rb:5
    And its uppercase is "LECH" # features/step_definitions/strings.rb:9
    And its lowercase is "lech" # features/step_definitions/strings.rb:13

  Scenario: Unicode name          # features/strings.feature:8
    Given name is "Wałęsa"        # features/step_definitions/strings.rb:1
    Then its length is 6          # features/step_definitions/strings.rb:5
    And its uppercase is "WAŁĘSA" # features/step_definitions/strings.rb:9
    And its lowercase is "wałęsa" # features/step_definitions/strings.rb:13

  Scenario: Empty string    # features/strings.feature:14
    Given name is ""        # features/step_definitions/strings.rb:1
    Then its length is 0    # features/step_definitions/strings.rb:5
    And its uppercase is "" # features/step_definitions/strings.rb:9
    And its lowercase is "" # features/step_definitions/strings.rb:13

3 scenarios (3 passed)
12 steps (12 passed)
0m0.021s

Reuse Feature files in JavaScript

Are we done yet? Of course not. Let's reuse it in JavaScript.

With npm init -y; npm install --save-dev @cucumber/cucumber and editing package.json to make cucumber-js our test runner

{
  "name": "strings_javascript",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "cucumber-js"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "devDependencies": {
    "@cucumber/cucumber": "^8.0.0-rc.1"
  }
}

We can copy over features/strings.features over without any changes. If we run npm test, it gives us long list of steps we need to implement. It also figured out the patterns, but for some reason it printed each pattern 3 times:

$ npm test

> strings_javascript@1.0.0 test
> cucumber-js

UUUUUUUUUUUU

Failures:

1) Scenario: ASCII name # features/strings.feature:2
   ? Given name is "Lech"
       Undefined. Implement with the following snippet:

         Given('name is {string}', function (string) {
           // Write code here that turns the phrase above into concrete actions
           return 'pending';
         });

   ? Then its length is 4
       Undefined. Implement with the following snippet:

         Then('its length is {int}', function (int) {
         // Then('its length is {float}', function (float) {
           // Write code here that turns the phrase above into concrete actions
           return 'pending';
         });

   ? And its uppercase is "LECH"
       Undefined. Implement with the following snippet:

         Then('its uppercase is {string}', function (string) {
           // Write code here that turns the phrase above into concrete actions
           return 'pending';
         });

   ? And its lowercase is "lech"
       Undefined. Implement with the following snippet:

         Then('its lowercase is {string}', function (string) {
           // Write code here that turns the phrase above into concrete actions
           return 'pending';
         });

We need to do a bit of manual require here, but other than that, features/step_definitions/strings.js is very straightforward:

let { Given, Then } = require("@cucumber/cucumber")
let assert = require("assert")

Given('name is {string}', function (string) {
  this.name = string
})

Then('its length is {int}', function (int) {
  assert.equal(this.name.length, int)
})

Then('its uppercase is {string}', function (string) {
  assert.equal(this.name.toUpperCase(), string)
})

Then('its lowercase is {string}', function (string) {
  assert.equal(this.name.toLowerCase(), string)
})

Internationalization

One thing Gherkin does out of the box is support for different languages. As I don't expect everyone to know any specific languages other than English, I'll use LOLCATish (en-lol).

So let's rewrite the feature file in lolcat:

OH HAI: STRIN FUNCSHUNS
   MISHUN: BORIN WERD
     I CAN HAZ NAME "Kitteh"
           DEN LONGNEZ IZ 6
            AN HOOJ WERD IZ "KITTEH"
            AN SMOL WERD IZ "kitteh"

   MISHUN: FUNNY WERD
     I CAN HAZ NAME "Myszołap"
           DEN LONGNEZ IZ 8
            AN HOOJ WERD IZ "MYSZOŁAP"
            AN SMOL WERD IZ "myszołap"

   MISHUN: NO WERD
     I CAN HAZ NAME ""
           DEN LONGNEZ IZ 0
            AN HOOJ WERD IZ ""
            AN SMOL WERD IZ ""

And provide steps file - only regular expressions change, nothing else:

from behave import *

use_step_matcher("re")

@given('NAME "(.*?)"')
def step_impl(context, name):
  context.name = name

@then('LONGNEZ IZ (\d+)')
def step_impl(context, num):
  assert len(context.name) == int(num)

@then('HOOJ WERD IZ "(.*?)"')
def step_impl(context, s):
  assert context.name.upper() == s

@then('SMOL WERD IZ "(.*?)"')
def step_impl(context, s):
  assert context.name.lower() == s

We have to tell it that we want to use en-lol language:

$ behave -T --lang en-lol
OH HAI: STRIN FUNCSHUNS # features/strings.feature:1

  MISHUN: BORIN WERD         # features/strings.feature:2
    I CAN HAZ NAME "Kitteh"  # features/steps/strings.py:5
    DEN LONGNEZ IZ 6         # features/steps/strings.py:9
    AN HOOJ WERD IZ "KITTEH" # features/steps/strings.py:13
    AN SMOL WERD IZ "kitteh" # features/steps/strings.py:17

  MISHUN: FUNNY WERD           # features/strings.feature:8
    I CAN HAZ NAME "Myszołap"  # features/steps/strings.py:5
    DEN LONGNEZ IZ 8           # features/steps/strings.py:9
    AN HOOJ WERD IZ "MYSZOŁAP" # features/steps/strings.py:13
    AN SMOL WERD IZ "myszołap" # features/steps/strings.py:17

  MISHUN: NO WERD      # features/strings.feature:14
    I CAN HAZ NAME ""  # features/steps/strings.py:5
    DEN LONGNEZ IZ 0   # features/steps/strings.py:9
    AN HOOJ WERD IZ "" # features/steps/strings.py:13
    AN SMOL WERD IZ "" # features/steps/strings.py:17

1 feature passed, 0 failed, 0 skipped
3 scenarios passed, 0 failed, 0 skipped
12 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.001s

FizzBuzz

Obviously we need to do the FizzBuzz. Probably the best feature of Gherkin is convenient support for tables of examples.

Let's use this:

Feature: FizzBuzz
  Scenario: FizzBuzz Function
    Given FizzBuzz Input and Output
      | input   | output   |
      | 1       | 1        |
      | 2       | 2        |
      | 3       | Fizz     |
      | 4       | 4        |
      | 5       | Buzz     |
      | 6       | Fizz     |
      | 7       | 7        |
      | 8       | 8        |
      | 9       | Fizz     |
      | 10      | Buzz     |
      | 11      | 11       |
      | 12      | Fizz     |
      | 13      | 13       |
      | 14      | 14       |
      | 15      | FizzBuzz |
      | 16      | 16       |
      | 17      | 17       |
      | 18      | Fizz     |
      | 19      | 19       |
      | 20      | Buzz     |
    Then FizzBuzz returns the expected output

Step definitions for tables vary a lot between implementations, here's how Python behave would do this:

from fizzbuzz import fizzbuzz

@given("FizzBuzz Input and Output")
def step_impl(context):
  context.fizzbuzz_data = context.table

@then("FizzBuzz returns the expected output")
def step_impl(context):
  for input, output in context.fizzbuzz_data:
    assert fizzbuzz(int(input)) == output

I think with table data, Gherkin feature files have most advantage over typical testing frameworks, where such lists of test cases usually look much worse.

Should you use Gherkin?

I'd generally recommend against it.

I've seen one place where it worked as intended, with semi-technical people writing features, but the overwhelming consensus among people who tried it is that it's just too difficult to get non-technical or semi-technical people to get to write or even look at feature files, and for developers it's really just annoying, with much worse tooling support than regular testing frameworks.

Code