Doctoring your application configuration

Opinion: Code for configuration handling is ugly

If you've ever written a statement that looks something like my_setting = config.value if config.value else some_default and hated it, then I expect we can agree that handling application configuration is a miserable thing to code for. It's boring to write, ugly to read, and annoying to ensure that all user configuration options have sane default values.

The argparse library can go a long way to making things better, but if you need to set dozens of options, or if your application supports plugins with configuration requirements unknown to the main program, it becomes harder to simply add_argument our way back to sanity.

In these cases, it makes more sense to opt for a configuration file. It would also be nice if our application code didn't have to care about the configuration file at all, and instead had a single source of truth it could count on being correct.

Getting Classy

When I've hit the point in my development process where I have the basic functionality working and I've settled on the basics of how the program will flow, I like to create a new Python file constfig.py and define the class _C, where the 'C' stands for both "constants" and "config" (🎶 and that's good enough for me. Both config and constants start with C 🎶). By convention, I lead the class name with an underscore to signal to the other developers on the project that the class isn't intended to be instantiated directly, and at the end of the file I create an instance of _C() called C which can be imported and will contain all the information needed by the application.

For example, let's implement a simple dice rolling API endpoint using Flask which returns a JSON formatted string. For this type of application we would want to be able to easily configure the IP address and port that the service listens on, so let's define our _C class, establish the variable names, pre-populate the variables with some reasonable default values, and then create an instance of our class named C which the user is meant to import from our file.

class _C(object):
    def __init__(self):
        self.LISTEN_IP = '0.0.0.0'
        self.LISTEN_PORT = 8080

C = _C()

We also have certain values which will never change at runtime, but if our specification changes later, we don't want to have to hunt down all instances of that value in our code, so let's also add our constant values.

class _C(object):
    def __init__(self):
        # Constant values
        self.JSON_RESPONSE_KEYWORD_D6ROLL = 'd6_roll'

        # User configurable values
        self.LISTEN_IP = '0.0.0.0'
        self.LISTEN_PORT = 8080

C = _C()

Now that we have our constant values and our configurable variables with sane defaults, let's import C (the instance of _C) into our main Flask application.

from constfig import C  # Our constants + config = constfig
from random import randint
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/rolld6')
def roll_d6():
    value = {C.JSON_RESPONSE_KEYWORD_D6ROLL: randint(1,6)}
    return jsonify(value)

if __name__ == '__main__':
    app.run(host=C.LISTEN_IP, port=C.LISTEN_PORT)

We can now give it a quick run:

user@host:~ $ python3 roll.py 
 * Serving Flask app "roll" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
INFO:werkzeug: * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)

...and test to make sure all looks good so far.

user@host:~ $ curl -s localhost:8080/rolld6 | python3 -m json.tool
{
    "d6_roll": 3
}

🧑‍⚕️ Operate on yourself

Now that we have our application and our _C class, we're ready for it to poke at its own guts.

Let's add the method load_config which will open up our configuration file config.yaml, and then use the setattr function to update our own values at startup (when Python runs from constfig import C). Python's ability to alter its own state is known as "reflection" or "reflective programming".

import yaml
import logging


class _C(object):
    def __init__(self):
        # Constant values - and a "gotcha!"
        self.JSON_RESPONSE_KEYWORD_D6ROLL = 'd6_roll'

        # User configurable values
        self.LISTEN_IP = '0.0.0.0'
        self.LISTEN_PORT = 8080

        # Load user config (override defaults above)
        self.load_config()

    def load_config(self):
        try:
            config_file = open('config.yaml', 'r')
            config_string = config_file.read()
            config_file.close()

            configuration = yaml.load(config_string, Loader=yaml.SafeLoader)  # Don't handle badly formatted YAML. Let the parser inform the user of the error.
            if isinstance(configuration, dict):
                for variable_name, value in configuration.items():
                    setattr(self, variable_name, value)
            else:
                raise yaml.scanner.ScannerError(f'The file config.yaml should be structured as type dict, but got type {type(configuration)}')
        except FileNotFoundError:
            logging.warning('Configuration file config.yaml is missing. Using default values.')


C = _C()

We can now create our config.yaml file containing key: value pairs, where each key matches the name of one of our configuration items.

LISTEN_IP: 127.0.0.1
LISTEN_PORT: 32000

Let's fire up the service again...

user@host:~ $ python3 roll.py 
 * Serving Flask app "roll" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:32000/ (Press CTRL+C to quit)

...and this time we see that the default values have been overridden by those in the configuration file. Specifically, we're now listening on the loopback IP, and our port number has changed as expected.
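
To see the reflective step on its own, here is a minimal sketch of the setattr loop outside of the YAML handling. The Settings class and the apply_overrides helper are hypothetical names used only for illustration; they are not part of constfig.py.

```python
class Settings:
    """Hypothetical stand-in for _C, holding the same two defaults."""

    def __init__(self):
        self.LISTEN_IP = '0.0.0.0'
        self.LISTEN_PORT = 8080


def apply_overrides(target, overrides):
    """Copy each key/value pair onto target as an attribute."""
    for name, value in overrides.items():
        setattr(target, name, value)  # equivalent to target.<name> = value
    return target


settings = apply_overrides(Settings(), {'LISTEN_PORT': 32000})
print(settings.LISTEN_IP, settings.LISTEN_PORT)  # prints: 0.0.0.0 32000
```

Note that setattr will happily create attributes that did not exist before, so a misspelled key such as LISTEN_PROT would be stored as a new attribute while LISTEN_PORT quietly keeps its default.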

Gotta' catch that "gotcha!"

The problem with this method is that we open ourselves up to having the configuration file change values that should not be changed. For example, if we set JSON_RESPONSE_KEYWORD_D6ROLL: this is bad in config.yaml then make a request to our endpoint, we see that we've indirectly altered our application's response.

user@host:~ $ curl -s http://127.0.0.1:32000/rolld6 | python3 -m json.tool
{
    "this is bad": 4
}

While the fix here isn't hard, this bad behavior underscores the importance of the order in which we set our values. Let's move our constant values into a new method, set_constants(), and call it from a finally clause added to the try/except block in load_config(), so that the constants are set after the values from our configuration file.

import yaml
import logging


class _C(object):
    def __init__(self):
        # User configurable values
        self.LISTEN_IP = '0.0.0.0'
        self.LISTEN_PORT = 8080

        # Load user config (override defaults above)
        self.load_config()

    def set_constants(self):
        # Constant values
        self.JSON_RESPONSE_KEYWORD_D6ROLL = 'd6_value'


    def load_config(self):
        try:
            config_file = open('config.yaml', 'r')
            config_string = config_file.read()
            config_file.close()

            configuration = yaml.load(config_string, Loader=yaml.SafeLoader)  # Don't handle badly formatted YAML. Let the parser inform the user of the error.
            if isinstance(configuration, dict):
                for variable_name, value in configuration.items():
                    setattr(self, variable_name, value)
            else:
                raise yaml.scanner.ScannerError(f'The file config.yaml should be structured as type dict, but got type {type(configuration)}')
        except FileNotFoundError:
            logging.warning('Configuration file config.yaml is missing. Using default values.')
        finally:
            self.set_constants()


C = _C()

...then launch again using our "bad" (which tries to set JSON_RESPONSE_KEYWORD_D6ROLL) configuration file.

user@host:~ $ python3 roll.py 
 * Serving Flask app "roll" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:32000/ (Press CTRL+C to quit)

...and of course test the result.

user@host:~ $ curl -s http://127.0.0.1:32000/rolld6 | python3 -m json.tool
{
    "d6_value": 5
}

Tada! A single source of truth for your Python application!
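
Ordering is the fix used here, but a different way to get similar protection is an allow-list of names the configuration file may touch. The following is only a sketch of that alternative, assuming a hypothetical CONFIGURABLE set and filter_overrides helper that would run before the setattr loop.

```python
import logging

# Hypothetical allow-list: the only names a config file may override.
CONFIGURABLE = {'LISTEN_IP', 'LISTEN_PORT'}


def filter_overrides(configuration, allowed=CONFIGURABLE):
    """Return only the allowed keys, logging a warning for the rest."""
    accepted = {}
    for name, value in configuration.items():
        if name in allowed:
            accepted[name] = value
        else:
            logging.warning('Ignoring unknown or protected setting: %s', name)
    return accepted


loaded = {'LISTEN_PORT': 32000, 'JSON_RESPONSE_KEYWORD_D6ROLL': 'this is bad'}
print(filter_overrides(loaded))  # prints: {'LISTEN_PORT': 32000}
```

The trade-off is one more list to maintain, in exchange for a warning whenever the file contains a key the application does not recognize.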

You hate this, but I need validation

Yes, I hear you bemoan, "I've got a children's song about cookies stuck in my head, I'm halfway through a pack of Oreos, and this seems like an abstraction that makes it harder to see how the configuration is loaded."

Yeah, this solution is probably not for everyone, but I've hidden the superpower of this method behind a mild-mannered alter ego.

The real power is automatic validation of your configuration simply by importing C. To do this, let's define the method validate_config(), write some basic assertions to validate the configuration, and then call the method after we have loaded the YAML file and set the constant values in our class.

import yaml
import logging


class _C(object):
    def __init__(self):
        # Default values for user configurable items
        self.LISTEN_IP = '0.0.0.0'
        self.LISTEN_PORT = 8080

        # Load user config (override defaults above)
        self.load_config()

    def set_constants(self):
        # Constant values
        self.JSON_RESPONSE_KEYWORD_D6ROLL = 'd6_value'

    def load_config(self):
        try:
            config_file = open('config.yaml', 'r')
            config_string = config_file.read()
            config_file.close()

            configuration = yaml.load(config_string, Loader=yaml.SafeLoader)  # Don't handle badly formatted YAML. Let the parser inform the user of the error.
            if isinstance(configuration, dict):
                for variable_name, value in configuration.items():
                    setattr(self, variable_name, value)
            else:
                raise yaml.scanner.ScannerError(f'The file config.yaml should be structured as type dict, but got type {type(configuration)}')
        except FileNotFoundError:
            logging.warning('Configuration file config.yaml is missing. Using default values.')
        finally:
            self.set_constants()
            self.validate_config()  # Validate our config file

    def validate_config(self):
        # Validate LISTEN_IP
        assert isinstance(self.LISTEN_IP, str), 'LISTEN_IP is not a string value'
        assert len(self.LISTEN_IP.split('.')) == 4, 'LISTEN_IP has an unexpected number of octets'
        assert all([ip.isnumeric() for ip in self.LISTEN_IP.split('.')]), 'LISTEN_IP is not a valid IP address.'
        assert all([0 <= int(ip) <= 255 for ip in self.LISTEN_IP.split('.')]), 'LISTEN_IP is not a valid IP address.'

        # Validate LISTEN_PORT
        if isinstance(self.LISTEN_PORT, str):
            assert self.LISTEN_PORT.isnumeric(), 'LISTEN_PORT must be a whole number.'
            self.LISTEN_PORT = int(self.LISTEN_PORT)  # C does not exist yet while __init__ is running
        assert 999 < self.LISTEN_PORT < 65536, 'LISTEN_PORT is outside expected range.'


C = _C()

...and just as an example, let's put a deliberate typo in our configuration file.

LISTEN_IP: 127.0.0.1x  # My fingers are fat :-(
LISTEN_PORT: 32000

Now we have a couple of opportunities to validate our configuration.

For example, at runtime...

user@host:~ $ python3 roll.py 
Traceback (most recent call last):
  File "roll.py", line 1, in <module>
    from constfig import C  # Our constants + config = constfig
  File "/Users/adrong/PycharmProjects/constfig/constfig.py", line 33, in <module>
    validate_config()
  File "/Users/adrong/PycharmProjects/constfig/constfig.py", line 14, in validate_config
    assert all([a.isnumeric() for a in C.LISTEN_IP.split('.')]), 'LISTEN_IP is not a valid IP address.'
AssertionError: LISTEN_IP is not a valid IP address.

... or run constfig.py directly to validate your configuration in your test or deployment pipelines.

user@host:~ $ python3 constfig.py 
Traceback (most recent call last):
  File "constfig.py", line 33, in <module>
    validate_config()
  File "constfig.py", line 14, in validate_config
    assert all([a.isnumeric() for a in C.LISTEN_IP.split('.')]), 'LISTEN_IP is not a valid IP address.'
AssertionError: LISTEN_IP is not a valid IP address.
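
One caveat with assert-based checks is that Python removes assert statements when run with the -O flag, so the validation would silently disappear. As a hedged sketch of an alternative, the same checks can raise explicit exceptions and lean on the standard library's ipaddress module. The function names validate_listen_ip and validate_listen_port are hypothetical, and the port range simply mirrors the assertion used in validate_config().

```python
import ipaddress


def validate_listen_ip(value):
    """Return value if it is a dotted-quad IPv4 string, otherwise raise."""
    if not isinstance(value, str):
        raise TypeError('LISTEN_IP is not a string value')
    # Raises ipaddress.AddressValueError (a ValueError subclass) on bad input.
    ipaddress.IPv4Address(value)
    return value


def validate_listen_port(value):
    """Return the port as an int, raising if it is not a whole number in range."""
    if isinstance(value, str):
        value = int(value)  # raises ValueError for non-numeric strings
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError('LISTEN_PORT must be a whole number.')
    if not 999 < value < 65536:
        raise ValueError('LISTEN_PORT is outside expected range.')
    return value


print(validate_listen_ip('127.0.0.1'))
print(validate_listen_port('32000'))
```

With these helpers, the fat-fingered 127.0.0.1x from the example above would fail with a ValueError whether or not optimizations are enabled.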

Small, but mighty

This is a relatively small amount of code that, with some tweaks, can source configuration from a database, from environment variables, or from command line arguments, and can have validation code that reconciles configuration from any combination of those sources. This pattern has enabled me to quickly create configuration handling in a standardized way across multiple tools, and members of other teams have found it approachable and easy to manage.
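
As one example of such a tweak, here is a rough sketch of collecting overrides from environment variables. The ROLL_ prefix and the env_overrides helper are assumptions made for illustration; the resulting dict could be fed through the same setattr loop that load_config() uses for the YAML values.

```python
import os


def env_overrides(names, environ=os.environ, prefix='ROLL_'):
    """Collect overrides for the given setting names from environment variables."""
    found = {}
    for name in names:
        key = prefix + name  # e.g. LISTEN_PORT is read from ROLL_LISTEN_PORT
        if key in environ:
            found[name] = environ[key]  # always a string; validation can convert it
    return found


# A plain dict stands in for os.environ so the example is repeatable.
fake_environ = {'ROLL_LISTEN_PORT': '9000', 'HOME': '/home/user'}
print(env_overrides(['LISTEN_IP', 'LISTEN_PORT'], environ=fake_environ))
# prints: {'LISTEN_PORT': '9000'}
```

Because environment values always arrive as strings, the string-to-int handling for LISTEN_PORT in validate_config() becomes more than a convenience in this setup.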
