Testing Cloud Functions with functions-framework in Python

Google Cloud Functions provide a fully managed, serverless way to deploy simple apps and run short, scheduled operations. As with most things in Google Cloud, Cloud Functions (CF) have so many topics to cover that one article isn't enough to even scratch the surface. So instead, let me use this post to tackle one specific CF topic: testing.

The evolution of our function

I was tasked with refactoring the company's entire BI system, which meant setting up some data pipelines and using APIs. At first, it was one source, triggered daily with Cloud Scheduler via Cloud Pub/Sub. The message, which then read only "foo", was published to a Pub/Sub topic, which had a function run() inside a main.py module at the other end, waiting to be instantiated and do its thing.

The need for input arguments

It soon became obvious that the message triggering the function would need to actually contain some information, such as the schedule that was being run. This was covered by the CF docs, so no biggie. Then came another source that needed to be refactored, and with it a slight issue: a CF in Python allows only one function inside a module named main.py to be the endpoint. That meant my run() function now had to become a relay function, calling other functions according to the message's payload. A preview of that is visible in the cover image.
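The relay idea can be sketched with a lookup table that maps each payload to the function it should start. This is only an illustration: the handler functions and payload strings below are hypothetical stand-ins, not the real pipeline code.

```python
import base64


def run_source_1_daily():
    # Hypothetical stand-in for a real pipeline function.
    return "source-1 daily run"


def run_source_2_all():
    # Hypothetical stand-in for a second data source.
    return "source-2 full run"


# Map each expected Pub/Sub payload to the function it should trigger.
HANDLERS = {
    "source-1-daily": run_source_1_daily,
    "source-2-all": run_source_2_all,
}


def relay(event):
    """Decode the Pub/Sub payload and call the matching handler."""
    if "data" not in event:
        return "no payload"
    payload = base64.b64decode(event["data"]).decode("utf-8")
    handler = HANDLERS.get(payload)
    if handler is None:
        return "unknown payload: {}".format(payload)
    return handler()
```

Adding a new source then means adding one entry to the table rather than another branch in the endpoint.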

The need for environment variables

When you first write your CF-to-be code, you may store sensitive information, like tokens and API keys, directly in the code, but once you get to committing this code to a repo, you need another solution. What I did was put my sensitive information in some JSON files and make git ignore them, so they wouldn't be published in a repo. But since my CF are deployed from the company's GitHub repo, the deployed functions couldn't read those secrets either. So, naturally, I turned to environment variables, which you can define while editing your CF. They work as you'd expect, with their values easily accessible via the Python os module, like so:

import os

BAMBOO_API_TOKEN = os.environ.get('BAMBOO_API_TOKEN')
BIGQUERY_PROJECT = os.environ.get('BIGQUERY_PROJECT')
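Note that os.environ.get returns None when a variable is missing, so a forgotten or misspelled variable only surfaces later, somewhere inside an API call. A small helper can fail early instead. This is a sketch; require_env is a hypothetical name of my own, not part of any library.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError("Environment variable {} is not set.".format(name))
    return value
```

It would be used in place of the plain lookup, for example BIGQUERY_PROJECT = require_env('BIGQUERY_PROJECT').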

These two requirements made local testing unappealing, to say the least.

Enter functions-framework...

Google's functions-framework package sets up a local instance of your CF, but it doesn't run it automatically. You need to send it a message, a POST request with a very specific format, to get it to run your code. This also meant writing a handler for each message that would trigger a specific local run of the CF: local-source-1-daily, local-source-1-weekly, local-source-2-all and so on. A bit tedious, but it works as it should, as long as there are no environment variables. For some reason, I couldn't simply set up environment variables locally and then retrieve them inside the CF once it ran. The solution is to generate a file, simply called .env, that stores the variables you want. Thankfully, loading such a file has also been packaged for you in Python, in the python-dotenv package.

Setting up the environment

I created a script, setup-functions-fmwk.py, that takes positional arguments and generates a .env file.

from sys import argv, exit
import json

# Arguments check: expects <project> <environment> <runMessage>
if len(argv) < 4:
    print("Not all arguments entered.")
    print("Args: ", ' '.join(argv))
    print("Exiting function.")
    exit(1)  # non-zero exit code signals the failure to the shell

# Some additional checks...

project = argv[1]
environment = argv[2]
runMessage = argv[3]

# Get API key
fp = 'api-key.json'
try:
    with open(fp) as f:
        j = json.load(f)
except FileNotFoundError:
    print("File `{}` not found.".format(fp))
    print("Exiting function.")
    exit(1)
apiKey = j['apiKey']

# Set up environment variables
with open('.env', 'w') as env:
    env.write('GOOGLE_APPLICATION_CREDENTIALS="service-account.json"\n')
    env.write('PROJECT="{}"\n'.format(project))
    env.write('API_KEY="{}"\n'.format(apiKey))
    env.write('ENVIRONMENT="{}"\n'.format(environment))
    env.write('RUN_MESSAGE="{}"\n'.format(runMessage))

print("Setup complete.")

I decided to pass the runMessage as an environment variable, so as to have only one local handler in the function. The endpoint looks like this:

def run(event, context):
    '''
    Endpoint function triggered by Pub/Sub that runs a specified API.

    Args:
        event (dict) - The dictionary with data specific to this type of event.
        context (google.cloud.functions.Context) - The Cloud Functions event metadata.
    '''
    import base64
    import bambooAPI
    import pipedriveAPI
    import hnbAPI
    import gscostAPI
    from dotenv import load_dotenv
    import os

    print("This cloud function was triggered by messageId {} published at {}.".format(context.event_id, context.timestamp))

    if 'data' in event:
        pubsubMessage = base64.b64decode(event['data']).decode('utf-8')
        print("Cloud function triggered with payload `{}`.".format(pubsubMessage))
        if pubsubMessage == 'pipedrive-legacy-run-hourly':
            print("Starting `pipedriveAPI.run_legacy_mode('hourly')`")
            pipedriveAPI.run_legacy_mode('hourly')
        # Other run options ...
        elif pubsubMessage == 'local':
            print("Loading local environment.")
            load_dotenv()
            env_run_msg = os.getenv("RUN_MESSAGE") 
            if env_run_msg == 'pipedrive-legacy-run-hourly':
                pipedriveAPI.run_legacy_mode('hourly')
            # Other run options ...
        else:
            print("Unknown payload. Terminating cloud function.")
    else:
        print("No payload in the Pub/Sub message. Terminating cloud function.")

As you can see, once the .env file has been written with key=value pairs, it is loaded simply by calling the load_dotenv() function. You can then use os, as we did above, to read the variables.
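To make the loading step less of a black box, here is a rough, standard-library-only imitation of it. It is not how python-dotenv is implemented, and it handles only plain KEY="value" lines like the ones the setup script writes; load_env_file is a hypothetical name. Like load_dotenv() with its default settings, it leaves variables that are already set untouched.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY="value" lines from a file into os.environ (simplified)."""
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments and anything without a key.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip('"')
            # Do not override variables that already exist in the environment.
            if key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded
```

For real use, load_dotenv() remains the better choice, since it also copes with quoting and other edge cases this sketch ignores.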

The only thing left to do is to send the trigger. The Google docs explain how to do this with curl, but it's a messy statement, certainly not something you will remember unless you test your CF 20+ times a day, so why not write a Python script to generate the message? I called mine run-functions-fmwk.py:

from sys import argv
import requests
import base64
import datetime

# Set up curl call
# Check endpoint

msg = 'local'
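# The trailing "Z" means UTC, so take the current time in UTC rather than local time
ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")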
data = base64.b64encode(msg.encode('utf-8')).decode('utf-8')
print("Encoded message: {}".format(data))

d = {
    "context": {
        "eventId":"1144231683168617",
        "timestamp":"{}".format(ts),
        "eventType":"google.pubsub.topic.publish",
        "resource": {
            "service":"pubsub.googleapis.com",
            "name":"projects/bornfight-projects/topics/gcf-test",
            "type":"type.googleapis.com/google.pubsub.v1.PubsubMessage"
        }
    },
    "data": {
        "@type": "type.googleapis.com/google.pubsub.v1.PubsubMessage",
        "attributes": {
             "attr1":"attr1-value"
        },
        "data":"{}".format(str(data))
    }
}
url = 'http://127.0.0.1:8080'
h = {"Content-Type": "application/json"}

r = requests.post(url, headers=h, json=d)

print("Response {}".format(r.status_code))

exit()

One thing to note here is the address of the POST request. By default, functions-framework listens on port 8080 on localhost. If you need a different port, pass it with the --port option when starting functions-framework and change the URL in the script to match.
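One way to avoid editing the URL by hand is to let the sender script read the port from the command line. The sketch below uses argparse; the script's own --port option and the build_url name are my assumptions, chosen to mirror whatever port functions-framework was started on.

```python
import argparse


def build_url(args=None):
    """Build the local endpoint URL from an optional --port argument."""
    parser = argparse.ArgumentParser(description="Send a test message to a local CF.")
    parser.add_argument("--port", type=int, default=8080,
                        help="port functions-framework is listening on")
    # With args=None, argparse reads the arguments from the command line.
    parsed = parser.parse_args(args)
    return "http://127.0.0.1:{}".format(parsed.port)
```

In the script, url = build_url() would replace the hard-coded address, and the call becomes python3 run-functions-fmwk.py --port 9090.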

Final run

You need two terminal sessions to do this, one to run the functions-framework, the other to send the message.

In terminal 1, run:

python3 setup-functions-fmwk.py bi-project test-environment pipedrive-legacy-run-hourly

to set up the environment, then start functions-framework:

functions-framework --target run --signature-type event

The target is the name of the endpoint function. The signature type is event, unless your function is dealing with webhooks, in which case it's http, but then you also need to change the way your function consumes the data. There are some other options, such as the port used to listen for incoming requests, but I will not go into those.
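To illustrate the difference in how the data is consumed: with the http signature, the function receives a single Flask request object instead of the event and context pair. A minimal sketch, assuming the caller sends a JSON body with a hypothetical message field; run_http is an illustrative name, not part of my actual function.

```python
def run_http(request):
    """HTTP-style endpoint: reads the run message from a JSON request body."""
    # get_json(silent=True) returns None instead of raising on a bad body.
    body = request.get_json(silent=True) or {}
    message = body.get("message")
    if message is None:
        return ("No message in the request body.", 400)
    # A real function would dispatch on the message here.
    return ("Received message `{}`.".format(message), 200)
```

Such a function would be started with functions-framework --target run_http --signature-type http.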

Finally, to send the message, go to terminal 2 and run:

python3 run-functions-fmwk.py

The CF returns status 200 if it finishes successfully and 500 if it crashes, but of course, you are encouraged to do simple debugging by printing to stdout to make your life easier. The printed output shows up in terminal 1, where functions-framework is running.

The functions-framework doesn't stop running when the function finishes, so you need to interrupt it with Ctrl+C back in terminal 1.

It took me a while to figure all this out, hope it saves you some time :)
