19
Structured logging in Python
Creating usable logs in Python can be somewhat complex—just ask my boss.
The secret to logging in Python is to give up forever and become a farmer.12:29 PM - 13 May 2020
Jonan probably isn’t the only dev who wants to run off to the countryside to become an agriculturalist because of Python’s logging module. It's an incredibly powerful and flexible module, but this flexibility comes at the cost of complexity. If you find it confusing, you're not alone.
In this article, we'll look at how you can get started with logging in Python, why you should think of logs as events, not messages, and how you can make your log data easier to integrate and explore, without the need for nasty regexes.
The most basic Python logger consists of just a few lines of code.
import logging
logger = logging.getLogger("Basic Logger")
logger.info("This is a log message")
logger.critical("This a second log message")
We import the logging module, get an instance of our logger, and generate a log message. If we were to run this logger, we'd see only the second message in our terminal because the default log level is WARNING. It’s also worth noting that logs are not persisted anywhere by default.
With a bit of configuration, we can ensure that all logs are shown in the terminal and that more severe log messages are persisted in a file, so we check them later.
import logging
logger = logging.getLogger("Basic Logger")
logger.setLevel(logging.INFO)
stream_handler = logging.StreamHandler()
stream_handler.setLevel(logging.INFO)
logger.addHandler(stream_handler)
file_handler = logging.FileHandler("error.log")
file_handler.setLevel(logging.ERROR)
logger.addHandler(file_handler)
logger.info("This is a log message")
logger.critical("This a second log message")
In this example, we’ve created two handlers and attached both to our logger. The StreamHandler()
will output all messages above INFO level to our terminal, while the FileHandler()
will write all ERROR level and above messages to the error.log
file.
The example messages above are very simplistic and won't provide enough data for a real-world application. Let’s imagine that we want to log data about air traffic. These logs could include information about the airline, where a plane is departing from, where it's going to, whether it's delayed, and so on.
It may be tempting to include this information in our logs using string formatting.
if status == "departed":
logger.info(f"{flight['airline']} flight {flight['id']} from {flight['origin']['airport']}, {flight['origin']['city']}, {flight['origin']['country']} ({flight['origin']['iata']}) to {flight['destination']['airport']}, {flight['destination']['city']}, {flight['destination']['country']} ({flight['destination']['iata']}) has departed at {flight['flight_time']}")
if status == 'landed':
logger.info(f"{flight['airline']} flight {flight['id']} from {flight['origin']['airport']}, {flight['origin']['city']}, {flight['origin']['country']} ({flight['origin']['iata']}) to {flight['destination']['airport']}, {flight['destination']['city']}, {flight['destination']['country']} ({flight['destination']['iata']}) has landed at {flight['flight_time']}")
if status == 'delayed':
logger.warning(f"{flight['airline']} flight {flight['id']} from {flight['origin']['airport']}, {flight['origin']['city']}, {flight['origin']['country']} ({flight['origin']['iata']}) to {flight['destination']['airport']}, {flight['destination']['city']}, {flight['destination']['country']} ({flight['destination']['iata']}) is delayed by approximately {flight['delay_duration']}")
if status == 'cancelled':
logger.critical(f"{flight['airline']} flight {flight['id']} from {flight['origin']['airport']}, {flight['origin']['city']}, {flight['origin']['country']} ({flight['origin']['iata']}) to {flight['destination']['airport']}, {flight['destination']['city']}, {flight['destination']['country']} ({flight['destination']['iata']}) has been cancelled due to {flight['cancellation_reason']}")
This will produce log messages with a high level of detail that are human-readable:
> WARNING:__main__:Air New Zealand flight 533 from El Alto International airport, La Paz, Bolivia (LPB) to Wenzhou airport, Wenzhou, China (WNZ) is delayed by approximately 5 hours, 22 minutes
> INFO:__main__:Tianjin Airlines flight 839 from Pinto Martins International airport, Fortaleza, Brazil (FOR) to Puerto del Rosario airport, Antigua, Spain (FUE) has departed at 12:29:17
At first glance, these messages seem to contain everything we require. But, they have a critical flaw. They assume that the primary consumer of the logs will be a person, which often isn't the case.
I believe the widespread use of format strings in logging is
based on two presumptions: The first level consumer of a log message is a human. The programmer knows what information is needed to debug an issue. I believe these presumptions are no longer correct in server-side software.Paul Querna
The days of webmasters tailing Apache logs to spot problems with their websites are mostly behind us. It isn't feasible for a person to ingest the sheer volume of log data produced by a modern application.
If we were to tail this log file, we wouldn't be able to tell how many Air Canada flights have been cancelled due to mechanical error, how many flights from PDX are delayed by over an hour, or how many flights have landed in AMS from America.
The information is all there, but it isn't easily accessible.
That doesn't mean that logs are no longer useful. They can still provide us with vital insights into our applications; we just need to structure them so that machines can parse the data instead of humans.
Structured logging means that you don’t write hard-to-parse and hard-to-keep-consistent prose in your logs but that you log events that happen in a context instead.
Instead of writing our log messages as human-readable strings—which are notoriously difficult for machines to parse—we store the data in dictionaries that can then be serialized.
We could write a custom formatter for the standard library logging module, but instead, we’re going to use the wonderful structlog package.
from structlog import get_logger
log = get_logger("Structured Logger")
if status in ['departed', 'landed']:
log.info(status, **flight)
elif status == 'delayed':
log.warning(status, **flight)
else:
log.critical(status, **flight)
Like the Python logging module, you can start logging with structlog in just a few lines. But now, we no longer have long formatted strings; instead, we simply provide structlog with the relevant information as a dictionary.
> 2021-08-27 12:45.54 [warning ] delayed airline=Tropic Air delay_duration=35634 destination={'airport': 'San Pedro airport', 'iata': 'SPR', 'icao': '', 'city': 'San Pedro', 'state': 'Belize', 'country': 'Belize'} flight_id=597 origin={'airport': 'Mingaladon airport', 'iata': 'RGN', 'icao': 'VYYY', 'city': 'Insein', 'state': 'Yangon', 'country': 'Myanmar'} price=563 stops=3
Our log messages are now structured in terms of key/value pairs. This isn't as human-readable as the string format example, but that’s okay. People will no longer be the primary consumers of our logs.
Another powerful feature of structlog is the processor pipeline. The processor pipeline is a list of callables (or processors) where the return value of each callable is passed on to the next. The return value of the last callable gets passed to the logging method.
Structlog comes with several built-in processors, including a JSON renderer. Here’s an example of JSON data after it's rendered:
> {"flight_id": 715, "airline": "Skymark Airlines", "origin": {"airport": "Cheju International airport", "iata": "CJU", "icao": "RKPC", "city": "Jeju-Si", "state": "Jaeju-Do", "country": "South Korea"}, "destination": {"airport": "Viracopos International airport", "iata": "VCP", "icao": "SBKP", "city": "Campinas", "state": "Sao Paulo", "country": "Brazil"}, "stops": 3, "price": 474, "cancellation_reason": "mechanical fault", "event": "cancelled", "logger": "Structured Logger", "level": "critical", "timestamp": "2021-08-27T17:03:05.786801Z"}
Now that we have our logs as a JSON string, we can send them to other services to be ingested.
For this example, we’re going to POST our logs to New Relic using the Log API. Check the New Relic documentation for more information on the different ways you can forward logs to New Relic.
def send_to_newrelic(logger, log_method, event_dict):
headers = {
'Api-Key': os.environ['API_KEY']
}
payload = {
"message": f"{logger} - {log_method} - {event_dict['event']}",
"attributes": event_dict
}
httpx.post('https://log-api.newrelic.com/log/v1', json=payload, headers=headers)
return event_dict
We can now log into New Relic and explore our log messages in the Logs UI.
If we click on a log we can see all of the additional structured information for that entry. New Relic will automatically parse the JSON from our API request into attributes.
import os
import httpx
import structlog
from structlog import get_logger
log = get_logger("Structured Logger")
# Custom processor
# Uses the New Relic Log API
# https://docs.newrelic.com/docs/logs/log-management/log-api/introduction-log-api/
def send_to_newrelic(logger, log_method, event_dict):
# Your New Relic API Key
# https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/
headers = {"Api-Key": os.environ["API_KEY"]}
# Our log message and all the event context is sent as a JSON string
# in the POST body
# https://docs.newrelic.com/docs/logs/log-management/log-api/introduction-log-api/#json-content
payload = {
"message": f"{log_method} - {event_dict['event']}",
"attributes": event_dict,
}
httpx.post("https://log-api.newrelic.com/log/v1", json=payload, headers=headers)
return event_dict
# Configure Structlog's processor pipeline
structlog.configure(
processors=[send_to_newrelic, structlog.processors.JSONRenderer()],
)
# Your application code that
# is generating the data to be logged
flight = get_flight_details()
status = get_flight_status()
# Generate our log events
if status in ["departed", "landed"]:
log.info(status, **flight)
elif status == "delayed":
log.warning(status, **flight)
else:
log.critical(status, **flight)
We can now query our New Relic logs for useful data using NRQL (New Relic Query Language) and answer the questions we posed earlier.
`SELECT count(*) FROM Log WHERE `airline` = 'Air Canada'
AND `event` = 'cancelled' AND `cancellation_reason` LIKE
'%mechanical%'`
`SELECT count(*) FROM Log WHERE `origin.iata` = 'PDX' AND
`event` = 'delayed' AND `delay_duration` > 60*60`
`SELECT count(*) FROM Log WHERE `destination.iata` = 'AMS'
AND `origin.country` = 'United States' AND `event` =
'landed'`
With structured logs, we can access the information we need now, but we also have the historical data available to answer questions in the future, too, including questions that we couldn't predict during our initial development.
Once you are happy with the basics of structured logging, you can explore some of the more complex configurations and features.
- Learn how you can use structlog as a drop-in replacement for the Python logging module.
- Try integrating structlog with common Python frameworks, like Flask and Twisted.
- Install the New Relic infrastructure agent, and configure it to forward your custom log files.
- Explore the Logs UI, identify patterns, create dashboards, and even trigger alerts.
You can also watch our live streams about Structured Logging (amongst many other things) on my Twitch channel @aaronbassettdev, or for the inside scoop on what's new with New Relic catch our streams on @new_relic
There is a quick recap of my latest Structured Logging stream on Youtube
19