Comparison of Python TOML parser libraries
The pypyr automation pipeline task-runner open-source project recently added TOML parsing & writing functionality as a core feature. To this end, I researched the available free & open-source Python TOML parser libraries to figure out which option to use.
If you're interested in a walkthrough of the decision making process, this is documented in an Architecture Decision Record (ADR) you can read here: adr003 toml in the pypyr core
Hopefully sharing my notes helps someone else going through this process to save some time...
Since "best" is a loose term best left to click-bait headlines (hoho, see what I did there? 😉), instead of asking "which TOML parser is the best?" the more sensible question is: which TOML library suits the requirements of your project with the fewest negatives?
- tomli
  - Relatively fast read-only parsing.
  - Companion library tomli-w for writing.
- tomlkit
  - Round-trip white-space/style preserving.
- toml
  - This was initially vendored in pip itself to deal with pyproject.toml.
  - Even so, pip has since moved to tomli.
- pytoml
  - Abandoned by the creator for eminently sensible reasons (interesting read too... https://github.com/avakar/pytoml/issues/15). But let's not get into an argument over whether a shiny new fashion in config formats is in fact doing anything better than the previous fashions in config management...
- qtoml
  - Still on TOML v0.5.0.
Note that these are the pure Python parsers - there are also others that are basically interop wrappers for fast C++ or Rust libraries.
If performance is your main concern, then the C++/Rust implementations might serve your needs, assuming you're fine with these not being pure Python packages.
Let's investigate the pure Python libraries in greater depth:
Of the available options, only tomlkit supports style-preserving round-trip parsing. Furthermore, tomlkit was created for the express purpose of handling TOML parsing for the poetry tool. As poetry is one of the 2 most popular new PEP517 & PEP518 Python build systems, there is some comfort to be had in that wide adoption: a very actively used tool means a greater likelihood of continued maintenance & support, and specifically that pyproject.toml files should parse without surprises.
TOMLKit only lists itself as 1.0.0rc1 compliant. Looking at the TOML spec release history delta of rc1 vs v1, it looks like only clarifications & administrative/documentation updates - there doesn't seem to be anything notable missing or functionally different in rc1 as opposed to v1. It's not impossible that I missed something, though - but given TOMLKit's wide usage via poetry, I would expect obvious out-of-date spec handling to have been noticed by someone somewhere, and I see no such reports in the issues list.
There is a but... TOMLKit outputs custom types rather than just the standard Python built-ins like dict. Specifically, it represents tables with classes like class Table(Item, MutableMapping, dict) or class InlineTable(Item, MutableMapping, dict).
(See here for TOMLkit API types.)
The constructors for these classes do NOT take the arguments a standard Mapping type does, so you cannot instantiate them the way you would a dict - which may or may not fit your needs, depending on what exactly you're doing. It probably doesn't really matter for most purposes.
The toml library is largely a historical artifact at this point: not only is it well behind on implementing TOML v1, it also suffers from a lack of maintenance on extant functionality.
Even pip itself has moved from vendoring toml to tomli. This exodus from toml to tomli includes the very prominent:
(The links are to the issues/PRs discussing the reasons why...)
tomli, then, seems to be where the Python community in general is coalescing for a "standard" TOML parser. tomli is read-only. For write functionality there is the companion library tomli-w.
tomli is explicitly TOML v1.0 compliant.
tomli is significantly faster than TOMLKit. It does not, however, preserve style/whitespace like tomlkit does. For most use-cases, arguably TOML reading is the important part...
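For comparison, a minimal read-only sketch. It assumes either Python 3.11+, where the standard library's tomllib module is derived from tomli and exposes the same load/loads functions, or tomli installed from PyPI on older versions. The sample config and names are hypothetical.

```python
import io
import sys

if sys.version_info >= (3, 11):
    import tomllib  # standard library since Python 3.11, derived from tomli
else:
    import tomli as tomllib  # assumption: tomli is installed from PyPI

# Hypothetical sample config.
CONFIG_TEXT = """\
[pipeline]
name = "demo"   # comments are discarded on parse
steps = ["fetch", "build"]

[pipeline.retry]
max = 3
"""

# loads() parses a str and returns plain built-in types.
config = tomllib.loads(CONFIG_TEXT)

# load() expects a file object opened in binary mode;
# BytesIO stands in for open(path, "rb") here.
with io.BytesIO(CONFIG_TEXT.encode("utf-8")) as fp:
    same_config = tomllib.load(fp)

print(config["pipeline"]["steps"])
```

The result is an ordinary dict, so there are no custom types to worry about, but the comment and the extra spacing in the input are gone. For writing, the companion tomli-w package exposes tomli_w.dumps(config) and tomli_w.dump(config, fp), where fp is likewise a binary file object.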
Use tomlkit if you need to round-trip & preserve style + comments.
Use tomli if you're just after reading a config file, or tomli-w if you need to write some output without caring about the formatting too much.
Use the Rust/C++ interop libraries if performance is your main concern and you do not have limitations on using packages that aren't pure Python.
In the case of pypyr, tomli matched the requirements with the least trade-offs. And as a mini-review, it was a joy to use 😄.
If you're interested in seeing a real-world usage example for tomli & tomli-w, you can check it out in action here: https://github.com/pypyr/pypyr/blob/main/pypyr/toml.py