22
Writing data to disk: transforming brittle code to robust code with atomic writes
This is the first post in a series where I'll cover about writing robust code that's can tolerate both expected and unexpected failures
Receiving feedback through code reviews is one of the many ways to grow your career as a software developer. But of course, not all feedback hold the same value. Not so useful comments tend to focus on nit picking (e.g. white space); moderately useful comments detect logic or semantic bugs; fairly useful ones help you see problems through a different lens; the best comments open your eyes to issues that you didn't even know existed.
One of the most eye-opening code reviews I submitted during my tenure at Amazon Web Service (AWS) revealed to me the importance of atomic writes to disk.
Let's take a look at the snippet of Python code below that writes data to disk.
dataset = fetch_data()
...
with open('customers.txt') as fh:
for each customer in dataset:
fh.write(...)
At a glance, the above code looks and smells π okay. It's coded with idiomatic Python: the context manager (i.e. with open
) cleans up lingering resources for you, automatically closing out the file handle. Awesome. I see code like this all the time. But, can you spot the issue?
The lack of atomicity?
In general, an atomic operation is all or nothing, binary, 0 or 1; the operation has either 1) not yet started or 2) has completed successfully. No gray areas. In the context of writing data to disk, the destination must contain all the data we expect to be present in the file, non-corrupted. Not some of the data β all of it.
So how do transform the above code such that we atomically write to disk?
As is stands, the above code is brittle, susceptible to failures. What happens if the program raises an exception mid-write? Or if the server powers off in between one of the read or write operations, leaving the data corrupted? In other words, the code opens us up to leaving the file in an unknown state.
Here's how we go about writing an atomic file.
- Create a temporary file
- Write contents to temporary file
- Flush buffers
- Sync to disk
- Rename file.
from x import TempFile
# 1. create temporary file
with open(tempfile) as fh:
# 2. Write contents to file handle
fh.write(...)
# 3. Flush from any runtime or OS buffers
fh.flush()
# 4. Sync from memory to disk
os.fsync(fh.fileno())
# 5. Rename and replace destination file
os.rename(tempFile, "customers.txt")
We start the procedure with opening a temporary file; this temporary file becomes the intermediate destination in which we direct our writes. By writing to a temporary file, we leave the ultimate destination file (if it exists) in tact, only replacing the destination file if all the data has been successfully written to the temporary file. Once all writes finished, then we simply rename the temporary file to that of the destination file, an atomic operation in itself.
Above, I demonstrated one way to apply atomicity. This principle can be applied to many other situations. For example, if you are writing multi-threaded code and accessing shared memory, a thread needs to atomically obtain a lock before modifying the underlying shared data structures.
So, moving forward, when writing or reviewing code, keep the possibility of failures at the fore front of your mind and identify ways you can apply the principle of atomicity to turn fragile code into robust software.
Let's connect and talk more about software and devops. Follow me on Twitter: @memattchung
22