How to Parse CSV from a Text Variable

The Problem

Most of us know how to parse a CSV file, but what about other sources, such as a block of text? For example, given the following text:

text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""

How do we parse this text? Should we write it to a file first, then read it back?

The Solution

According to the documentation for the csv library, a CSV reader accepts a file or any other object that supports the iterator protocol and yields strings, which includes lists and in-memory file objects. Let us explore the first solution, which splits the text into lines.

import csv

text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""
reader = csv.reader(text.strip().splitlines())
for row in reader:
    print(row)

['uid', 'alias', 'shell']
['501', 'karen', 'bash']
['502', 'john', 'tcsh']

That was easy. Note that we called .strip() to remove the whitespace surrounding the text block before splitting it into individual lines. We then feed the lines to csv.reader and let it do its work.
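
To see why the .strip() call matters, here is a small sketch comparing the two cases (the variable names are only illustrative). Without stripping, the leading newline in the triple-quoted string becomes an empty first line, which csv.reader reports as an empty row:

```python
import csv

text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""

# Without .strip(): the text starts with "\n", so the first line is empty
rows_unstripped = list(csv.reader(text.splitlines()))

# With .strip(): the header is the first row, as intended
rows_stripped = list(csv.reader(text.strip().splitlines()))

print(rows_unstripped[0])  # []
print(rows_stripped[0])    # ['uid', 'alias', 'shell']
```

Code that treats the first row as the header would trip over that empty list, so stripping first is the safer habit.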

Alternatively, we can use io.StringIO to turn the text into an in-memory file:

import csv
import io

text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""
in_memory_file = io.StringIO(text.strip())
reader = csv.reader(in_memory_file)
for row in reader:
    print(row)

['uid', 'alias', 'shell']
['501', 'karen', 'bash']
['502', 'john', 'tcsh']
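
The two approaches are not always interchangeable. As a hedged illustration, the sketch below uses made-up sample data with a quoted field that spans two lines. io.StringIO keeps the embedded newline, whereas a plain splitlines() drops the line endings before the reader sees them, so the newline inside the quotes is lost. Passing keepends=True to splitlines() is one way to preserve it.

```python
import csv
import io

# Hypothetical sample: the "note" field contains a line break inside quotes
text = 'id,note\n1,"line one\nline two"'

rows_stringio = list(csv.reader(io.StringIO(text)))
rows_split = list(csv.reader(text.splitlines()))
rows_keepends = list(csv.reader(text.splitlines(keepends=True)))

print(rows_stringio[1])  # ['1', 'line one\nline two']
print(rows_split[1])     # the two halves are joined without the newline
print(rows_keepends[1])  # ['1', 'line one\nline two']
```

If the text may contain multi-line quoted fields, the io.StringIO route is probably the simpler choice.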

Of course, these techniques also work with the other kind of reader, csv.DictReader:

import csv
import io

text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""
in_memory_file = io.StringIO(text.strip())
reader = csv.DictReader(in_memory_file)
for row in reader:
    print(row)

{'uid': '501', 'alias': 'karen', 'shell': 'bash'}
{'uid': '502', 'alias': 'john', 'shell': 'tcsh'}

(On Python 3.6 and 3.7, each row is an OrderedDict rather than a plain dict.)
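
Because DictReader keys each row by the header names, fields can be looked up by name instead of by position. Here is a minimal sketch that reuses the same sample text (shell_by_alias and uids are names chosen for illustration). Keep in mind that every value comes back as a string:

```python
import csv
import io

text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""

reader = csv.DictReader(io.StringIO(text.strip()))

# Map each alias to its shell; convert uid explicitly when a number is needed
shell_by_alias = {}
uids = []
for row in reader:
    shell_by_alias[row["alias"]] = row["shell"]
    uids.append(int(row["uid"]))

print(reader.fieldnames)  # ['uid', 'alias', 'shell']
print(shell_by_alias)     # {'karen': 'bash', 'john': 'tcsh'}
print(uids)               # [501, 502]
```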




Conclusion

Parsing a block of CSV text is not that hard, and you do not need to write it to an external file first. That is the beauty of the csv library: it works with a number of input sources, not just files.
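
As one more illustration of that flexibility, csv.reader can also consume a generator, since any iterable that yields strings will do. In the sketch below, non_comment_lines is a hypothetical helper, not part of the csv library, that filters out blank and comment lines before parsing:

```python
import csv

def non_comment_lines(text):
    """Yield the lines of text, skipping blanks and lines starting with '#'."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            yield line

text = """
# user accounts
uid,alias,shell
501,karen,bash
# 502 is disabled for now
502,john,tcsh
"""

rows = list(csv.reader(non_comment_lines(text)))
for row in rows:
    print(row)
```

Since the helper already skips blank lines, no .strip() call on the whole block is needed here.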
