How to Parse CSV from a Text Variable
Most of us know how to parse a CSV file, but what about other sources, such as a block of text? For example, given the following block of text:
text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""
How do we parse this text? Do we have to write it to a file first and then read it back?
According to the documentation for the csv library, csv.reader accepts a file or any other object that supports the iterator protocol and returns a string on each iteration, which includes lists of strings and in-memory file objects. Let us explore the first solution, which splits the text into a list of lines.
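To see the iterator-protocol claim in action, here is a minimal sketch that feeds csv.reader from a generator instead of a file or a list. The generator function user_lines is hypothetical; it stands in for any source that produces one CSV line at a time.

```python
import csv

def user_lines():
    # Hypothetical line source: yields one CSV-formatted string per iteration
    yield "uid,alias,shell"
    yield "501,karen,bash"
    yield "502,john,tcsh"

# csv.reader only needs an iterable of strings, so a generator works too
rows = list(csv.reader(user_lines()))
for row in rows:
    print(row)
```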
import csv
text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""
reader = csv.reader(text.strip().splitlines())
for row in reader:
    print(row)
['uid', 'alias', 'shell']
['501', 'karen', 'bash']
['502', 'john', 'tcsh']
That was easy. Note that we called .strip() to remove the whitespace surrounding the text block before splitting it into individual lines. Next, we feed the lines to csv.reader and let it do its work.
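Why bother with .strip()? The triple-quoted string starts with a newline, so splitlines() returns an empty string as its first element, and csv.reader turns an empty line into an empty row. The sketch below shows that, along with one alternative of our own (not something the csv documentation prescribes): skipping blank lines with a generator expression instead of stripping.

```python
import csv

text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""

# Without .strip(), the leading newline becomes an empty first line,
# which csv.reader reports as an empty row
unstripped_rows = list(csv.reader(text.splitlines()))
print(unstripped_rows[0])  # []

# Alternative: drop blank lines before they ever reach the reader
filtered_rows = list(csv.reader(line for line in text.splitlines() if line.strip()))
print(filtered_rows[0])  # ['uid', 'alias', 'shell']
```

The filtering version also copes with blank lines in the middle of the text, which .strip() alone would not remove.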
Alternatively, we can use io.StringIO to turn the text into an in-memory file:
import csv
import io
text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""
in_memory_file = io.StringIO(text.strip())
reader = csv.reader(in_memory_file)
for row in reader:
    print(row)
['uid', 'alias', 'shell']
['501', 'karen', 'bash']
['502', 'john', 'tcsh']
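The two approaches are not always interchangeable. A quoted CSV field may legally contain a line break. splitlines() cuts such a record apart and throws away the line endings, whereas io.StringIO hands the reader each line with its newline intact. A small sketch, using a made-up record whose note field spans two lines:

```python
import csv
import io

# Hypothetical data: the quoted "note" field of karen's record spans two lines
text = 'uid,alias,note\n501,karen,"likes bash\nand zsh"\n'

via_stringio = list(csv.reader(io.StringIO(text)))
via_splitlines = list(csv.reader(text.splitlines()))

print(via_stringio[1])    # ['501', 'karen', 'likes bash\nand zsh']
print(via_splitlines[1])  # ['501', 'karen', 'likes bashand zsh']
```

Both readers still reassemble the record into one row, but only the io.StringIO version keeps the embedded newline. If your text might contain multi-line quoted fields, io.StringIO looks like the safer choice.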
Of course, these techniques also work with other kinds of readers, such as csv.DictReader:
import csv
import io
text = """
uid,alias,shell
501,karen,bash
502,john,tcsh
"""
in_memory_file = io.StringIO(text.strip())
reader = csv.DictReader(in_memory_file)
for row in reader:
    print(row)
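OrderedDict([('uid', '501'), ('alias', 'karen'), ('shell', 'bash')])
OrderedDict([('uid', '502'), ('alias', 'john'), ('shell', 'tcsh')])
The OrderedDict output above reflects older Python versions. Since Python 3.8, csv.DictReader returns regular dict objects, so the same loop prints {'uid': '501', 'alias': 'karen', 'shell': 'bash'} and {'uid': '502', 'alias': 'john', 'shell': 'tcsh'} instead.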
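Because each row from csv.DictReader is keyed by column name, it is handy for building lookup tables. The minimal sketch below maps each alias to its numeric uid. For illustration it assumes a hypothetical headerless variant of the text, in which case the fieldnames parameter supplies the column names that the missing header line would otherwise provide.

```python
import csv
import io

# Hypothetical headerless variant of the sample text
headerless_text = """
501,karen,bash
502,john,tcsh
"""

reader = csv.DictReader(
    io.StringIO(headerless_text.strip()),
    fieldnames=["uid", "alias", "shell"],  # used in place of a header line
)

# Map alias -> uid, converting the uid string to an int
uid_by_alias = {row["alias"]: int(row["uid"]) for row in reader}
print(uid_by_alias)  # {'karen': 501, 'john': 502}
```

If the text does have a header line, simply leave out fieldnames and DictReader reads the names from the first row, as in the example above.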
Parsing a block of CSV text is not hard: you do not need to write it to an external file first. That is the beauty of the csv library: it can work with a number of input sources, not just files.