Python: import csv

Q: What is CSV?

A: Comma separated values or CSV is file format defined for exchanging information where comma(,) acts as delimiter and each row/record is separated by line break.

For example, consider this capital.csv file.

country,capital
India,New Delhi
Ireland,Dublin
Germany,Berlin
Austria,Vienna

Adding a header is optional, but it very common to find one in a csv file as it provides context on the data.

Q: What are the benefits that I receive from using this format?

A: Firstly, it is quite easy to understand for a human. If you have used spreadsheets sometime in your life, you have interacted with some form of csv. Secondly, it is useful for storing structured data, where in each column is some data point, similar to a relational database.

Q: Okay, now tell me how can I read a csv file in Python?

A: You can read a csv file by using the standard csv module, as follows:

import csv

with open('capital.csv') as f:
    reader = csv.reader(f)

    header = next(reader)

    for row in reader:
        print(row)

Output:

['India', 'New Delhi']
['Ireland', 'Dublin']
['Germany', 'Berlin']
['Austria', 'Vienna']

OR

import csv


with open('capital.csv') as f:
    reader = csv.DictReader(f)

    for row in reader:
        print('Country:', row['country'])

Output:

Country: India
Country: Ireland
Country: Germany
Country: Austria

By using the DictReader we get the output in dictionary format, using which we can access the members as row['country'], or row['capital']. Field names (header) can be retrieved as reader.fieldnames.

Q: Is it mandatory for the delimiter to be a comma or can we use a delimiter of our own choice?

A: Good question! There is no specific guideline for the same. Python allows to customize the delimiter and also provides several other options. A collection of such options is called a dialect.

You can look up the different dialects:

import csv

print(csv.list_dialects())

Output:

['excel', 'excel-tab', 'unix']

The different options are:

  1. delimiter
  2. doublequote
  3. escapechar
  4. lineterminator
  5. quotechar
  6. quoting
  7. skipinitialspace
  8. strict

Most of them are self explanatory. You can read more about them here.

You can either create a separate dialect and pass it to the reader function, or if you require only few options, and creating a separate dialect sounds overkill for you then you can pass these options directly to the reader function.

For example, consider the sample capital.csv file with some minor changes.

country,  capital
India,    New Delhi
Ireland,  Dublin
Germany,  Berlin
Austria,  Vienna

The file is maintained with some spaces to be able to read it properly, but but but, we don't require these spaces while parsing it. Guess which option can we use to solve this? Yes, you have guessed it right! skipinitialspace it is.

import csv

with open('captial.csv') as f:
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)

    for row in reader:
        print(row)

Output:

['India', 'New Delhi']
['Ireland', 'Dublin']
['Germany', 'Berlin']
['Austria', 'Vienna']

Try playing with the rest of the options.

And if you want to create a separate dialect, you can use csv.Dialect for it, and pass it to the reader function as csv.reader(f, dialect=dialect).

...

my_dialect = csv.excel()
my_dialect.skipinitialspace = True


with open('capital.csv') as f:
    reader = csv.reader(f, dialect=my_dialect)

...

Here I am using the default csv.excel dialect, with a tweak in setting skipinitialspace option as true.

There is yet another option,

...

my_dialect = csv.excel()
my_dialect.skipinitialspace = True

csv.register_dialect('my_dialect', my_dialect)


with open('capital.csv') as f:
    reader = csv.reader(f, dialect='my_dialect')

...

This way you can register a dialect and then use it throughout the program.

Q: That was quite insightful! Before I leave, can you elaborate on how can I create a csv file if I have some data? I am guessing we have a writer function complementing the reader?

A: You are absolutely right! Similar to reader and DictReader, we have writer and DictWriter.

import csv

header = ["country", "capital"]

data = [("India", "New Delhi"), ("Ireland", "Dublin"), ("Germany", "Berlin"), ("Austria", "Vienna")]

with open('new_captial.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(header)

    for row in data:
        writer.writerow(row)

Rather than writing our own "for loop" for writing row, we can also use the writerows method of the writer.

...

with open('new_captial.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(header)

    writer.writerows(data)

csv.DictWriter is useful when working with json data in a particular format.

import csv

header = ["country", "capital"]

data = [
    {"country": "India", "capital": "New Delhi"},
    {"country": "Ireland", "capital": "Dublin"},
    {"country": "Germany", "capital": "Berlin"},
    {"country": "Austria", "capital": "Vienna"},
]

with open('new_captial.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=header)
    writer.writeheader()

    writer.writerows(data)

You can also pass the dialect to writer function to change the output format.

26