Reading a CSV file in Python

Trey Hunner
Python 3.8 to 3.12

How can you read a CSV file in Python?

Reading a CSV file with csv.reader

The Python Standard Library has a csv module, which has a reader function within it:

>>> import csv
>>> csv.reader
<built-in function reader>

We can use the reader function by passing it an iterable of lines. This usually involves passing reader a file object, because files are iterables in Python, and as we loop over them, we'll get back each line in our file (see reading a file line-by-line in Python).
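Since reader only needs an iterable of lines, a plain list of strings works too. Here's a minimal sketch of that idea; the lines list and its trimmed-down columns are made up for illustration:

```python
import csv

# Any iterable of strings works, not just a file object.
lines = [
    "species,island,body_mass_g",
    "Adelie,Torgersen,3750",
    "Gentoo,Biscoe,4500",
]

# Each line is parsed into a list of strings.
rows = list(csv.reader(lines))
for row in rows:
    print(row)
```

This can be handy for quickly trying out csv.reader without needing a file on disk.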

Here we have a CSV file called penguins_small.csv:

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Dream,39.5,16.7,178,3250,FEMALE
Adelie,Biscoe,39.6,17.7,186,3500,FEMALE
Chinstrap,Dream,46.5,17.9,192,3500,FEMALE
Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE

Let's use Python's built-in open function to open our file for reading.

>>> penguins_file = open("penguins_small.csv")

Now we can pass the file object we got back to csv.reader:

>>> penguins_file = open("penguins_small.csv")
>>> reader = csv.reader(penguins_file)

When we call csv.reader we'll get back a reader object:

>>> reader
<_csv.reader object at 0x7fd34c861930>

We can loop over that reader object to get the rows within it:

>>> for row in reader:
...     print(row)
...

When we loop over a csv.reader object, the reader will loop over the file object that we originally gave it and convert each line in our file to a list of strings:

>>> for row in reader:
...     print(row)
...
['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']
['Adelie', 'Torgersen', '39.1', '18.7', '181', '3750', 'MALE']
['Adelie', 'Dream', '39.5', '16.7', '178', '3250', 'FEMALE']
['Adelie', 'Biscoe', '39.6', '17.7', '186', '3500', 'FEMALE']
['Chinstrap', 'Dream', '46.5', '17.9', '192', '3500', 'FEMALE']
['Gentoo', 'Biscoe', '46.1', '13.2', '211', '4500', 'FEMALE']

Each list represents one row in our file, and each string in each list represents the data in one column in that row.
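Leaving the file open is fine at the REPL, but in a script we'd usually reach for a with block so the file is closed automatically. The csv module documentation also recommends opening CSV files with newline="". The sketch below assumes a throwaway file named penguins_sample.csv (a hypothetical name with made-up, trimmed-down columns), written to a temporary directory so the example runs on its own:

```python
import csv
import tempfile
from pathlib import Path

# Small stand-in for a real CSV file (illustrative data only).
SAMPLE = (
    "species,island,body_mass_g\n"
    "Adelie,Torgersen,3750\n"
    "Gentoo,Biscoe,4500\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "penguins_sample.csv"
    with open(path, "w", newline="") as sample_file:
        sample_file.write(SAMPLE)

    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="") as penguins_file:
        reader = csv.reader(penguins_file)
        rows = list(reader)  # read everything before the file closes

header, first = rows[0], rows[1]
body_mass = int(first[2])  # every value is a string until we convert it
print(header)
print(first, body_mass)
```

Note that the reader never converts values for us: numbers come back as strings, so any conversion (like the int call here) is up to our code.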

Skipping the header row in a CSV file

Note that csv.reader doesn't know or care about the headers in our file: it treats every row equally.

We can work around this lack of header handling by recognizing that reader objects are iterators as well as iterables. So just like with file objects, we can pass a reader object to Python's built-in next function to get just its next row.

So if we wanted to skip over the first row in our file (skipping over the header line) we could pass reader to the next function to consume the first row, and then we could continue looping over it after that:

>>> import csv
>>> penguins_file = open("penguins_small.csv")
>>> reader = csv.reader(penguins_file)
>>> next(reader)
['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']
>>> for row in reader:
...     print(row)
...
['Adelie', 'Torgersen', '39.1', '18.7', '181', '3750', 'MALE']
['Adelie', 'Dream', '39.5', '16.7', '178', '3250', 'FEMALE']
['Adelie', 'Biscoe', '39.6', '17.7', '186', '3500', 'FEMALE']
['Chinstrap', 'Dream', '46.5', '17.9', '192', '3500', 'FEMALE']
['Gentoo', 'Biscoe', '46.1', '13.2', '211', '4500', 'FEMALE']

This works because iterators are consumed as we loop over them. That first row was consumed by our next call and then we kept looping to get the rest of our rows.
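One catch with next is that it raises StopIteration when there are no rows at all, as with an empty file. Passing a default value to next avoids that. Here's a small sketch, where read_data_rows is a hypothetical helper name and io.StringIO stands in for a real file:

```python
import csv
import io


def read_data_rows(lines):
    """Return (header, data_rows) from an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader, None)  # None instead of StopIteration when empty
    return header, list(reader)  # the remaining rows, header already consumed


header, data = read_data_rows(io.StringIO("species,island\nAdelie,Dream\n"))
print(header)
print(data)

empty_header, empty_data = read_data_rows(io.StringIO(""))
print(empty_header, empty_data)
```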

Mapping CSV headers to columns

If you'd prefer to think in terms of the headers in your file, rather than in terms of the indexes of each data column, you could use csv.DictReader instead of csv.reader.

>>> import csv
>>> penguins_file = open("penguins_small.csv")
>>> reader = csv.DictReader(penguins_file)

Just like with reader objects, we can loop over DictReader objects to get the rows within our file. But unlike reader objects, instead of getting back lists representing each row, we'll get back dictionaries:

>>> for row in reader:
...     print(row)
...
{'species': 'Adelie', 'island': 'Torgersen', 'bill_length_mm': '39.1', 'bill_depth_mm': '18.7', 'flipper_length_mm': '181', 'body_mass_g': '3750', 'sex': 'MALE'}
{'species': 'Adelie', 'island': 'Dream', 'bill_length_mm': '39.5', 'bill_depth_mm': '16.7', 'flipper_length_mm': '178', 'body_mass_g': '3250', 'sex': 'FEMALE'}
{'species': 'Adelie', 'island': 'Biscoe', 'bill_length_mm': '39.6', 'bill_depth_mm': '17.7', 'flipper_length_mm': '186', 'body_mass_g': '3500', 'sex': 'FEMALE'}
{'species': 'Chinstrap', 'island': 'Dream', 'bill_length_mm': '46.5', 'bill_depth_mm': '17.9', 'flipper_length_mm': '192', 'body_mass_g': '3500', 'sex': 'FEMALE'}
{'species': 'Gentoo', 'island': 'Biscoe', 'bill_length_mm': '46.1', 'bill_depth_mm': '13.2', 'flipper_length_mm': '211', 'body_mass_g': '4500', 'sex': 'FEMALE'}

Our DictReader object parses the first row in our file and treats that row as our header row. It uses those headers as keys in each of the dictionaries that it gives us as we loop over it.

So every dictionary that it gives us will have the same keys representing those headers:

>>> row.keys()
dict_keys(['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex'])

But the values in each dictionary will be different: the values for each dictionary row represent the data that corresponds to each header:

>>> row.values()
dict_values(['Gentoo', 'Biscoe', '46.1', '13.2', '211', '4500', 'FEMALE'])
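Dictionaries make it possible to look up values by header name instead of by position. As a rough sketch (using io.StringIO and a trimmed-down, made-up set of columns rather than the full penguins file), we could group body masses by species like this:

```python
import csv
import io

# Stand-in for an open CSV file (illustrative data only).
data = io.StringIO(
    "species,island,body_mass_g\n"
    "Adelie,Torgersen,3750\n"
    "Adelie,Dream,3250\n"
    "Gentoo,Biscoe,4500\n"
)

reader = csv.DictReader(data)
fieldnames = reader.fieldnames  # the headers parsed from the first row

masses = {}
for row in reader:
    # Look up columns by header name; values are still strings.
    masses.setdefault(row["species"], []).append(int(row["body_mass_g"]))

print(fieldnames)
print(masses)
```

If the columns in the file were ever reordered, this code would keep working, which isn't true of code that relies on list indexes.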

Reading tab-delimited data files

We can use Python's csv module to parse comma-delimited data, but we can also use it to parse other types of delimited data.

Both csv.reader and csv.DictReader accept an optional delimiter argument, which defaults to a comma (,).

Here we have a tab-delimited file called plu_codes.txt:

plu category    commodity   variety size    measurements_north_america  measurements_rest_of_world  restrictions_notes  botanical_name  aka notes   revision_date   date_added  gpc image_url   image
3000    Fruits  Apples  Alkmene All Sizes   NA  NA  NA  Malus domestica NA  test note   2021-03-04  1999-12-31  NA  NA  NA
3001    Fruits  Apples  Aurora/Southern Rose    Small   100 size and smaller    Average Fruit Weight = less than 205g   NA  Malus domestica NA  NA  2021-02-26  1999-12-31  NA  NA  NA
4960    Fruits  Pears   Fragrant    All Sizes   NA  NA  NA  Pyrus spp.  NA  NA  2007-05-04  2007-05-04  NA  http://check.ifpsglobal.com/file/view/fragrant-pear-2006-151_1629143631.JPG fragrant-pear-2006-151_1629143631.JPG

We can use csv.reader to read this file by specifying a delimiter of a tab character (\t):

>>> import csv
>>> plu_tsv_file = open("plu_codes.txt")
>>> reader = csv.reader(plu_tsv_file, delimiter="\t")

As we loop over rows in our reader object, the reader will split up each line by tab characters instead of by commas:

>>> next(reader)
['plu', 'category', 'commodity', 'variety', 'size', 'measurements_north_america', 'measurements_rest_of_world', 'restrictions_notes', 'botanical_name', 'aka', 'notes', 'revision_date', 'date_added', 'gpc', 'image_url', 'image']
>>> next(reader)
['3000', 'Fruits', 'Apples', 'Alkmene', 'All Sizes', 'NA', 'NA', 'NA', 'Malus domestica', 'NA', 'test note', '2021-03-04', '1999-12-31', 'NA', 'NA', 'NA']
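The delimiter argument works the same way with csv.DictReader. Here's a minimal sketch that uses io.StringIO and only the first four columns of the PLU data (a shortened stand-in, not the real file):

```python
import csv
import io

# Tab-delimited stand-in data with only a few of the columns.
tsv_data = io.StringIO(
    "plu\tcategory\tcommodity\tvariety\n"
    "3000\tFruits\tApples\tAlkmene\n"
    "4960\tFruits\tPears\tFragrant\n"
)

rows = list(csv.DictReader(tsv_data, delimiter="\t"))

# Map each PLU code to its commodity using the header names.
commodity_by_plu = {row["plu"]: row["commodity"] for row in rows}
print(commodity_by_plu)
```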

Use csv.reader or csv.DictReader to read CSV files in Python

Python's csv module includes helpers for reading CSV files.

You can use csv.reader to get back lists representing each row in your file. Or if you prefer to rely on the headers in your file, you can use csv.DictReader to get dictionaries representing each of those rows.
