How can you read a CSV file in Python?
Using csv.reader
The Python Standard Library has a csv module, which has a reader function within it:
>>> import csv
>>> csv.reader
<built-in function reader>
We can use the reader function by passing it an iterable of lines.
This usually involves passing reader a file object, because files are iterables in Python, and as we loop over them, we'll get back each line in our file (see reading a file line-by-line in Python).
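Because reader only needs an iterable of lines, a plain list of strings works too. Here's a minimal sketch of that idea (the lines list is made-up sample data, not one of the files used in this article):

```python
import csv

# Any iterable of strings works here, not just a file object.
lines = [
    "species,island,sex",
    "Adelie,Torgersen,MALE",
    "Gentoo,Biscoe,FEMALE",
]

# Each string is parsed into a list of column values.
rows = list(csv.reader(lines))
print(rows)
```

This can be handy for trying out csv.reader without creating a file first.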
Here we have a CSV file called penguins_small.csv:
species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Dream,39.5,16.7,178,3250,FEMALE
Adelie,Biscoe,39.6,17.7,186,3500,FEMALE
Chinstrap,Dream,46.5,17.9,192,3500,FEMALE
Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE
Let's use Python's built-in open function to open our file for reading:
>>> penguins_file = open("penguins_small.csv")
Now we can pass the file object we got back to csv.reader:
>>> penguins_file = open("penguins_small.csv")
>>> reader = csv.reader(penguins_file)
When we call csv.reader we'll get back a reader object:
>>> reader
<_csv.reader object at 0x7fd34c861930>
We can loop over that reader object to get the rows within it:
>>> for row in reader:
...     print(row)
...
When we loop over a csv.reader object, the reader will loop over the file object that we originally gave it and convert each line in our file to a list of strings:
>>> for row in reader:
...     print(row)
...
['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']
['Adelie', 'Torgersen', '39.1', '18.7', '181', '3750', 'MALE']
['Adelie', 'Dream', '39.5', '16.7', '178', '3250', 'FEMALE']
['Adelie', 'Biscoe', '39.6', '17.7', '186', '3500', 'FEMALE']
['Chinstrap', 'Dream', '46.5', '17.9', '192', '3500', 'FEMALE']
['Gentoo', 'Biscoe', '46.1', '13.2', '211', '4500', 'FEMALE']
Each list represents one row in our file, and each string in each list represents the data in one column in that row.
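Since every value comes back as a string, numeric columns need to be converted by hand. The sketch below shows that, and it also opens the file in a with block using newline="", which is what the csv module's documentation recommends for files passed to it. The temporary file is only a stand-in for penguins_small.csv so that the example can run on its own:

```python
import csv
import tempfile
from pathlib import Path

# Stand-in data for penguins_small.csv (an assumption, so this runs anywhere).
sample = (
    "species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex\n"
    "Adelie,Torgersen,39.1,18.7,181,3750,MALE\n"
    "Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE\n"
)

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "penguins_small.csv"
    path.write_text(sample)
    # The with block closes the file for us once we're done reading it.
    with open(path, newline="") as penguins_file:
        rows = list(csv.reader(penguins_file))

first_penguin = rows[1]                  # rows[0] is the header row
species = first_penguin[0]               # still a string
bill_length = float(first_penguin[2])    # converted from the string '39.1'
print(species, bill_length)
```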
Note that csv.reader doesn't know or care about the headers in our file: it treats every row equally.
We can try to work around this lack of headers by recognizing that reader objects are both iterables and iterators.
So just like file objects, we can pass reader objects to the built-in next function to get just their next row.
So if we wanted to skip over the first row in our file (skipping over the header line) we could pass reader to the next function to consume the first row, and then we could continue looping over it after that:
>>> import csv
>>> penguins_file = open("penguins_small.csv")
>>> reader = csv.reader(penguins_file)
>>> next(reader)
['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']
>>> for row in reader:
...     print(row)
...
['Adelie', 'Torgersen', '39.1', '18.7', '181', '3750', 'MALE']
['Adelie', 'Dream', '39.5', '16.7', '178', '3250', 'FEMALE']
['Adelie', 'Biscoe', '39.6', '17.7', '186', '3500', 'FEMALE']
['Chinstrap', 'Dream', '46.5', '17.9', '192', '3500', 'FEMALE']
['Gentoo', 'Biscoe', '46.1', '13.2', '211', '4500', 'FEMALE']
This works because iterators are consumed as we loop over them.
That first row was consumed by our next call and then we kept looping to get the rest of our rows.
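The header row doesn't have to be thrown away. One possible approach, sketched below with made-up in-memory data (io.StringIO stands in for a real file), is to keep the result of next and use it to find a column's index:

```python
import csv
import io

# In-memory stand-in for a CSV file (sample data, assumed for illustration).
data = io.StringIO(
    "species,island,body_mass_g\n"
    "Adelie,Torgersen,3750\n"
    "Gentoo,Biscoe,4500\n"
)

reader = csv.reader(data)
headers = next(reader)                       # consumes the header row
mass_index = headers.index("body_mass_g")    # position of the column we want

# The loop picks up where next left off, so only data rows remain.
masses = [int(row[mass_index]) for row in reader]
print(headers)
print(masses)
```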
If you'd prefer to think in terms of the headers in your file, rather than in terms of the indexes of each data column, you could use csv.DictReader instead of csv.reader.
>>> import csv
>>> penguins_file = open("penguins_small.csv")
>>> reader = csv.DictReader(penguins_file)
Just like with reader objects, we can loop over DictReader objects to get the rows within our file.
But unlike reader objects, instead of getting back lists representing each row, we'll get back dictionaries:
>>> for row in reader:
...     print(row)
...
{'species': 'Adelie', 'island': 'Torgersen', 'bill_length_mm': '39.1', 'bill_depth_mm': '18.7', 'flipper_length_mm': '181', 'body_mass_g': '3750', 'sex': 'MALE'}
{'species': 'Adelie', 'island': 'Dream', 'bill_length_mm': '39.5', 'bill_depth_mm': '16.7', 'flipper_length_mm': '178', 'body_mass_g': '3250', 'sex': 'FEMALE'}
{'species': 'Adelie', 'island': 'Biscoe', 'bill_length_mm': '39.6', 'bill_depth_mm': '17.7', 'flipper_length_mm': '186', 'body_mass_g': '3500', 'sex': 'FEMALE'}
{'species': 'Chinstrap', 'island': 'Dream', 'bill_length_mm': '46.5', 'bill_depth_mm': '17.9', 'flipper_length_mm': '192', 'body_mass_g': '3500', 'sex': 'FEMALE'}
{'species': 'Gentoo', 'island': 'Biscoe', 'bill_length_mm': '46.1', 'bill_depth_mm': '13.2', 'flipper_length_mm': '211', 'body_mass_g': '4500', 'sex': 'FEMALE'}
Our DictReader object parses the first row in our file and treats that row as our header row.
It uses those headers as keys in each of the dictionaries that it gives us as we loop over it.
So every dictionary that it gives us will have the same keys representing those headers:
>>> row.keys()
dict_keys(['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex'])
But the values in each dictionary will be different: the values for each dictionary row represent the data that corresponds to each header:
>>> row.values()
dict_values(['Gentoo', 'Biscoe', '46.1', '13.2', '211', '4500', 'FEMALE'])
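Having header names as keys means each column can be looked up by name instead of by position. As a rough sketch (again using made-up in-memory data rather than the penguins file), here's one way to group body masses by species:

```python
import csv
import io

# In-memory stand-in for a CSV file (sample data, assumed for illustration).
data = io.StringIO(
    "species,body_mass_g\n"
    "Adelie,3750\n"
    "Adelie,3250\n"
    "Gentoo,4500\n"
)

masses_by_species = {}
for row in csv.DictReader(data):
    # Values are strings, so convert the mass before storing it.
    mass = int(row["body_mass_g"])
    masses_by_species.setdefault(row["species"], []).append(mass)

print(masses_by_species)
```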
We can use Python's csv module to parse comma-delimited data, but we can also use it to parse other types of delimited data.
Both csv.reader and csv.DictReader accept an optional delimiter argument, which defaults to a comma (,).
Here we have a tab-delimited file called plu_codes.txt:
plu category commodity variety size measurements_north_america measurements_rest_of_world restrictions_notes botanical_name aka notes revision_date date_added gpc image_url image
3000 Fruits Apples Alkmene All Sizes NA NA NA Malus domestica NA test note 2021-03-04 1999-12-31 NA NA NA
3001 Fruits Apples Aurora/Southern Rose Small 100 size and smaller Average Fruit Weight = less than 205g NA Malus domestica NA NA 2021-02-26 1999-12-31 NA NA NA
4960 Fruits Pears Fragrant All Sizes NA NA NA Pyrus spp. NA NA 2007-05-04 2007-05-04 NA http://check.ifpsglobal.com/file/view/fragrant-pear-2006-151_1629143631.JPG fragrant-pear-2006-151_1629143631.JPG
We can use csv.reader to read this file by specifying a delimiter of a tab character (\t):
>>> import csv
>>> plu_tsv_file = open("plu_codes.txt")
>>> reader = csv.reader(plu_tsv_file, delimiter="\t")
As we loop over rows in our reader object, the reader will split up each line by tab characters instead of by commas:
>>> next(reader)
['plu', 'category', 'commodity', 'variety', 'size', 'measurements_north_america', 'measurements_rest_of_world', 'restrictions_notes', 'botanical_name', 'aka', 'notes', 'revision_date', 'date_added', 'gpc', 'image_url', 'image']
>>> next(reader)
['3000', 'Fruits', 'Apples', 'Alkmene', 'All Sizes', 'NA', 'NA', 'NA', 'Malus domestica', 'NA', 'test note', '2021-03-04', '1999-12-31', 'NA', 'NA', 'NA']
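The delimiter argument works the same way with csv.DictReader. Here's a small sketch using a few columns in the style of the PLU data above (the in-memory data is a shortened stand-in, not the real plu_codes.txt file):

```python
import csv
import io

# Shortened, tab-delimited stand-in for plu_codes.txt (assumed sample data).
data = io.StringIO(
    "plu\tcategory\tcommodity\tvariety\n"
    "3000\tFruits\tApples\tAlkmene\n"
    "4960\tFruits\tPears\tFragrant\n"
)

# Each row is a dictionary keyed by the tab-separated header names.
reader = csv.DictReader(data, delimiter="\t")
varieties = {row["plu"]: row["variety"] for row in reader}
print(varieties)
```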
Use csv.reader or csv.DictReader to read CSV files in Python
Python's csv module includes helpers for reading CSV files.
You can use csv.reader to get back lists representing each row in your file.
Or if you prefer to rely on the headers in your file, you can use csv.DictReader to get dictionaries representing each of those rows.