How to read from a text file

Transcript

Let's talk about reading from a text file in Python.

Opening and reading a text file in Python

Python has a built-in open function that accepts a filename (in this case we're using this diary980.md file), and it gives us back a file object:

>>> f = open("diary980.md")
>>> f
<_io.TextIOWrapper name='diary980.md' mode='r' encoding='UTF-8'>

Technically, we get back an _io.TextIOWrapper object, but we don't usually call it that; we just refer to it as a file object.

File objects have a read method, which gives back a string representing the entire contents of that file:

>>> f.read()
'Python Log -- Day 980\n\nToday I learned about metaclasses.\nMetaclasses are a class\'s class.\nMeaning every class is an instance of a metaclass.\nThe default metaclass is "type".\n\nClasses control features (like string representations) of all their instances.\nMetaclasses can control similar features for their classes.\n\nI doubt I\'ll ever need to make a metaclass, at least not for production code.\n'

That string represents the contents of this diary980.md file:

Python Log -- Day 980

Today I learned about metaclasses.
Metaclasses are a class's class.
Meaning every class is an instance of a metaclass.
The default metaclass is "type".

Classes control features (like string representations) of all their instances.
Metaclasses can control similar features for their classes.

I doubt I'll ever need to make a metaclass, at least not for production code.

Once we're done working with a file, we should make sure it's closed, in order to let the operating system know that we're done with it:

>>> f.close()

After we've closed a file, we can't do anything else with that file object:

>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.
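Here's a minimal sketch of that whole open, read, close cycle as a script instead of a REPL session. The temporary example.md file and its contents are made up so the code can run anywhere; they're not part of the original example. File objects have a closed attribute that tells us whether the file has been closed.

```python
import tempfile
from pathlib import Path

# Hypothetical throwaway file so this sketch is self-contained.
path = Path(tempfile.mkdtemp()) / "example.md"
path.write_text("Python Log -- Day 980\n", encoding="utf-8")

f = open(path, encoding="utf-8")
contents = f.read()   # the entire file as one string
f.close()

print(f.closed)       # True

# Reading from a closed file object raises a ValueError.
try:
    f.read()
except ValueError as error:
    message = str(error)
    print(message)    # I/O operation on closed file.
```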

Closing files manually using the close method

Python closes any open files automatically when our Python process exits, but we usually shouldn't rely on that (unless we want our file to stay open for the entire life of the process).

We could close files manually using the close method on file objects, but we probably shouldn't do that either.

Here we're opening a file, processing it, and then closing it:

In [1]: diary_file = open("diary980.md")
   ...: day = int(diary_file.read().splitlines()[0].split('--')[1])
   ...: diary_file.close()
   ...: print(day)

If an exception occurs on any line between the open call and the close method call, the close method will never be called:

ValueError                                Traceback (most recent call last)
<ipython-input-1-88e2802a7886> in <module>
      1 diary_file = open("diary980.md")
----> 2 day = int(diary_file.read().splitlines()[0].split('--')[1])
      3 diary_file.close()
      4 print(day)

ValueError: invalid literal for int() with base 10: ' Day 980'

Our file object is still open at this point:

In [2]: diary_file.closed
Out[2]: False

The line of code before our close method call raised an exception, so our file was never closed.
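One way to guarantee that the close method is called, even when an exception occurs, is to wrap the processing in a try-finally block. This is only a sketch: the temporary file stands in for diary980.md, and catching the ValueError is added just so the script can keep running and check the result.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for diary980.md so this sketch is self-contained.
path = Path(tempfile.mkdtemp()) / "diary980.md"
path.write_text("Python Log -- Day 980\n", encoding="utf-8")

diary_file = open(path, encoding="utf-8")
try:
    # Same parsing problem as above: ' Day 980' isn't a valid integer.
    day = int(diary_file.read().splitlines()[0].split('--')[1])
except ValueError:
    day = None
finally:
    # The finally block runs whether or not an exception was raised.
    diary_file.close()

print(diary_file.closed)  # True
```

This works, but it's a lot of ceremony to repeat every time we open a file.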

Using a with block to automatically close the file

We can fix this problem by using a with block. Using a with block will automatically close our file for us:

with open("diary980.md") as diary_file:
    day = int(diary_file.read().splitlines()[0].split('--')[1])
print(day)

A with block works with a context manager, and files are context managers.

When a with block is entered, Python will inform the context manager object so it can execute some special entrance code. When the with block is exited (whether the indented code finishes normally or an exception is raised), Python will inform the context manager object so it can execute some special exit code. See the context managers page for more on context managers.

The exit code for file objects closes the file automatically. So you can use a with block to make sure that your file is always closed when the with block exits.
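To make the entrance code and exit code idea concrete, here's a rough sketch of a tiny context manager of our own. The Announcer class is made up for illustration; it isn't how file objects are actually implemented, but it hooks into a with block through the same two methods, __enter__ and __exit__.

```python
class Announcer:
    """Hypothetical context manager that records when it's entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Entrance code: runs when the with block is entered.
        self.events.append("entered")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Exit code: runs when the with block is exited,
        # even if an exception was raised inside it.
        self.events.append("exited")
        return False  # a false value means the exception is not suppressed


announcer = Announcer()
try:
    with announcer:
        raise ValueError("something went wrong")
except ValueError:
    pass

print(announcer.events)  # ['entered', 'exited']
```

The exit code ran even though the with block raised an exception, which is the same guarantee we get for closing files.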

If we execute our with block, we'll see the same exception as before:

In [1]: with open("diary980.md") as diary_file:
   ...:     day = int(diary_file.read().splitlines()[0].split('--')[1])
   ...: print(day)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-5d7936c2bce8> in <module>
      1 with open("diary980.md") as diary_file:
----> 2     day = int(diary_file.read().splitlines()[0].split('--')[1])
      3 print(day)

ValueError: invalid literal for int() with base 10: ' Day 980'

Even though an exception occurred, our file ended up closed because our with block was exited:

In [2]: diary_file.closed
Out[2]: True
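For comparison, here's a sketch of a version where the parsing succeeds. Splitting ' Day 980' on whitespace and taking the last piece is just one assumed way to pull the number out, and the temporary file is a made-up stand-in for diary980.md.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for diary980.md so this sketch is self-contained.
path = Path(tempfile.mkdtemp()) / "diary980.md"
path.write_text(
    "Python Log -- Day 980\n\nToday I learned about metaclasses.\n",
    encoding="utf-8",
)

with open(path, encoding="utf-8") as diary_file:
    first_line = diary_file.read().splitlines()[0]    # 'Python Log -- Day 980'
    day = int(first_line.split('--')[1].split()[-1])  # ' Day 980' -> '980' -> 980

print(day)                # 980
print(diary_file.closed)  # True
```

Whether the code in the with block succeeds or fails, the file ends up closed.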

Summary

To work with a text file in Python, you can use the built-in open function, which gives you back a file object. File objects have a read method, which will give you back the entire contents of that file as a string.

You should always make sure that your files are closed when you're done working with them.

For a very simple command-line program, you could just rely on Python to close the file automatically when your Python program exits. But it's considered a best practice to use a with block to make sure that your file is closed automatically as soon as you're done with it.