Let's talk about reading files line-by-line in Python.
Here we're calling the read method on a file object (for a file called diary980.md):
>>> filename = "diary980.md"
>>> with open(filename) as diary_file:
...     contents = diary_file.read()
...
>>> contents
'Python Log -- Day 980\n\nToday I learned about metaclasses.\nMetaclasses are a class\'s class.\nMeaning every class is an instance of a metaclass.\nThe default metaclass is "type".\n\nClasses control features (like string representations) of all their instances.\nMetaclasses can control similar features for their classes.\n\nI doubt I\'ll ever need to make a metaclass, at least not for production code.\n'
When you call the read method on a file object, Python will read the entire file into memory all at once.
But that could be a bad idea if you're working with a really big file.
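To make the memory point concrete, here's a minimal sketch. Because diary980.md isn't available here, it assumes a small stand-in file (the SAMPLE text and the temporary path are hypothetical). It shows that read returns the whole file as one string, and that splitting that string into lines afterwards builds a list that also sits in memory all at once:

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for diary980.md, written to a temporary directory.
SAMPLE = "Python Log -- Day 980\n\nToday I learned about metaclasses.\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "diary.md"
    path.write_text(SAMPLE)

    with open(path) as diary_file:
        contents = diary_file.read()  # the entire file, as a single str

    # Splitting afterwards builds a list of every line, also held in memory.
    lines = contents.splitlines()

print(type(contents).__name__, len(contents))
print(lines)
```

For a file of a few hundred bytes this doesn't matter, but both contents and lines grow with the size of the file.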
There's another common way to process files in Python: you can loop over a file object to read it line-by-line:
>>> filename = "diary980.md"
>>> with open(filename) as diary_file:
...     n = 1
...     for line in diary_file:
...         print(n, line)
...         n += 1
...
Here, we're printing a number (counting upward) alongside each line in our file:
1 Python Log -- Day 980

2

3 Today I learned about metaclasses.

4 Metaclasses are a class's class.

5 Meaning every class is an instance of a metaclass.

6 The default metaclass is "type".

7

8 Classes control features (like string representations) of all their instances.

9 Metaclasses can control similar features for their classes.

10

11 I doubt I'll ever need to make a metaclass, at least not for production code.
Notice that Python isn't just printing each line: it's also printing an extra blank line between the lines of our file.
By default, Python's print function prints a newline character (\n) after whatever else it prints (see the print function's end argument).
But each of our lines also ends in a newline character, because newline characters are what separate lines in a file:
>>> line
"I doubt I'll ever need to make a metaclass, at least not for production code.\n"
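To see where the blank lines come from, a short sketch can send print's output to an in-memory io.StringIO buffer (via print's file argument) instead of the screen and then inspect it with repr. The line string is copied from the session above:

```python
import io

line = "I doubt I'll ever need to make a metaclass, at least not for production code.\n"

# Capture what print writes instead of displaying it.
buffer = io.StringIO()
print(11, line, file=buffer)
output = buffer.getvalue()

# One "\n" comes from the line itself, the other from print's default end="\n".
print(repr(output))
```

The captured text ends in two newline characters, which is exactly the blank line seen after every numbered line.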
So we either need to suppress the newline character that the print function prints, or we need to remove the newline character from each line in our file as we print it:
>>> filename = "diary980.md"
>>> with open(filename) as diary_file:
...     n = 1
...     for line in diary_file:
...         print(n, line.rstrip("\n"))
...         n += 1
...
1 Python Log -- Day 980
2
3 Today I learned about metaclasses.
4 Metaclasses are a class's class.
5 Meaning every class is an instance of a metaclass.
6 The default metaclass is "type".
7
8 Classes control features (like string representations) of all their instances.
9 Metaclasses can control similar features for their classes.
10
11 I doubt I'll ever need to make a metaclass, at least not for production code.
We're using the string rstrip method here to "strip" newline characters from the right-hand side (the end) of each of our line strings just before we print each line.
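A quick sketch of the difference, using a couple of sample line strings modeled on the diary's lines: rstrip("\n") removes the trailing newline, lstrip("\n") only looks at the beginning and leaves it in place, and rstrip() with no argument strips all trailing whitespace, not just the newline. Since open in text mode translates line endings to "\n" by default, stripping "\n" is enough for ordinary text files:

```python
line = "Metaclasses are a class's class.\n"

# rstrip("\n") removes newline characters from the end of the string.
print(repr(line.rstrip("\n")))
# lstrip("\n") only strips from the start, so the trailing newline survives.
print(repr(line.lstrip("\n")))

# With no argument, rstrip() removes all trailing whitespace,
# including spaces and tabs, which may or may not be what you want.
padded = "Meaning every class is an instance of a metaclass.   \n"
print(repr(padded.rstrip("\n")))
print(repr(padded.rstrip()))
```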
File objects in Python are lazy iterables, which means we can treat them pretty much the same way as any other iterable.
So instead of manually counting upward, we could pass our file object to the built-in enumerate function.
The enumerate function could then do the counting for us as we loop:
>>> filename = "diary980.md"
>>> with open(filename) as diary_file:
...     for n, line in enumerate(diary_file, start=1):
...         print(n, line.rstrip("\n"))
...
We've removed two lines of code, but we get the same output as before:
1 Python Log -- Day 980
2
3 Today I learned about metaclasses.
4 Metaclasses are a class's class.
5 Meaning every class is an instance of a metaclass.
6 The default metaclass is "type".
7
8 Classes control features (like string representations) of all their instances.
9 Metaclasses can control similar features for their classes.
10
11 I doubt I'll ever need to make a metaclass, at least not for production code.
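The other option mentioned earlier, suppressing print's own newline, also combines with enumerate. This sketch uses print's end argument and assumes an io.StringIO object in place of the real file (it can be looped over line-by-line just like a file object); the printed text is collected in a second StringIO so it can be checked:

```python
import io

# io.StringIO stands in for the open diary file (hypothetical sample text).
diary_file = io.StringIO("Python Log -- Day 980\n\nToday I learned about metaclasses.\n")
output = io.StringIO()  # collects what print writes

for n, line in enumerate(diary_file, start=1):
    # Each line already ends in "\n", so tell print not to add another one.
    print(n, line, end="", file=output)

print(output.getvalue(), end="")
```

One trade-off: if a file's last line has no trailing newline, end="" leaves the output without a final newline, whereas rstrip("\n") with print's default end treats every line the same way.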
Files are lazy iterables, and as we loop over a file object, we'll get lines from that file.
When Python reads a file line-by-line, it doesn't store the whole file in memory all at once. Instead, it reads the file in small chunks into a buffer and hands back one line at a time, so it's much more memory-efficient.
That's why looping over files line-by-line is especially important when you're working with really big files.
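Here's a short sketch of what "lazy iterable" means in practice, using a hypothetical three-line sample file in a temporary directory. A file object is its own iterator, next pulls a single line, a for loop resumes from wherever reading left off, and once the lines run out, looping again yields nothing. (Memory use itself isn't visible in this sketch.)

```python
import tempfile
from pathlib import Path

# Hypothetical sample file for illustration.
SAMPLE = "first line\nsecond line\nthird line\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(SAMPLE)

    with open(path) as f:
        # A file object is its own iterator: iter() hands back the same object.
        is_own_iterator = iter(f) is f
        # next() reads just one line from the file.
        first = next(f)
        # A for loop continues from where next() left off.
        rest = []
        for line in f:
            rest.append(line)
        # The iterator is now exhausted; looping again gives no lines.
        again = list(f)

print(is_own_iterator, repr(first), rest, again)
```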