Let's talk about the difference between strings and bytes in Python.
bytes objects in Python
Strings represent text (human language, that is).
For example, here we have a string named text:
>>> text = "hello"
But there's another type that's closely associated with strings: bytes. Making a bytes object looks a lot like making a string, but with a b prefixed in front of the quotes:
>>> data = b"hello"
>>> data
b'hello'
>>> type(data)
<class 'bytes'>
bytes objects represent binary data
If we loop over a string in Python, we'll get back sub-strings representing each of the characters in that string:
>>> text = "hello"
>>> list(text)
['h', 'e', 'l', 'l', 'o']
What do you think we'll get if we loop over a bytes object?
>>> data = b"hello"
>>> list(data)
Since bytes objects represent binary data, when we loop over them we get back numbers (from 0 to 255) representing each of the bytes in that binary data:
>>> data = b"hello"
>>> list(data)
[104, 101, 108, 108, 111]
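Because each item in a bytes object is a number, indexing and slicing behave a little differently than they do for strings. Here's a small sketch of that (the variable names are just for illustration): indexing gives back an integer, while slicing gives back another bytes object.

```python
data = b"hello"

first = data[0]     # indexing returns an int (the value of that byte)
prefix = data[0:1]  # slicing returns a bytes object

print(first)        # 104
print(prefix)       # b'h'
print(chr(first))   # h
```

With a string, both text[0] and text[0:1] would give back the one-character string 'h'.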
We can also do the opposite of this.
We can take an iterable of numbers and turn it into a bytes object by passing it to the bytes constructor:
>>> nums = [0, 65, 97, 255]
>>> bytes(nums)
b'\x00Aa\xff'
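To see that these two conversions really are opposites, here's a short sketch that round-trips a bytes object through a list of numbers. It also shows what happens when a number doesn't fit in a single byte (the names round_tripped and out_of_range_error are just illustrative).

```python
data = b"hello"

nums = list(data)            # [104, 101, 108, 108, 111]
round_tripped = bytes(nums)  # back to a bytes object
print(round_tripped == data)  # True

# Each number must fit in a single byte (0 through 255)
out_of_range_error = None
try:
    bytes([256])
except ValueError as error:
    out_of_range_error = error
    print("ValueError:", error)
```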
All data that comes from outside of our Python process starts as bytes. But if that data represents text (and Python knows it), Python will convert it to strings automatically.
If we use the urllib module in Python to do an HTTP request, the data that we get back is not represented as a string:
>>> from urllib.request import urlopen
>>> data = urlopen('https://pseudorandom.name').read()
>>> data
b'Grace Jones\n'
>>> type(data)
<class 'bytes'>
The data we get back is represented as a
bytes object because it might not even represent text.
After all, an HTTP request can send back any data, even arbitrary binary data.
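When a response does represent text, the server often says which character encoding it used in the Content-Type header. Here's a rough sketch of decoding a response body with that information. The helper names decode_response_body and fetch_text are hypothetical, and falling back to utf-8 when no charset is declared is an assumption, not something the server promises.

```python
from typing import Optional
from urllib.request import urlopen


def decode_response_body(body: bytes, charset: Optional[str]) -> str:
    """Decode body with the declared charset, assuming utf-8 if none was declared."""
    return body.decode(charset or "utf-8")


def fetch_text(url: str) -> str:
    """Fetch a URL and decode its body using the charset from the Content-Type header."""
    with urlopen(url) as response:
        # get_content_charset() returns None when the header declares no charset
        charset = response.headers.get_content_charset()
        return decode_response_body(response.read(), charset)
```

Calling fetch_text with the URL from the example above should give back a string instead of a bytes object, as long as the response really is text.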
If we open up a file with the mode of
rb, we're opening that file not in the default read-text mode, but instead in read-binary mode.
>>> with open("avatar.jpg", mode="rb") as jpg_file:
...     jpg_data = jpg_file.read()
...
So when we read from that file, the data that we get out of it will not be a string, it'll be a bytes object:
>>> type(jpg_data)
<class 'bytes'>
In fact, in this case, where we're opening up a jpg file, we get a bytes object with a lot of bytes in it, because it takes a lot of bytes to represent an image:
>>> len(jpg_data)
1108051
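To compare the two file modes side by side, here's a small sketch that writes a few bytes to a temporary file and then reads them back in both binary mode and text mode. The file name greeting.txt is made up for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical file, created in a temporary directory just for this sketch
with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "greeting.txt"
    path.write_bytes(b"caf\xc3\xa9\n")  # the utf-8 bytes for "café" plus a newline

    with open(path, mode="rb") as binary_file:
        raw = binary_file.read()  # bytes, exactly as stored on disk

    with open(path, mode="r", encoding="utf-8") as text_file:
        text = text_file.read()   # str, decoded by Python while reading

print(type(raw), len(raw))    # <class 'bytes'> 6
print(type(text), len(text))  # <class 'str'> 5
```

The lengths differ because the é character takes two bytes in utf-8 but counts as a single character once it has been decoded into a string.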
Turning bytes into a string
If you end up with a bytes object in Python, and you know that the object represents text, you can turn it into a string by calling its decode method:
>>> data = b"bytes! \xe2\x9c\xa8"
>>> data.decode()
'bytes! ✨'
The decode method (without any arguments passed to it) uses a default character encoding of utf-8.
Even if we know that the data we're working with uses that default character encoding of
utf-8, it's considered a best practice to always specify the encoding of our bytes:
>>> text = data.decode("utf-8")
>>> text
'bytes! ✨'
As the Zen of Python says, "explicit is better than implicit".
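One reason the encoding matters: decoding the same bytes with a different encoding can either give us the wrong text or fail outright. Here's a small sketch of both outcomes, using latin-1 and ascii purely as examples of "wrong" encodings for this data.

```python
data = b"bytes! \xe2\x9c\xa8"

# latin-1 maps every possible byte to a character, so decoding never fails,
# but the three bytes of the sparkles emoji become three unrelated characters
wrong = data.decode("latin-1")
print(wrong == "bytes! ✨")  # False
print(len(wrong))            # 10

# ascii only covers bytes 0 through 127, so decoding fails loudly instead
ascii_failed = False
try:
    data.decode("ascii")
except UnicodeDecodeError as error:
    ascii_failed = True
    print("could not decode:", error.reason)
```

A loud failure is usually easier to deal with than silently garbled text, which is part of why naming the encoding we actually expect is worthwhile.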
If for some reason you have a string that you want to turn into bytes, you can call the encode method on that string to encode it into bytes:
>>> text.encode()
b'bytes! \xe2\x9c\xa8'
The encode method defaults to using utf-8, but you could specify a different character encoding if you wanted to:
>>> text.encode("utf-8")
b'bytes! \xe2\x9c\xa8'
>>> text.encode("utf-16-le")
b"b\x00y\x00t\x00e\x00s\x00!\x00 \x00('"
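The same string can take up a different number of bytes depending on the encoding we choose. Here's a quick sketch comparing the two encodings used above, and checking that decoding with the matching encoding gives back the original string.

```python
text = "bytes! ✨"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(len(text))   # 8 (characters)
print(len(utf8))   # 10 (bytes)
print(len(utf16))  # 16 (bytes)

# Decoding with the same encoding we encoded with round-trips the text
print(utf8.decode("utf-8") == text)       # True
print(utf16.decode("utf-16-le") == text)  # True
```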
Strings represent text-based data, while bytes represent binary data (e.g. images, video, or anything else you could represent on a computer).
Depending on what you use Python for, you probably won't encounter
bytes objects very often.
But when you do, the one thing you'll probably want to do with them is call their
decode method to turn them into a string (assuming those bytes represent text).