Representing binary data with bytes

Trey Hunner
Python 3.7-3.11

Let's talk about the difference between strings and bytes in Python.

Creating bytes objects in Python

Strings represent text (that is, human language). For example, here we have a string named text:

>>> text = "hello"

But there's another type that's closely associated with strings. Making one looks a lot like making a string, just with a b prefixed in front of it:

>>> data = b"hello"

That b is sort of like an f before an f-string, or an r before a raw string. But that b doesn't actually make a string, it makes a bytes object:

>>> data
b'hello'
>>> type(data)
<class 'bytes'>
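
To make the distinction concrete, here's a small sketch (the variable names same, combined, and message are just illustrative) showing that Python treats the two types as separate things, even when they look alike on screen:

```python
text = "hello"
data = b"hello"

# A bytes object never compares equal to a string, even when the
# characters look the same.
same = (text == data)

# Mixing the two types in one operation raises a TypeError.
try:
    combined = text + data
except TypeError as error:
    combined = None
    message = str(error)
```

Here same ends up False and the concatenation fails, so we always need to convert one type into the other before combining them.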

Strings represent text, bytes objects represent binary data

If we loop over a string in Python, we'll get back one-character strings representing each of the characters in that string:

>>> text = "hello"
>>> list(text)
['h', 'e', 'l', 'l', 'o']

What do you think we'll get if we loop over a bytes object?

>>> data = b"hello"
>>> list(data)

Since bytes objects represent binary data, when we loop over them we get back numbers (from 0 to 255) representing each of the bytes in that binary data:

>>> data = b"hello"
>>> list(data)
[104, 101, 108, 108, 111]

We can also do the opposite of this. We can take an iterable of numbers and turn it into a bytes object by passing it to the bytes constructor:

>>> nums = [0, 65, 97, 255]
>>> bytes(nums)
b'\x00Aa\xff'
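
As a rough illustration of the same idea, indexing a bytes object gives back a single number, while slicing gives back another bytes object. The names first, prefix, and out_of_range below are made up for this sketch:

```python
data = bytes([104, 101, 108, 108, 111])

first = data[0]      # indexing gives an int
prefix = data[:2]    # slicing gives a bytes object

# Each number must fit in a single byte (0 through 255).
try:
    bytes([256])
except ValueError:
    out_of_range = True
else:
    out_of_range = False
```

So data is the same object we'd get from b"hello", first is 104, prefix is b'he', and passing 256 to the bytes constructor raises a ValueError.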

Where are bytes objects used in Python?

All data that comes from outside of our Python process starts as bytes. But if that data represents text (and Python knows it), Python will convert it to a string automatically.

If we use the urllib module in Python to do an HTTP request, the data that we get back is not represented as a string:

>>> from urllib.request import urlopen
>>> data = urlopen('https://pseudorandom.name').read()
>>> data
b'Grace Jones\n'
>>> type(data)
<class 'bytes'>

The data we get back is represented as a bytes object because it might not even represent text. After all, an HTTP response can contain any data, even arbitrary binary data.
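
When we do know the response is text, one possible approach is to decode it using the character encoding the server declared. The sketch below is only an illustration: body_as_text and FakeResponse are hypothetical names, and FakeResponse is an offline stand-in that mimics the headers and read() of the object urlopen returns, so no network request is made.

```python
from email.message import Message
from io import BytesIO


def body_as_text(response, fallback="utf-8"):
    """Decode a response body using its declared charset, if it has one."""
    # get_content_charset() returns None when no charset was declared,
    # in which case we assume the fallback encoding.
    charset = response.headers.get_content_charset() or fallback
    return response.read().decode(charset)


class FakeResponse:
    """Hypothetical stand-in for an HTTP response, for offline use."""

    def __init__(self, body, content_type):
        self.headers = Message()
        self.headers["Content-Type"] = content_type
        self._body = BytesIO(body)

    def read(self):
        return self._body.read()


fake = FakeResponse(b"Grace Jones\n", "text/plain; charset=utf-8")
name = body_as_text(fake)
```

With a real request, we'd pass the object returned by urlopen to body_as_text instead of the fake one. Falling back to utf-8 is an assumption, not something the server guarantees.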

If we open up a file with the mode of rb, we're opening that file not in the default read-text mode, but instead in read-binary mode.

>>> with open("avatar.jpg", mode="rb") as jpg_file:
...     jpg_data = jpg_file.read()
...

So when we read from that file, the data that we get out of it will not be a string, it'll be a bytes object.

>>> type(jpg_data)
<class 'bytes'>

In fact, in this case, where we're opening up a jpg file, we get a bytes object with a lot of bytes in it, because it takes a lot of bytes to represent an image:

>>> len(jpg_data)
1108051
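
To see the two modes side by side, here's a minimal sketch that writes a small file into a temporary directory and reads it back both ways. The file name note.txt and the sample contents are made up for this example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "note.txt"       # hypothetical file name
    path.write_bytes(b"caf\xc3\xa9")       # "café" encoded as utf-8

    # Read-binary mode: we get the bytes exactly as they're stored.
    with open(path, mode="rb") as binary_file:
        raw = binary_file.read()

    # Read-text mode: Python decodes the bytes into a string for us.
    with open(path, mode="r", encoding="utf-8") as text_file:
        decoded = text_file.read()
```

The same file gives us a 5-byte bytes object in binary mode and a 4-character string in text mode, because the é takes two bytes in utf-8.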

How to convert bytes into a string

If you end up with a bytes object in Python, and you know that that object represents text, you can turn it into a string by calling its decode method:

>>> data = b"bytes! \xe2\x9c\xa8"
>>> data.decode()
'bytes! ✨'

The decode method (without any arguments passed to it) uses a default character encoding of utf-8. Even if we know that the data we're working with uses that default character encoding of utf-8, it's considered a best practice to always specify the encoding of our bytes:

>>> text = data.decode("utf-8")
>>> text
'bytes! ✨'
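
Here's a small sketch of why the encoding matters. The variable names are illustrative, and ascii and latin-1 are just two other encodings picked for contrast:

```python
data = b"bytes! \xe2\x9c\xa8"

# Decoding with the wrong encoding can fail outright...
try:
    data.decode("ascii")
except UnicodeDecodeError:
    ascii_failed = True
else:
    ascii_failed = False

# ...or it can succeed while quietly producing the wrong characters.
garbled = data.decode("latin-1")

# errors="replace" swaps each undecodable byte for U+FFFD instead of raising.
lenient = data.decode("ascii", errors="replace")
```

The latin-1 result is the sneaky one: no exception is raised, but the three bytes of the sparkles emoji come back as three unrelated characters.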

As the Zen of Python says, "explicit is better than implicit".

If for some reason you have a string that you want to turn into bytes, you can call the encode method on that string to encode it into bytes:

>>> text.encode()
b'bytes! \xe2\x9c\xa8'

Just like decode, the encode method defaults to using utf-8, but you could specify a different character encoding if you wanted to:

>>> text.encode("utf-8")
b'bytes! \xe2\x9c\xa8'
>>> text.encode("utf-16-le")
b"b\x00y\x00t\x00e\x00s\x00!\x00 \x00('"
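
As a quick sketch of what changing the encoding does, we can compare the number of characters in the string with the number of bytes each encoding produces (the variable names here are just for illustration):

```python
text = "bytes! ✨"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

char_count = len(text)    # characters in the string
utf8_size = len(utf8)     # bytes when encoded as utf-8
utf16_size = len(utf16)   # bytes when encoded as utf-16-le

# Decoding with the same encoding recovers the original string.
round_trip = utf16.decode("utf-16-le")
```

The string has 8 characters, but it takes 10 bytes in utf-8 and 16 bytes in utf-16-le. As long as we decode with the same encoding we encoded with, we get the original text back.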

Summary

Strings represent text-based data, while bytes represent binary data (e.g. images, video, or anything else you could represent on a computer).

Depending on what you use Python for, you probably won't encounter bytes objects very often. But when you do, the one thing you'll probably want to do with them is call their decode method to turn them into a string (assuming those bytes represent text).
