Substrings in Python: checking if a string contains another string

Trey Hunner

6 min. read • Python 3.8—3.12 • May 9, 2023

What we're looking for

Let's say we have two strings, part and whole.

>>> part = "Python"
>>> whole = "I enjoy writing Python code."

We'd like to know whether part is a substring of whole. That is, we'd like to know whether the entire part string appears within the whole string.

Python's strings have an index method and a find method that can look for substrings. Folks new to Python often reach for these methods when checking for substrings, but there's an even simpler way to check for a substring.

Let's take a look at the index and find methods first and briefly discuss why I don't recommend using them and why I recommend using the in operator instead.

The string `index` method

The string index method accepts a string and returns the index where the given string was first found:

>>> whole.index(part)
16

If the given string was not found, a ValueError exception will be raised:

>>> whole.index("Java")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

To determine whether whole contains part, we could use exception handling along with the string index method:

>>> try:
...     whole.index(part)
...     found = True
... except ValueError:
...     found = False
...
>>> found
True

But raising and catching an exception has a cost whenever the substring is missing, so we may want to avoid the index method for performance reasons alone.

Performance considerations aside, this approach also seems a bit too verbose. We're using five lines of code just to check whether one string contains another.

I've always found the behavior of the index method a bit surprising. I recommend avoiding the string index method.

The string `find` method

The string find method accepts a string and returns the index where the given string was first found:

>>> whole.find("Python")
16

If no string was found, find returns -1 instead:

>>> whole.find("Java")
-1

To determine whether whole contains part, we could use the string find method and then make sure the returned index is greater than -1:

>>> found = whole.find(part) > -1
>>> found
True

That works, but it feels a bit awkward. In particular, that -1 seems a bit too magical.
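Here's a small sketch of why that -1 can bite. The strings below are made up for illustration. A match at the very start of a string has index 0, which is falsy, while the "not found" value of -1 is truthy, so using the result of find directly as a condition gives the wrong answer in both cases:

```python
whole = "Python is fun to write."

# A match at the very start has index 0, and 0 is falsy
index_at_start = whole.find("Python")
# A missing substring gives -1, and -1 is truthy
index_missing = whole.find("Java")

# Treating the index as a boolean is wrong in both cases
wrong_found_python = bool(index_at_start)  # False, though "Python" is present
wrong_found_java = bool(index_missing)     # True, though "Java" is absent

# Comparing against -1 explicitly gives the right answers
found_python = index_at_start != -1
found_java = index_missing != -1
print(found_python, found_java)  # True False
```

So if we do use find, the explicit comparison against -1 isn't optional.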

The string `count` method

What if we count the number of times a given substring is found?

Python's strings have a count method that will accept a substring and return a count of the number of times that substring was found in the string.

>>> count = whole.count("Python")
>>> count
1

This approach works, but it does a bit more than we need.

The index and find methods will return as soon as they find the first match. But with the count method, our code will always loop all the way through the string in order to count every occurrence. This performance difference likely isn't a concern with smaller strings, but it is something to keep in mind.

The count method seems like the most readable approach so far, even if it may sometimes be less performant than using the find method.
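Note that count gives us a number rather than a yes-or-no answer. As a rough sketch, we could compare that number to zero to turn it into a containment check. The last line uses a made-up string to show that count tallies non-overlapping matches:

```python
whole = "I enjoy writing Python code."

# count returns an integer, so compare it to zero for a True/False answer
found = whole.count("Python") > 0
missing = whole.count("Java") > 0
print(found, missing)  # True False

# Matches are counted without overlapping: "aaaa" holds two "aa" matches, not three
overlap_count = "aaaa".count("aa")
print(overlap_count)  # 2
```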

But there's an even better way to check whether one string is the substring of another string.

The `in` operator

Python has an in operator that works with many data structures, including strings. Here's in used to check whether a list contains a particular item:

>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> 5 in numbers
False

With strings, the in operator is used for checking whether one string contains another:

>>> message = "I enjoy writing Python code."
>>> "Python" in message
True

This is exactly the tool we've been seeking!

The in operator is typically the most idiomatic way to check whether one string contains another in Python.
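Two related details are worth a quick sketch: the operator has a negated form, not in, and the empty string counts as a substring of every string. The variable names here are just for illustration:

```python
message = "I enjoy writing Python code."

# "not in" is the negated form of the in operator
has_java = "Java" in message
lacks_java = "Java" not in message
print(has_java, lacks_java)  # False True

# The empty string is considered a substring of every string
empty_is_contained = "" in message
print(empty_is_contained)  # True
```

That last behavior can matter if part comes from user input, since an empty input will always "match".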

Case-insensitive containment checking

What if you want to ignore capitalization while performing your string containment check?

Take these two strings:

>>> part = "python"
>>> whole = "I enjoy writing Python code."

The part string isn't within the whole string, but it would be if we could somehow disregard whether each letter is uppercase or lowercase.

>>> part in whole
False

To solve this problem, we could combine two tools:

The in operator
The string casefold method (or the upper or lower methods if you prefer)

The casefold method will lowercase a string while specially considering certain Unicode characters as well:

>>> part.casefold()
'python'
>>> whole.casefold()
'i enjoy writing python code.'

If we call casefold on our part and whole strings and then use the in operator, we'll essentially perform a case-insensitive containment check:

>>> part.casefold() in whole.casefold()
True

Many Python problems can't be solved with a single tool, but can be solved fairly simply by combining a couple of simple tools. Combining the casefold method and the in operator to perform case-insensitive substring checks is just one such helpful mash-up.
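To see where casefold and lower differ, here's a minimal sketch using the German letter ß. The contains_ignoring_case helper and the sample sentence are hypothetical, made up for this example:

```python
def contains_ignoring_case(part, whole):
    """Return True if part appears in whole, ignoring case (hypothetical helper)."""
    return part.casefold() in whole.casefold()


street = "Die Straße ist lang."

# casefold turns "ß" into "ss", so the two spellings match
with_casefold = contains_ignoring_case("STRASSE", street)
print(with_casefold)  # True

# lower leaves "ß" alone, so the same check fails
with_lower = "STRASSE".lower() in street.lower()
print(with_lower)  # False
```

For plain ASCII text the two methods behave the same, so the difference only shows up with characters like this one.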

Checking for string prefixes and suffixes

What if you need to check whether one string contains another specifically at the beginning or the end of the string?

You can use the string startswith method to check whether one string starts with another string:

>>> whole = "I enjoy writing Python code."
>>> whole.startswith("You ")
False
>>> whole.startswith("I ")
True

You can use the string endswith method to check whether one string ends with another string:

>>> whole = "I enjoy writing Python code."
>>> whole.endswith("code?")
False
>>> whole.endswith("code.")
True

You can think of startswith as a prefix check and endswith as a suffix check.
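Both methods also accept a tuple of strings, which is handy when any one of several prefixes or suffixes should count as a match. Here's a small sketch with made-up filenames:

```python
filenames = ["report.csv", "notes.txt", "data.tsv", "image.png"]

# Passing a tuple checks for any one of the given suffixes
tabular = [name for name in filenames if name.endswith((".csv", ".tsv"))]
print(tabular)  # ['report.csv', 'data.tsv']

whole = "I enjoy writing Python code."
starts_with_either = whole.startswith(("You ", "I "))
print(starts_with_either)  # True
```

Note that the argument needs to be a tuple specifically. Passing a list raises a TypeError.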

Checking for a "word"

Note that up to this point we've been checking for substrings... but what if you actually need to check for a word within a string?

What's the difference? Well, say we look for the word cat in this string:

>>> whole = "We could concatenate our strings."
>>> "cat" in whole
True

While our string does contain the substring cat, it doesn't contain the word cat. To look for a word, we could split our string up and remove known punctuation characters:

>>> words = [word.strip(".,?!") for word in whole.split()]
>>> "cat" in words
False

This approach could be a bit finicky though, especially if we don't know exactly what punctuation marks might appear in our string. Using a regular expression would be a better approach.

Checking for a pattern

What if you need something slightly more complex? What if you want to check whether a string contains a pattern that can't be described by a simple substring check?

For example, what if we wanted to check whether a string contained PyCon, some whitespace, and then a 4-digit number (e.g. PyCon 2025 or PyCon 3030)?

>>> message = "My first Python conference was PyCon 2014."

At this point, you'll probably want to reach for a regular expression. Here we're using a regular expression to look for PyCon, one or more whitespace characters, 4 digit characters, and then a "word boundary":

>>> import re
>>> contains_pycon_year = bool(re.search(r"PyCon\s+\d{4}\b", message))
>>> contains_pycon_year
True

A regular expression could also help us look for a "word" in our string (as discussed in the last section):

>>> contains_cat = bool(re.search(r"\bcat\b", whole))
>>> contains_cat
False
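If we wanted to reuse that word check, one possible sketch is a small helper. The contains_word name and the second sample sentence are made up here. It uses re.escape so that any characters in the word that are special in regular expressions get treated literally:

```python
import re


def contains_word(word, text):
    """Return True if word appears in text as a whole word (hypothetical helper)."""
    # re.escape makes regex-special characters in word match literally
    pattern = rf"\b{re.escape(word)}\b"
    return bool(re.search(pattern, text))


in_concatenate = contains_word("cat", "We could concatenate our strings.")
print(in_concatenate)  # False

as_own_word = contains_word("cat", "My cat, Luna, naps all day.")
print(as_own_word)  # True
```

One caveat: \b matches between a word character and a non-word character, so this sketch assumes the word itself begins and ends with a letter, digit, or underscore.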

Regular expressions are powerful, but they're often confusing for new programmers as well as many experienced Python programmers. I recommend avoiding regular expressions whenever a simple containment check would suffice.

Use Python's `in` operator for substring checks

Need to check whether one string contains another in Python? If all you need is a simple substring check, use the in operator.

>>> part = "Python"
>>> whole = "I enjoy writing Python code."
>>> part in whole
True

If you need a case-insensitive containment check, use the string casefold method:

>>> part = "python"
>>> whole = "I enjoy writing Python code."
>>> part.casefold() in whole.casefold()
True

If you do need something more complex, you may need to reach for a regular expression. But please stick to the in operator if you can. Reaching for a regular expression prematurely can make your code harder to read with little benefit.
