Python String Methods to Know

Trey Hunner
14 min. read Python 3.7—3.11

Python's strings have 47 methods. That's almost as many string methods as there are built-in functions in Python! Which string methods should you learn first?

There are about a dozen string methods that are extremely useful and worth committing to memory. Let's take a look at the most useful string methods and then briefly discuss the remaining methods and why they're less useful.

The most useful string methods

Here are the dozen-ish Python string methods I recommend committing to memory.

Method       | Related Methods | Description
join         |                 | Join iterable of strings by a separator
split        | rsplit          | Split (on whitespace by default) into list of strings
replace      |                 | Replace all copies of one substring with another
strip        | rstrip & lstrip | Remove whitespace from the beginning and end
casefold     | lower & upper   | Return a case-normalized version of the string
startswith   |                 | Check if string starts with 1 or more other strings
endswith     |                 | Check if string ends with 1 or more other strings
splitlines   |                 | Split into a list of lines
format       |                 | Format the string (consider an f-string before this)
count        |                 | Count how many times a given substring occurs
removeprefix |                 | Remove the given prefix
removesuffix |                 | Remove the given suffix

You might be wondering "wait why is my favorite method not in that list?" I'll briefly explain the rest of the methods and my thoughts on them below. But first, let's look at each of the above methods.

join

If you need to convert a list to a string in Python, the string join method is what you're looking for.

>>> colors = ["purple", "blue", "green", "orange"]
>>> joined_colors = ", ".join(colors)
>>> joined_colors
'purple, blue, green, orange'

The join method can concatenate a list of strings into a single string, but it will accept any other iterable of strings as well.

>>> digits = range(10)
>>> digit_string = "".join(str(n) for n in digits)
>>> digit_string
'0123456789'
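
One thing worth noting about the example above: join only accepts strings, which is why each number was passed through str first. Here's a small sketch of what happens otherwise (the scores list is made up for illustration):

```python
scores = [98, 87, 92]  # hypothetical data, not from the examples above

# join raises a TypeError when any item isn't a string
try:
    ", ".join(scores)
except TypeError as error:
    message = str(error)
    print(f"TypeError: {message}")

# Converting each item with str first works
joined_scores = ", ".join(map(str, scores))
print(joined_scores)  # 98, 87, 92
```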

split

If you need to break a string into smaller strings based on a separator, you need the string split method.

>>> time = "1:19:48"
>>> parts = time.split(":")
>>> parts
['1', '19', '48']

Your separator can be any substring. We're splitting by a : above, but we could also split by ->:

>>> graph = "A->B->C->D"
>>> graph.split("->")
['A', 'B', 'C', 'D']

You usually wouldn't want to call split with a space character:

>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split(" ")
['Does', 'it', 'dry', 'up\nlike', 'a', 'raisin', 'in', 'the', 'sun?\n']

Splitting on the space character works, but often when splitting on spaces it's actually more useful to split on all whitespace.

Calling the split method with no arguments will split on any consecutive whitespace characters:

>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split()
['Does', 'it', 'dry', 'up', 'like', 'a', 'raisin', 'in', 'the', 'sun?']

Note that split without any arguments also removes leading and trailing whitespace.

There's one more split feature that folks sometimes overlook: the maxsplit argument. When calling split with a maxsplit value, Python will split the string at most that number of times. This is handy when you only care about the first one or two occurrences of a separator in a string:

>>> line = "Rubber duck|5|10"
>>> item_name, the_rest = line.split("|", maxsplit=1)
>>> item_name
'Rubber duck'

If it's the last couple occurrences of a separator that you care about, you'll want to use the string rsplit method instead:

>>> the_rest, amount = line.rsplit("|", maxsplit=1)
>>> amount
'10'

With the exception of calling the split method without any arguments, there's no way to ignore repeated separators or trailing/leading separators, or to support multiple separators at once. If you need any of those features, you'll want to look into regular expressions (specifically the re.split function).
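
As a rough sketch of what that might look like, here's re.split handling multiple separators and repeated separators. The sample strings are made up for illustration:

```python
import re

# Multiple separators at once: split on commas or semicolons
# (and swallow any whitespace that follows them)
fields = re.split(r"[,;]\s*", "apples, pears;plums,  figs")

# Repeated separators: treat a run of dashes as a single separator
parts = re.split(r"-+", "a--b---c")

print(fields)  # ['apples', 'pears', 'plums', 'figs']
print(parts)   # ['a', 'b', 'c']
```

Note that re.split still returns empty strings for leading or trailing separators, so you may want to strip those characters off first.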

replace

Need to replace one substring (a string within a string) with another? That's what the string replace method is for!

>>> message = "JavaScript is lovely"
>>> message.replace("JavaScript", "Python")
'Python is lovely'

The replace method can also be used for removing substrings, by replacing them with an empty string:

>>> message = "Python is lovely!!!!"
>>> message.replace("!", "")
'Python is lovely'

There's also an optional count argument, in case you only want to replace the first N occurrences:

>>> message = "Python is lovely!!!!"
>>> message.replace("!", "?", 2)
'Python is lovely??!!'

strip

The strip method is for removing whitespace from the beginning and end of a string:

>>> text = """
... Hello!
... This is a multi-line string.
... """
>>> text
'\nHello!\nThis is a multi-line string.\n'
>>> stripped_text = text.strip()
>>> stripped_text
'Hello!\nThis is a multi-line string.'

If you just need to remove whitespace from the end of the string (but not the beginning), you can use the rstrip method:

>>> line = "    Indented line with trailing spaces  \n"
>>> line.rstrip()
'    Indented line with trailing spaces'

And if you need to strip whitespace from just the beginning, you can use the lstrip method:

>>> line = "    Indented line with trailing spaces  \n"
>>> line.lstrip()
'Indented line with trailing spaces  \n'

Note that by default strip, lstrip, and rstrip remove all whitespace characters (space, tab, newline, etc.). You can also pass in a specific character to remove instead. Here we're removing any trailing newline characters but leaving other whitespace intact:

>>> line = "Line 1\n"
>>> line
'Line 1\n'
>>> line.rstrip("\n")
'Line 1'

Note that strip, lstrip, and rstrip will also accept a string of multiple characters to strip.

>>> words = ['I', 'enjoy', 'Python!', 'Do', 'you?', 'I', 'hope', 'so.']
>>> [w.strip(".!?") for w in words]
['I', 'enjoy', 'Python', 'Do', 'you', 'I', 'hope', 'so']

Passing multiple characters will strip all of those characters, but they'll be treated as individual characters (not as a substring).

If you need to strip a multi-character substring instead of individual characters, see removesuffix and removeprefix below.
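
Here's a small sketch of how that difference can bite. The filename is made up, and removesuffix assumes Python 3.9 or later:

```python
filename = "test.txt"  # hypothetical filename

# rstrip treats ".txt" as the set of characters ".", "t", and "x",
# so it keeps stripping until it hits a character outside that set
stripped = filename.rstrip(".txt")

# removesuffix (Python 3.9+) removes the exact substring, at most once
without_suffix = filename.removesuffix(".txt")

print(stripped)        # tes
print(without_suffix)  # test
```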

casefold

Need to uppercase a string? There's an upper method for that:

>>> name = "Trey"
>>> name.upper()
'TREY'

Need to lowercase a string? There's a lower method for that:

>>> name = "Trey"
>>> name.lower()
'trey'

What if you're trying to do a case-insensitive comparison between strings? You could lowercase or uppercase all of your strings for the comparison. Or you could use the string casefold method:

>>> name = "Trey"
>>> "t" in name
False
>>> "t" in name.casefold()
True

But wait, isn't casefold just the same thing as lower?

>>> name = "Trey"
>>> name.casefold()
'trey'

Almost. If you're working with ASCII characters, casefold does exactly the same thing as the string lower method.

But if you have non-ASCII characters (see Unicode character encodings in Python), there are some characters that casefold handles uniquely.

There are a few hundred characters that normalize differently between the lower and casefold methods. If you're working with text using the International Phonetic Alphabet or text written in Greek, Cyrillic, Armenian, Cherokee, and a large handful of other languages, you should probably use casefold instead of lower.

Do keep in mind that casefold doesn't solve all text normalization issues though. It's possible to represent the same text with different sequences of code points, so you'll need to look into Unicode data normalization and Python's unicodedata module if you think you'll be comparing non-ASCII text often.
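
To make that a bit more concrete, here's a minimal sketch. The normalize_caseless helper is a hypothetical name of my own, and it's a simplification (NFC followed by casefold) rather than a complete caseless-matching algorithm:

```python
import unicodedata

street = "Straße"
lowered = street.lower()     # 'straße' (the ß is left alone)
folded = street.casefold()   # 'strasse' (the ß becomes ss)

composed = "caf\u00e9"       # é as a single code point
decomposed = "cafe\u0301"    # e followed by a combining acute accent
same_without_normalizing = composed == decomposed  # False


def normalize_caseless(text):
    """Normalize to NFC, then casefold (illustrative helper, simplified)."""
    return unicodedata.normalize("NFC", text).casefold()


print(lowered, folded)
print(same_without_normalizing)
print(normalize_caseless(composed) == normalize_caseless(decomposed))  # True
```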

startswith

The string startswith method can check whether one string is a prefix of another string:

>>> property_id = "UA-1234567"
>>> property_id.startswith("UA-")
True

The alternative to startswith is to slice the bigger string and do an equality check:

>>> property_id = "UA-1234567"
>>> prefix = "UA-"
>>> property_id[:len(prefix)] == prefix
True

That works, but it's awkward.

You can also quickly check whether one string starts with any of several substrings by passing a tuple of substrings to startswith.

Here we're checking whether each string in a list starts with a vowel to determine whether the article "an" or "a" should be used:

>>> names = ["Go", "Elixir", "OCaml", "Rust"]
>>> for name in names:
...     if name.startswith(("A", "E", "I", "O", "U")):
...         print(f"An {name} program")
...     else:
...         print(f"A {name} program")
...
A Go program
An Elixir program
An OCaml program
A Rust program

Note that startswith returns True if the string starts with any of the given substrings.

Many long-time Python programmers overlook the fact that startswith will accept either a single string or a tuple of strings.

endswith

The endswith method can check whether one string is a suffix of another string.

The string endswith method works pretty much like the startswith method.

It works with a single string:

>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith(".min.js")
True

But it also accepts a tuple of strings:

>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith((".min.js", ".min.css"))
True

Just as with startswith, when endswith is given a tuple, it returns True if our string ends with any of the strings in that tuple.
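
One gotcha: it really does need to be a tuple. Passing a list raises a TypeError, so convert first if your suffixes live in a list. A quick sketch with made-up filenames:

```python
filenames = ["app.min.js", "app.js", "site.min.css", "README.md"]  # made-up names
minified_suffixes = [".min.js", ".min.css"]

# endswith rejects a list, so convert it to a tuple first
minified = [
    name
    for name in filenames
    if name.endswith(tuple(minified_suffixes))
]

print(minified)  # ['app.min.js', 'site.min.css']
```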

splitlines

The splitlines method is specifically for splitting up strings into lines.

>>> text = "I'm Nobody! Who are you?\nAre you – Nobody – too?"
>>> text.splitlines()
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']

Why make a separate method just for splitting into lines? Couldn't we just use the split method with \n instead?

>>> text.split("\n")
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']

While that does work in some cases, sometimes newlines are represented by \r\n or simply \r instead of \n. If you don't know exactly what line endings your text uses, splitlines can be handy.

>>> text = "Maybe it just sags\r\nlike a heavy load.\r\nOr does it explode?"
>>> text.split("\n")
['Maybe it just sags\r', 'like a heavy load.\r', 'Or does it explode?']
>>> text.splitlines()
['Maybe it just sags', 'like a heavy load.', 'Or does it explode?']

But there's an even more useful reason to use splitlines: it's quite common for text to end in a trailing newline character.

>>> zen = "Flat is better than nested.\nSparse is better than dense.\n"

The splitlines method will remove a trailing newline if it finds one. The split method, on the other hand, will split on that trailing newline, which gives us an empty string at the end (likely not what we actually want when splitting into lines).

>>> zen.split("\n")
['Flat is better than nested.', 'Sparse is better than dense.', '']
>>> zen.splitlines()
['Flat is better than nested.', 'Sparse is better than dense.']

Unlike split, the splitlines method can also split lines while maintaining the existing line endings by specifying keepends=True:

>>> zen.splitlines(keepends=True)
['Flat is better than nested.\n', 'Sparse is better than dense.\n']

When splitting strings into lines in Python, I recommend reaching for splitlines instead of split.
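
Here's one more quick sketch of the differences, using a made-up string with mixed line endings and an empty string:

```python
mixed = "one\ntwo\r\nthree\rfour\n"  # hypothetical text with mixed line endings

# splitlines handles \n, \r\n, and \r, and drops the trailing newline
lines = mixed.splitlines()
print(lines)  # ['one', 'two', 'three', 'four']

# An empty string has no lines, but split still returns one empty string
empty_lines = "".splitlines()
empty_split = "".split("\n")
print(empty_lines)  # []
print(empty_split)  # ['']
```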

format

Python's format method is used for string formatting (a.k.a. string interpolation).

>>> version_message = "Version {version} or higher required."
>>> print(version_message.format(version="3.10"))
Version 3.10 or higher required.

Python's f-strings were an evolution of the format method.

>>> name = "Trey"
>>> print(f"Hello {name}! Welcome to Python.")
Hello Trey! Welcome to Python.

You might think that the format method doesn't have much use now that f-strings have long been part of Python. But the format method is handy for cases where you'd like to define your template string in one part of your code and use that template string in another part.

For example we might define a string-to-be-formatted at the top of a module and then use that string later on in our module:

BASE_URL = "https://api.stackexchange.com/2.3/questions/{ids}?site={site}"

# More code here

question_ids = ["33809864", "2759323", "9321955"]
url_for_questions = BASE_URL.format(
    site="stackoverflow",
    ids=";".join(question_ids),
)

We've predefined our BASE_URL template string and then later used it to construct a valid URL with the format method.
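
For reference, here's that same example as a runnable sketch showing the resulting URL, plus what happens when a placeholder is left out (format raises a KeyError):

```python
BASE_URL = "https://api.stackexchange.com/2.3/questions/{ids}?site={site}"

question_ids = ["33809864", "2759323", "9321955"]
url_for_questions = BASE_URL.format(
    site="stackoverflow",
    ids=";".join(question_ids),
)
print(url_for_questions)

# Forgetting one of the template's placeholders raises a KeyError
try:
    BASE_URL.format(site="stackoverflow")
except KeyError as error:
    missing_key = error.args[0]
    print(f"Missing placeholder: {missing_key}")
```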

count

The string count method accepts a substring and returns the number of times that substring occurs within our string:

>>> time = "3:32"
>>> time.count(":")
1
>>> time = "2:17:48"
>>> time.count(":")
2

That's it. The count method is pretty simple.

Note that if you don't care about the actual number but instead care whether the count is greater than 0:

has_underscores = text.count("_") > 0

You don't need the count method.

Why? Because Python's in operator is a better way to check whether a string contains a substring:

has_underscores = "_" in text

This has the added benefit that the in operator will stop as soon as it finds a match, whereas count always needs to iterate through the entire string.
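
One more thing to be aware of: count only counts non-overlapping occurrences. If you need overlapping matches, a regular expression with a lookahead is one possible approach (a sketch, not the only way to do it):

```python
import re

text = "aaaa"

# count finds "aa" at index 0 and index 2, skipping the overlap at index 1
non_overlapping = text.count("aa")

# A zero-width lookahead matches at every position where "aa" begins
overlapping = len(re.findall(r"(?=aa)", text))

print(non_overlapping)  # 2
print(overlapping)      # 3
```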

removeprefix

The removeprefix method will remove an optional prefix from the beginning of a string.

>>> hex_string = "0xfe34"
>>> hex_string.removeprefix("0x")
'fe34'
>>> hex_string = "ac6b"
>>> hex_string.removeprefix("0x")
'ac6b'

The removeprefix method was added in Python 3.9. Before removeprefix, it was common to check whether a string startswith a prefix and then remove it via slicing:

if hex_string.startswith("0x"):
    hex_string = hex_string[len("0x"):]

Now you can just use removeprefix instead:

hex_string = hex_string.removeprefix("0x")

The removeprefix method is a bit similar to the lstrip method, except that lstrip removes single characters from the beginning of a string and it removes as many as it finds.

So while this will remove all leading v characters from the beginning of a string:

>>> a = "v3.11.0"
>>> a.lstrip("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.lstrip("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.lstrip("v")
'3.11.0'

This would remove at most one v from the beginning of the string:

>>> a = "v3.11.0"
>>> a.removeprefix("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.removeprefix("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.removeprefix("v")
'vv3.11.0'

removesuffix

The removesuffix method will remove an optional suffix from the end of a string.

>>> time_readings = ["0", "5 sec", "7 sec", "1", "8 sec"]
>>> new_readings = [t.removesuffix(" sec") for t in time_readings]
>>> new_readings
['0', '5', '7', '1', '8']

It does pretty much the same thing as removeprefix, except it removes from the end instead of removing from the beginning.
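
If you're stuck on a Python version older than 3.9, you could write small stand-ins along these lines. The remove_prefix and remove_suffix names are my own hypothetical helpers, not anything built in:

```python
def remove_prefix(text, prefix):
    """Return text without the given prefix, if present (hypothetical helper)."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text, suffix):
    """Return text without the given suffix, if present (hypothetical helper)."""
    # The "suffix and" check matters: text[:-0] would return an empty string
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(remove_prefix("0xfe34", "0x"))  # fe34
print(remove_suffix("5 sec", " sec"))  # 5
print(remove_suffix("5 sec", ""))      # 5 sec
```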

Learn these methods later

I wouldn't memorize these string methods today, but you might consider eventually looking into them.

Method       | Related Methods | Description
encode       |                 | Encode string to bytes object
find         | rfind           | Return index of substring or -1 if not found
index        | rindex          | Return index of substring or raise ValueError
title        | capitalize      | Title-case the string
partition    | rpartition      | Partition into 3 parts based on a separator
ljust        | rjust & center  | Left/right/center-justify the string
zfill        |                 | Pad numeric string with zeroes (up to a width)
isidentifier |                 | Check if string is a valid Python identifier

Here's why I don't recommend committing each of these to memory:

  • encode: you can usually avoid manually encoding strings but you'll discover this method by necessity when you can't (see converting between binary data and strings in Python)
  • find and rfind: we rarely care about finding substring indexes: usually it's containment we want (for example we use 'y' in name instead of name.find('y') != -1)
  • index and rindex: these raise an exception if the given substring isn't found, so it's rare to see these methods used
  • title and capitalize: the title method doesn't always work as you'd expect (see Title-casing a string in Python) and capitalize only capitalizes the first character (and lowercases the rest)
  • partition and rpartition: these can be very handy when splitting while checking whether you split, but I find myself using split and rsplit more often
  • ljust, rjust, and center: these methods left/right/center-justify text and I usually prefer the <, >, and ^ string formatting modifiers instead (see formatting strings)
  • zfill: this method zero-pads strings to make them a specific width and I usually prefer using string formatting for zero-filling as well (see zero-padding while string formatting)
  • isidentifier: this is niche but useful for checking that a string is a valid Python identifier, though this usually needs pairing with keyword.iskeyword to exclude Python keywords
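
Here's a quick sketch of a few of the alternatives mentioned in that list. The sample strings and the is_usable_name helper are made up for illustration:

```python
import keyword

# partition always returns 3 parts; the separator is "" when it isn't found
key, separator, value = "name=Trey".partition("=")
before, missing_separator, after = "no equals sign".partition("=")

# String formatting modifiers instead of ljust, rjust, center, and zfill
left = f"{'left':<10}"
right = f"{'right':>10}"
centered = f"{'center':^10}"
zero_padded = f"{42:05}"


def is_usable_name(name):
    """Check for a valid identifier that isn't a keyword (hypothetical helper)."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(key, value)                  # name Trey
print(repr(missing_separator))     # ''
print(zero_padded)                 # 00042
print(is_usable_name("class"))     # False
```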

Alternatives to regular expressions

These methods are used for asking questions about your strings. Most of these ask a question about every character in the string, with the exception of the istitle method.

Method      | Related Methods     | Description
isdecimal   | isdigit & isnumeric | Check if string represents a number
isascii     |                     | Check whether all characters are ASCII
isprintable |                     | Check whether all characters are printable
isspace     |                     | Check whether string is entirely whitespace
isalpha     | islower & isupper   | Check if string contains only letters
isalnum     |                     | Check if string contains letters or digits
istitle     |                     | Check if string is title-cased

These methods might be useful in very specific circumstances. But when you're asking these sorts of questions, using a regular expression might be more appropriate.

Also keep in mind that these methods might not always act how you might expect. All of isdigit, isdecimal, and isnumeric match more than just 0 to 9, and none of them match the - or . characters. The isdigit method matches everything isdecimal matches plus more, and the isnumeric method matches everything that isdigit matches plus more. So while only isnumeric matches ⅷ, isdigit and isnumeric match ⁸, and all of them match ۸.
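
Here's a rough sketch of those differences, along with one possible regular expression alternative. The is_ascii_integer helper is a hypothetical name of my own:

```python
import re

samples = {
    "ASCII digit": "8",
    "Extended Arabic-Indic digit": "\u06f8",
    "superscript eight": "\u2078",
    "small Roman numeral eight": "\u2177",
    "negative number": "-8",
    "decimal point": "8.5",
}

for description, text in samples.items():
    print(description, text.isdecimal(), text.isdigit(), text.isnumeric())


def is_ascii_integer(text):
    """Check for one or more ASCII digits only (hypothetical helper)."""
    # [0-9] is used instead of \d because \d also matches non-ASCII digits
    return re.fullmatch(r"[0-9]+", text) is not None


print(is_ascii_integer("2024"))    # True
print(is_ascii_integer("\u06f8"))  # False
```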

You likely don't need these methods

These 5 methods are pretty rare to see:

  • expandtabs: convert tab characters into spaces (the number of spaces needed to hit the next 8 character tab stop)
  • swapcase: convert uppercase to lowercase and lowercase to uppercase
  • format_map: calling my_string.format_map(mapping) is the same as my_string.format(**mapping)
  • maketrans: create a dictionary mapping character code points between keys and values (to be passed to str.translate)
  • translate: replace each code point in a string according to a translation table (usually one made by str.maketrans)
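
In case you're curious what these look like in practice, here's a brief sketch with made-up inputs:

```python
# expandtabs: pad each tab out to the next tab stop (every 4 columns here)
expanded = "a\tb".expandtabs(4)

# swapcase: flip the case of every letter
swapped = "Trey".swapcase()

# format_map: like format(**mapping), but takes the mapping directly
greeting = "Hello {name}!".format_map({"name": "Trey"})

# maketrans + translate: map a->x, b->y, c->z and delete "!" characters
table = str.maketrans("abc", "xyz", "!")
translated = "a!b!c".translate(table)

# maketrans also accepts a dictionary, and values may be longer strings
escapes = str.maketrans({"<": "&lt;", ">": "&gt;"})
escaped = "<b>".translate(escapes)

print(repr(expanded))  # 'a   b'
print(swapped)         # tREY
print(greeting)        # Hello Trey!
print(translated)      # xyz
print(escaped)         # &lt;b&gt;
```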

Learn what you need

Python's strings have a ton of methods. It's really not worth memorizing them all: save your time for something more fruitful.

While memorizing everything is a waste of time, it is worth committing the more useful string methods to memory. If a method would be useful pretty much every week, commit it to memory.

I recommend memorizing Python's most useful string methods, roughly in this order:

  1. join: Join iterable of strings by a separator
  2. split: Split (on whitespace by default) into list of strings
  3. replace: Replace all copies of one substring with another
  4. strip: Remove whitespace from the beginning and end
  5. casefold (or lower if you prefer): Return a case-normalized version of the string
  6. startswith & endswith: Check if string starts/ends with 1 or more other strings
  7. splitlines: Split into a list of lines
  8. format: Format the string (consider an f-string before this)
  9. count: Count how many times a given substring occurs
  10. removeprefix & removesuffix: Remove the given prefix/suffix

Want help learning Python's most useful string methods?

Want to commit all these string methods to long-term memory? I'm working on a system that could help you do that in 5 minutes per day over about 10 days.

This system could also help you commit many other important Python concepts to memory as well.
