Appreciating Python's match-case by parsing Python code

Trey Hunner
9 min. read Python 3.10—3.11

I stayed up past my bedtime recently and made a script and later a web app to convert a dataclass to a non-dataclass. The web app is powered by a WebAssembly build of Python (which also powers my Python pastebin tool).

While making this script I found excuses to use odd Python features, the most interesting being Python's match-case statement.

Python 3.10 added a match-case block that folks often assume to be equivalent to the switch-case blocks from other programming languages. While you can use match-case like switch-case, you usually wouldn't: match-case is both more powerful and more complex than switch-case. Python's match-case blocks are for structural pattern matching -- that phrase sounds complex because it is!

I'll write a follow-up post soon on how this script works at a high level, but right now I'd like to talk about my adventures using structural pattern matching while writing this code.

Update: the follow-up post is now available: How I made a dataclass remover.

Why remove dataclasses?

First let's briefly talk about why I made this tool.

Why would anyone want to convert a dataclass into "not a dataclass"?

There are trade-offs with using dataclasses: performance concerns (which don't usually matter) and edge cases where things get weird (__slots__ and slots=True are both finicky). But my reason for creating this dataclass to regular class converter was to help me better teach dataclasses. Seeing the equivalent code for a dataclass helps us appreciate what dataclasses do for us.
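To show what "the equivalent code" means, here is a hand-written sketch of roughly what `@dataclass` generates by default for a two-field class: `__init__`, `__repr__` and `__eq__`. This is an approximation written for illustration, not the undataclass tool's actual output. The class names are made up.

```python
from dataclasses import dataclass


@dataclass
class DataclassPoint:
    x: float
    y: float


class PlainPoint:
    """Hand-written approximation of a default @dataclass."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"{type(self).__qualname__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Like the generated __eq__: only compare instances of the exact
        # same class, comparing fields in order as a tuple.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # A default dataclass (eq=True, frozen=False) is unhashable.
    __hash__ = None


print(DataclassPoint(1, 2))  # DataclassPoint(x=1, y=2)
print(PlainPoint(1, 2))      # PlainPoint(x=1, y=2)
```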

Okay let's dive into match-case.

Oh, that's what that tool is for?

I knew the adventure I was embarking on involved parsing Python code. I don't usually parse Python code: I leave that up to tools like Black, flake8, and the Python interpreter itself.

But I did know that Python's ast module had a parse function which could accept a string representing Python code and return an "abstract syntax tree" (often shortened to AST) that represented that Python code.

Using ast.parse to get a tree of AST nodes was easy. The hard part came in making sense of those deeply-nested AST nodes.
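Here's a quick look at that nesting. It's a small sketch, and the `Point` class is just sample input. Even a single decorator turns into several layers of nodes:

```python
import ast

source = '''
@dataclasses.dataclass(frozen=True)
class Point:
    x: float
    y: float
'''

tree = ast.parse(source)
class_node = tree.body[0]                 # an ast.ClassDef node
decorator = class_node.decorator_list[0]  # an ast.Call node

# Reaching the "dataclasses" name takes three attribute hops:
# Call -> .func (an ast.Attribute) -> .value (an ast.Name) -> .id
print(type(decorator).__name__)           # Call
print(type(decorator.func).__name__)      # Attribute
print(decorator.func.value.id)            # dataclasses
print(decorator.func.attr)                # dataclass

# ast.dump shows the whole nested structure (indent= needs Python 3.9+).
print(ast.dump(decorator, indent=2))
```

Dumping a node like this is a handy way to see what shape a pattern needs to have.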

I found myself writing a lot of if-elif blocks with very complex conditions. Take this code for example:

if isinstance(node, ast.Call):
    if (isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "dataclasses"
            and node.func.attr == "dataclass"):
        return True
    elif isinstance(node.func, ast.Name) and node.func.id == "dataclass":
        return True
    else:
        return False
elif (isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "dataclasses"
        and node.attr == "dataclass"):
    return True
elif isinstance(node, ast.Name) and node.id == "dataclass":
    return True
else:
    return False

That code checks for 4 different uses of the dataclass decorator:

  1. dataclasses.dataclass(...)
  2. dataclass(...)
  3. dataclasses.dataclass
  4. dataclass

After writing the above code I remembered playing with match-case shortly after Python 3.10 was released. Seeing those isinstance checks in particular made me think "wait a minute, match-case was made for this!"

After introspecting (via breakpoint and Python's debugging friends), I found that I could refactor the above if-elif into this equivalent match-case block:

match node:
    case ast.Call(
        func=ast.Attribute(
            value=ast.Name(id="dataclasses"),
            attr="dataclass",
        ),
    ):
        return True
    case ast.Call(func=ast.Name(id="dataclass")):
        return True
    case ast.Attribute(
        value=ast.Name(id="dataclasses"),
        attr="dataclass"
    ):
        return True
    case ast.Name(id="dataclass"):
        return True
    case _:
        return False

With each of the case statements I wrote above, assertions were made about:

  1. The type of object being matched
  2. The types of specific attribute values
  3. The types and values of subattributes: we matched attributes deeply, such as node.func.value.id

That first case statement nicely demonstrates the power of match-case for matching deeply-nested data structures. We're using a single pattern to confirm that node is a Call node (a function call expression) and that the expression it's calling is an attribute lookup of dataclasses.dataclass:

    case ast.Call(
        func=ast.Attribute(
            value=ast.Name(id="dataclasses"),
            attr="dataclass",
        ),
    )

Compare that to these nested if statements, which do the same thing:

if isinstance(node, ast.Call):
    if (isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "dataclasses"
            and node.func.attr == "dataclass"):

Both of those blocks of code say "I have a Call object which contains an Attribute object which has a specific attr and also contains a Name with a certain id". But the match-case statement does that so much more succinctly and I found it much more readable than the equivalent if-elif.

Using "or patterns" to match multiple sub-patterns

During this match-case refactoring I realized I needed an easy way to say "this attribute could be either A or B".

I dug through the structural pattern matching tutorial PEP (PEP 636) and (fortunately) found just what I needed: the | operator. Using | inside a pattern creates an "or pattern", which lets a single case statement match if any one of several sub-patterns matches.

Instead of this giant if statement (note that giant elif clause):

if subnode.value is None:
    field = dataclasses.field()
elif (isinstance(subnode.value, ast.Call) and (
        isinstance(subnode.value.func, ast.Name)
        and subnode.value.func.id == "field"
        or
        isinstance(subnode.value.func, ast.Attribute)
        and isinstance(subnode.value.func.value, ast.Name)
        and subnode.value.func.value.id == "dataclasses"
        and subnode.value.func.attr == "field")):
    field = dataclasses.field(**{
        kwarg.arg: parse_field_argument(kwarg.arg, kwarg.value)
        for kwarg in subnode.value.keywords
    })
else:
    field = dataclasses.field(default=ast.unparse(subnode.value))

I wrote this match statement:

match subnode:
    case ast.AnnAssign(value=None):
        field = dataclasses.field()
    case ast.AnnAssign(
        value=ast.Call(
            func=
                ast.Name(id="field")
                |
                ast.Attribute(value=ast.Name(id="dataclasses"), attr="field")
        )
    ):
        field = dataclasses.field(**{
            kwarg.arg: parse_field_argument(kwarg.arg, kwarg.value)
            for kwarg in subnode.value.keywords
        })
    case ast.AnnAssign():
        field = dataclasses.field(default=ast.unparse(subnode.value))

That match statement is very complex, but it's much less visually dense than that if statement was. That second case statement ensures that the annotated assignment node we're matching has a value attribute which is either field(...) or dataclasses.field(...).

    case ast.AnnAssign(
        value=ast.Call(
            func=
                ast.Name(id="field")
                |
                ast.Attribute(value=ast.Name(id="dataclasses"), attr="field")
        )
    ):

Writing this 8-line case statement with that "or pattern" felt very silly. But I found that I prefer it over the alternative elif logic:

elif (isinstance(subnode.value, ast.Call) and (
        isinstance(subnode.value.func, ast.Name)
        and subnode.value.func.id == "field"
        or
        isinstance(subnode.value.func, ast.Attribute)
        and isinstance(subnode.value.func.value, ast.Name)
        and subnode.value.func.value.id == "dataclasses"
        and subnode.value.func.attr == "field")):

The Zen of Python says "simple is better than complex", but it also says "complex is better than complicated." Both the elif and case statements above are complex because making sense of abstract syntax trees is an inherently complex activity. But that case statement seems a bit less complicated than the elif equivalent.

Conditional patterns with guard clauses

The last match-case feature I discovered caught me by surprise.

In this if statement, the third condition can't be boiled down to a simple structural pattern in match-case land:

if isinstance(node, ast.ImportFrom) and node.module == "dataclasses":
    continue  # Don't import dataclasses anymore
elif isinstance(node, ast.Import) and node.names[0].name == "dataclasses":
    continue  # Don't import dataclasses anymore
elif isinstance(node, ast.ClassDef) and any(
    is_dataclass_decorator(n)
    for n in node.decorator_list
):
    need_total_ordering |= update_dataclass_node(node)
    new_nodes.append(node)
else:
    new_nodes.append(node)

At first I thought I needed to give up on using match-case for that condition and resort to a nested if-else statement. But then I stumbled upon guard clauses.

Guard clauses are handy when you need a case clause that has some actual boolean logic in it. Using a guard clause, the above if-elif can be rewritten like this (note that third case statement with that if condition on the end):

match node:
    case ast.ImportFrom(module="dataclasses"):
        continue  # Don't import dataclasses anymore
    case ast.Import(names=[ast.alias("dataclasses")]):
        continue  # Don't import dataclasses anymore
    case ast.ClassDef() if any(
        is_dataclass_decorator(n)
        for n in node.decorator_list
    ):
        need_total_ordering |= update_dataclass_node(node)
        new_nodes.append(node)
    case _:
        new_nodes.append(node)

In that third case statement we're checking the type of the node (something match-case statements are great at) and we're also asking a complex question about the decorator_list attribute of that node (thanks to that if guard clause with that any(...) logic).

While I did find a guard clause helpful here, this feature does feel like an escape hatch that should only be used when there's not a more readable alternative.

Structural pattern matching visually describes the structure of objects

This undataclassing adventure was not my first time using match-case. But before I wrote undataclass.py, most of my match-case statements involved matching iterables.

For example, while prepping a talk on match-case for my local meetup, I noticed that this Django template tag parsing function:

def do_get_available_languages(parser, token):
    args = token.contents.split()
    if len(args) != 3 or args[1] != "as":
        raise TemplateSyntaxError(
            "'get_available_languages' requires 'as variable' (got %r)" % args
        )
    return GetAvailableLanguagesNode(args[2])

Could be rewritten like this:

def do_get_available_languages(parser, token):
    match token.contents.split():
        case [_, "as", variable_name]:
            return GetAvailableLanguagesNode(variable_name)
        case args:
            raise TemplateSyntaxError(
                "'get_available_languages' requires 'as variable' (got %r)" % args
            )

Even if you don't understand how structural pattern matching works, you can probably guess what that second block of code does at a glance. Just like with tuple unpacking, that match-case statement visually demonstrates the shape of the data we're matching.

Python's match-case statement can even be used to match nested dictionary items. For example this nested dictionary-processing code:

if webhook_data["event_type"] == "order_created":
    customer_id = webhook_data["content"]["customer"]["id"]
    order = webhook_data["content"]["order"]
    process_order(customer_id, order)
elif webhook_data["event_type"] == "payment":
    customer_id = webhook_data["content"]["customer"]["id"]
    payment = webhook_data["content"]["payment"]
    process_payment(customer_id, payment)
else:
    process_other(webhook_data)

Could be refactored to use structural pattern matching like this:

match webhook_data:
    case {
        "event_type": "order_created",
        "content": {
            "order": order,
            "customer": {"id": customer_id},
        },
    }:
        process_order(customer_id, order)
    case {
        "event_type": "payment",
        "content": {
            "payment": payment,
            "customer": {"id": customer_id},
        },
    }:
        process_payment(customer_id, payment)
    case _:
        process_other(webhook_data)

Is that clearer? I'm not sure. But it's definitely much more visually-oriented: those case statements kind of look like the webhook_data object that we're trying to describe.

Along with tuple unpacking and list comprehensions, match-case results in code that looks like the objects we're describing. Matching a nested dictionary results in code that looks like a nested dictionary. Matching a list or tuple of length N involves writing a list of length N. And in my case, matching an abstract syntax tree involves writing code that looks like an abstract syntax tree.

When writing parsers and matchers, consider match-case

Python's match-case statement is both complex and amazing. I do not recommend using match-case in cases where an if-elif block is simpler (which is most of the time). But, like many complex abstractions, match-case does have its uses.

In particular, using structural pattern matching can make the intent of AST-matching code easier to understand at a glance.

You should consider match-case statements when:

  • You end up in a scary land full of isinstance checking
  • You're matching lists/tuples by their size and contents
  • You're pattern matching against dictionary keys and values

Though in all likelihood, you don't need match-case and your code would be simpler without it.

Python's structural pattern matching definitely makes parsing Python code much easier and I'm grateful I thought to use it when creating my undataclass tool.

If you're wondering how that undataclass.py script works, I've written a full explanation of this dataclass converter here.
