Writing Clean Python

In this post we're going to take a break from showing off features of Python, and we're instead going to focus on how to write Python well. The Python language was created with readability in mind, so we really should do everything in our power to exploit that potential. The result is really beautiful code in our applications.

Code is for humans

I think one of the things that is very easy to forget when we're programming is that code is for humans, not the computer. Languages like Python are an abstraction layer so that we as developers don't have to work with the really low level instructions that a computer understands.

When we're writing Python code, it's really not for the computer to read; it's for you and other developers.

Why then, do many people write code that's hard for humans to understand?

It's kind of like writing a children's story and using words and grammatical structures that might look right at home in academic discourse. There's a clear disconnect between the material and the intended audience.

With this in mind, let's look at how we can write good, human-friendly, and beautiful Python code.

Coding Style

Any discussion about writing good code is eventually going to land on the topic of code style, so let's get this out of the way at the start.

Python has an official style guide, which is detailed in a Python Enhancement Proposal (PEP) called PEP8. If you're feeling masochistic, you can read through the whole thing, but I want to focus on a few common issues that I see very frequently.

Names

Names are a topic we're going to come back to later, but here I just want to say a few words about the format of names in Python.

While languages like JavaScript heavily favour camel case, which uses capital letters to mark new words in longer names (e.g. myVariableName), Python's style guide stipulates the use of something called snake case.

Snake case names are entirely lowercase, and multi-word names are broken up using underscores. For example, my_variable_name. This pattern is used for variables, functions, methods, etc.

There are two major exceptions to this naming pattern: classes, and constants.

Classes make use of something called Pascal case, which is just the same as camel case, except we also use an initial uppercase letter. An example might be MyClassName. This initial capital letter is a good indicator in Python that something is a class.

Constants make use of capital letters, and words are broken up using underscores. It's basically just capital letter snake case: MY_CONSTANT.

It's a good idea to follow these naming patterns in your code, because seasoned Python developers are able to tell a lot about what a variable is, based on how it was named. Misusing these naming styles might make it harder to understand how your code works at a glance.
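
To see the three naming styles side by side, here's a small sketch. The names MAX_LOGIN_ATTEMPTS, UserAccount, and is_locked are made up purely for illustration.

```python
# Constants: capital letters, with words separated by underscores.
MAX_LOGIN_ATTEMPTS = 3


# Classes: Pascal case.
class UserAccount:
    def __init__(self, user_name):
        # Variables, parameters, and attributes: snake case.
        self.user_name = user_name
        self.failed_attempts = 0

    # Methods and functions: snake case as well.
    def is_locked(self):
        return self.failed_attempts >= MAX_LOGIN_ATTEMPTS


account = UserAccount("rolf")
account.failed_attempts = 3

print(account.is_locked())  # True
```

Even without reading the definitions, the format of each name hints at whether it's a constant, a class, or a regular variable or method.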

Indentation

Indentation in Python is very important, because it's how we define blocks of related code. Many other languages would use something like curly braces for this purpose, with indentation being more of an optional visual aid.

Because it has semantic significance, we need to pay extra attention to our indentation in Python, first so that we don't run into bugs, but also so that other developers can understand the hierarchy of our code very easily.
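
As a quick illustration of how much indentation matters, the two loops below contain exactly the same statements. The only difference is the indentation of the last line. The variable names are just for the example.

```python
numbers = [1, 2, 3]

# The addition is indented, so it's part of the loop body and runs on every pass.
total_inside = 0
for number in numbers:
    doubled = number * 2
    total_inside += doubled

# The addition is not indented, so it runs once, after the loop has finished.
total_outside = 0
for number in numbers:
    doubled = number * 2
total_outside += doubled

print(total_inside)   # 12
print(total_outside)  # 6
```

Both versions are valid Python, which is exactly why this kind of bug is easy to miss.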

The style guide recommends four spaces for each level of indentation. Ultimately, the number of spaces doesn't really matter: it's more important that you're consistent.

However, it is worth keeping in mind that very small and very large indentation sizes have some downsides. One or two space indentation can be hard for people to parse, especially for developers with conditions like dyslexia. Very wide indentation blocks, such as eight spaces, are obviously much easier to spot, but then line length becomes a very limiting factor.

I would recommend sticking to the style guide on this one, but just make sure you're consistent.

Empty lines and other white space

One of the big issues I see in our students' code is that a lot of newer developers in particular are very concerned with writing short code. They don't want to add any empty lines because it makes their code "longer", and they avoid putting spaces within lines of code wherever they're not required.

Don't be afraid to put spaces in your code. Think about if you were writing in English. We break up large blocks of text with paragraphs, so that it's easier to read. We can do the same in code. If we have a block of related functionality, we can make that very clear by putting empty lines around it to make it a single visual unit.

I like to put spaces around things like for loops, for example, because they're generally pretty self-contained.

When it comes to spaces within lines of code, as a general rule, you want to be putting spaces around all binary operators, including the assignment operator, as well as after commas and colons.

Consider the following two lists:

good = [1243.45, 344535.3, 4465.0043, 4775.393, 565.3]
bad = [1243.45,344535.3,4465.0043,4775.393,565.3]

I think the example here speaks for itself. It's clearly much easier to read the first list, and to determine information about its contents, than it is with the second list.
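
The same goes for operators. Here's a rough sketch using some made-up values, where both lines calculate exactly the same thing:

```python
price = 19.99
quantity = 3
discount = 0.1

# Cramped: valid Python, but the operators and arguments all run together.
cramped_total=round(price*quantity*(1-discount),2)

# Spaced: the same calculation, with spaces around operators and after commas.
spaced_total = round(price * quantity * (1 - discount), 2)

print(cramped_total == spaced_total)  # True
```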

Another good suggestion in the style guide is to put two empty lines after function and class definitions. Since these blocks often include empty lines, the double empty line makes it very easy to spot where the class or function definition ends.
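
Here's a short sketch of what that looks like in practice. The functions format_name and greet are invented for the example.

```python
def format_name(first_name, last_name):
    full_name = f"{first_name} {last_name}"

    return full_name.title()


def greet(first_name, last_name):
    name = format_name(first_name, last_name)

    return f"Hello, {name}!"


print(greet("rolf", "smith"))  # Hello, Rolf Smith!
```

The single empty lines separate steps inside a function, while the double empty lines mark where one definition ends and the next begins.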

Keep lines short

Coming back to the idea of newer developers often trying to write very short code, a common side effect is very long lines. This is bad for several reasons.

The first is entirely down to hardware limitations. Screens are only so wide, so if you go to real extremes to keep code on a single line, you inevitably introduce horizontal scrolling. This means that a reader of your code can't read all of the logic in one go, and they have to flick backwards and forwards to parse your code. Of course you can enable text wrapping, but then why not just put the code on two lines? At least then you have a logical break point, rather than the code being split based on how big the reader's monitor happens to be.

A second issue is down to limitations in us as humans. We find it very difficult to read long lines of text quickly, and there are a few good reasons for that.

Very fast readers work in several word chunks, or even whole lines, but our peripheral vision is only so good. As line length increases, the amount of eye movement required to read the lines goes up, and this slows down reading speed.

It's also very common for people to lose which line they're reading, the more the eye has to move. This leads to a lot of re-reading of the same material.

There's a very good reason that larger print formats such as magazines and newspapers break up text into several columns: it's much easier for us to read the information that way. This is also why many websites have very large gutters on either side of the site content: the main content is limited to a comfortable reading width.

The final major issue with long lines is code complexity. The lines are not longer for no reason: more things have been put on these lines which is what made them longer. This invariably leads to more dense logic, which again, is much harder for human readers.

Complicated logic very quickly becomes difficult for us to reason about, so it's very important that we resist the temptation to chain many operations together in our code. If it takes five more lines to write in a very clear and understandable way, it's worth it.
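
To make that a bit more concrete, here's a sketch with a made-up string of raw data. Both versions build the same dictionary, but the second one spends a few extra lines to name each step.

```python
raw_entry = "  ROLF SMITH ; 35 ; london  "

# Dense: everything is crammed onto one very long line.
dense_entry = {"name": raw_entry.split(";")[0].strip().title(), "age": int(raw_entry.split(";")[1]), "city": raw_entry.split(";")[2].strip().title()}

# Clearer: split once, then clean up each piece on its own line.
raw_name, raw_age, raw_city = raw_entry.split(";")

name = raw_name.strip().title()
age = int(raw_age)
city = raw_city.strip().title()

clear_entry = {"name": name, "age": age, "city": city}

print(dense_entry == clear_entry)  # True
```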

Automatic formatting

Even the best of us diverge from good practices now and then, so it's a good idea to make use of tools to automatically format our code after the fact.

One tool you might want to check out is Black. It uses a somewhat liberal interpretation of PEP8, particularly when it comes to things like line length. PEP8 stipulates a fairly strict maximum line length of 79 characters, while Black defaults to 88. Sticking rigidly to the lower limit can sometimes lead to less readable code, especially when we're splitting code over multiple lines because we went one character over the limit.

A final word on PEP8

As I just mentioned, strictly adhering to PEP8 isn't always the best thing. It's a good rule of thumb for writing code in good style, but it's also useful to know when not to follow PEP8.

Don't treat PEP8 like the 10 Commandments, but make sure you know why you're breaking the rules before you do so.

Using good names

I often say to students on our courses that variable names are basically just built-in documentation; it's a total waste not to exploit that. Variable names like x or list_1 are throwing away a golden opportunity to describe the values associated with those variables.

Choosing good names is hard, and it's definitely a skill you need to develop, but it's worth putting in the effort to learn.

Take the following example:

list_1 = ["John", "Rolf", "Sarah", "Anne"]

for e in list_1:
    print(e)

This is quite simple code, but even here we can make it far more obvious what the for loop is doing by just using good names:

friends = ["John", "Rolf", "Sarah", "Anne"]

for friend in friends:
    print(friend)

An even more explicit version might look like this:

friends = ["John", "Rolf", "Sarah", "Anne"]

for friend_name in friends:
    print(friend_name)

Now imagine what a difference this makes when the logic is actually complicated.

One final thing to keep in mind with names is the use of the singular and plural. A name like number indicates to a reader that this is probably a single value, while something like numbers indicates that something is a collection.

In general, try to reserve plural names for collections, but make sure you do use them when appropriate.
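
A tiny sketch of the singular and plural convention, using invented names:

```python
scores = [72, 88, 95]        # plural: a collection of values
highest_score = max(scores)  # singular: one value

for score in scores:         # singular again: one item at a time
    print(score)

print(highest_score)  # 95
```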

Pythonic Loops

We've already seen an example of a good for loop above, but many people write their Python loops in a very different way. It's common to see a pattern like this:

friends = ["John", "Rolf", "Sarah", "Anne"]

for i in range(len(friends)):
    print(friends[i])

The code above is absolutely identical in function to this:

friends = ["John", "Rolf", "Sarah", "Anne"]

for friend_name in friends:
    print(friend_name)

The first example comes from a misunderstanding regarding how Python's for loop works. Python uses a "for each" style loop, where we get direct access to the elements within an iterable. However, many languages use a more traditional for loop, where you create a counter and some stopping condition, and you might use this to access items in a collection by index.

The benefit of the "for each" loop is that we get to use good, descriptive names, and we don't have to faff around with indices. As a general rule, if you're referring to indices in something like a loop in Python, there's probably a better way to do it.

So, what are some of those better ways?

Using `zip`

One common place where I see indices used in Python is when somebody is trying to work with two or more iterables at the same time. For example, maybe we have a list of names, and we have a list of corresponding ages, and we want to print out both using the same loop.

A naïve approach might look like this:

names = ["John", "Rolf", "Sarah", "Anne"]
ages = [27, 42, 31, 29]

for i in range(len(names)):
    print(f"{names[i]} is {ages[i]} years old.")

To be clear, this code works perfectly. However, I think if the logic was more complicated, this type of approach would quickly get very difficult to read.

We can make this code a great deal cleaner using the zip function. We have a blog post on this function already, so please check that out to learn more. You can also check out the documentation.

Using zip the above code now looks like this:

names = ["John", "Rolf", "Sarah", "Anne"]
ages = [27, 42, 31, 29]

for name, age in zip(names, ages):
    print(f"{name} is {age} years old.")
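
One thing worth knowing about zip is what happens when the iterables have different lengths. In the sketch below I've deliberately left one age out: zip stops as soon as the shortest iterable runs out, so the last name is silently dropped.

```python
names = ["John", "Rolf", "Sarah", "Anne"]
ages = [27, 42, 31]  # one age is missing

pairs = list(zip(names, ages))

print(pairs)  # [('John', 27), ('Rolf', 42), ('Sarah', 31)]
```

If your collections might not line up, it's worth checking their lengths before you zip them together.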

Using `enumerate`

A less common, but still fairly frequent use of indices in loops, is when somebody needs some kind of a counter, and they think they can kill two birds with one stone by generating this counter first, and using it as an index as well.

Something like this, perhaps:

names = ["John", "Rolf", "Sarah", "Anne"]

for i in range(len(names)):
    print(f"Person {i + 1} is called {names[i]}")

While this code is perfectly functional, we can make things cleaner by using the enumerate function. Again, we have a post on enumerate already, so you should check that out for more information. You can also look in the documentation.

Using enumerate the code above looks like this:

names = ["John", "Rolf", "Sarah", "Anne"]

for counter, name in enumerate(names, start=1):
    print(f"Person {counter} is called {name}")
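
We can also combine enumerate with zip if we need a counter while looping over several collections. This is only a sketch, reusing the names and ages from earlier; I'm collecting the lines in a list here so they're easy to inspect.

```python
names = ["John", "Rolf", "Sarah", "Anne"]
ages = [27, 42, 31, 29]

lines = []

# Each item from enumerate is a counter plus a (name, age) pair from zip,
# so we wrap name and age in brackets to unpack the inner pair.
for counter, (name, age) in enumerate(zip(names, ages), start=1):
    lines.append(f"Person {counter} is called {name} and is {age} years old.")

print(lines[0])  # Person 1 is called John and is 27 years old.
```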

Destructuring

Both of the examples above made use of a technique called destructuring, which we also have a blog post about, but I want to talk about another situation where destructuring can be very useful.

In our Complete Python Course, we have an exercise in which students create a quiz system, where they have a file containing questions and answers. They need to split each line of the file into questions and answers, where each line looks something like this: 3+8=11.

A lot of students go for something along these lines:

for line in question_file:
    question = line.split("=")[0]
    answer = line.split("=")[1]

However, this is another perfect opportunity to do away with indices, and destructuring is our weapon of choice:

for line in question_file:
    question, answer = line.split("=")

This technique can be used any time we're working with some ordered collection, and we want to split it into its component parts. It's well worth getting familiar with, and it will make your code a great deal cleaner.
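
Here's a slightly fuller sketch of the same idea. I'm using a list of strings to stand in for the file, and the contents are made up. Lines read from a file usually end in a newline character, so I've stripped that off before splitting.

```python
# A stand-in for the lines we'd get from the question file.
question_lines = ["3+8=11\n", "4*5=20\n", "10-7=3\n"]

quiz = []

for line in question_lines:
    # strip() removes the trailing newline so it doesn't end up in the answer.
    question, answer = line.strip().split("=")
    quiz.append((question, answer))

# A starred name collects whatever is left over into a list.
first_item, *remaining_items = quiz

print(first_item)       # ('3+8', '11')
print(remaining_items)  # [('4*5', '20'), ('10-7', '3')]
```

Keep in mind that the number of names on the left has to match the number of values on the right. If a line had no = sign, or more than one, the first assignment would raise a ValueError.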

Idiomatic conditionals

I feel like conditionals in Python are something of a shibboleth which can quickly identify you as somebody who maybe generally works in another language.

It's quite common to see code like this:

if len(some_list) != 0:
    ...  # do something

Or:

if some_string != "":
    ...  # do something

This is certainly very explicit code, and I don't think there's anything wrong with it, per se, but it's not very idiomatic: it's not how most experienced Python developers would write something like this. Instead, it's very common to rely on the truth values of certain types.

In Python, very few things evaluate to False, but empty collections are one of them. This means strings, sets, tuples, dictionaries, lists, etc.

As such, instead of writing the examples above, we could write this:

if some_list:
    ...  # do something

if some_string:
    ...  # do something

In both cases, these conditions just mean, "if the collection isn't empty".
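
If you want to check the truth values for yourself, here's a quick sketch. The values in the list are just a sample I picked.

```python
values = ["", "hello", [], [0], {}, {"a": 1}, (), set(), " "]

# bool() gives us the truth value that an if statement would use.
results = [bool(value) for value in values]

for value, result in zip(values, results):
    label = "truthy" if result else "falsy"
    print(f"{value!r} is {label}")
```

Note that [0] and " " are both truthy. The list contains an item, and the string contains a space, so neither of them is empty.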

Multiple return values

We can go one step further with this trick when we're using the conditions to control which return statement is triggered in a function or method.

As an example, we might have a function which asks the user for their name, and then returns either the name, or "John Doe" if no name was entered by the user. Using what we learnt above, we might write something like this:

def get_name():
    name = input("Please enter your name: ")

    if name:
        return name
    
    return "John Doe"

However, we can make use of the Boolean or operator to make this a great deal shorter.

def get_name():
    name = input("Please enter your name: ")

    return name or "John Doe"

The or operator in Python yields the first operand if it evaluates to True, which in our case means the name string isn't empty, otherwise it yields the second operand. As such, in any case where the user entered a name, name is returned by the get_name function; otherwise, it returns "John Doe".

This trick works for assignments as well!

Once again, we have a blog post covering Boolean operators where we talk about this useful property of or.
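
Here's a sketch of or used in an assignment. The function choose_display_name is made up for the example, and it takes the name as a parameter rather than calling input, so that it's easy to try out.

```python
def choose_display_name(entered_name):
    # If entered_name is an empty string, we fall back to the default.
    display_name = entered_name or "John Doe"

    return display_name


print(choose_display_name("Rolf"))  # Rolf
print(choose_display_name(""))      # John Doe
```

Be a little careful with this when a falsy value is actually a valid choice. For example, 0 or 10 gives us 10, which isn't what we want if 0 is a legitimate value.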

Comprehensions

The last thing I want to talk about in this post is using comprehensions. Comprehensions are used to create new collections from another iterable, and they're an alternative to the more verbose for loop syntax in those instances. Personally, I absolutely love them!

As an example, maybe we have a list of lowercase names, and we want a new list where they're all in title case instead, with an initial capital letter. The for loop version would look like this:

names = ["rolf", "james", "jose", "sarah", "lucy"]
title_names = []

for name in names:
    title_names.append(name.title())

One limitation of this method is that we can't reuse the names variable name, even if we no longer plan to use the values. It's also got a fair bit of boilerplate code.

A comprehension version would look like this:

names = ["rolf", "james", "jose", "sarah", "lucy"]
names = [name.title() for name in names]

Or, if we wanted to preserve the original list:

names = ["rolf", "james", "jose", "sarah", "lucy"]
title_names = [name.title() for name in names]

We have a couple of posts on list comprehensions. The first covers basic list comprehensions like we see above, and the second covers list comprehensions with conditionals.

Python also has other types of comprehensions, such as set comprehensions and dictionary comprehensions, which can be used in a very similar way. I've spoken about these in another post.
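
As a quick taste of those variations, here's a sketch using a similar list of names. I've added a duplicate name so we can see what the set comprehension does with it.

```python
names = ["rolf", "james", "jose", "sarah", "lucy", "rolf"]

# List comprehension with a condition: only keep names starting with "j".
j_names = [name.title() for name in names if name.startswith("j")]

# Set comprehension: curly braces, and duplicates are dropped.
unique_names = {name.title() for name in names}

# Dictionary comprehension: map each name to its length.
name_lengths = {name: len(name) for name in names}

print(j_names)       # ['James', 'Jose']
print(name_lengths)  # {'rolf': 4, 'james': 5, 'jose': 4, 'sarah': 5, 'lucy': 4}
```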

Comprehensions are a great tool to have in your repertoire, so definitely get familiar with them.

Wrapping up

This was quite a long one, so thanks for sticking with me to the end! I hope you learnt some new tricks to help you clean up your Python code.

We cover a lot of material like this on our Complete Python Course, so if you're interested in taking your Python to the next level, check it out.

We also have a community on Discord where you can get help and advice, as well as a mailing list so you can keep up to date with all our content. We also send discount codes to our subscribers, so make sure to sign up below to get the best deals on our courses.
