Python Linters, Fixers, and Other Static Checkers

This post is primarily for people who have tried out linters like flake8 (not really a linter) or pylint to catch simple bugs and programmatically enforce coding standards but who aren’t perfectly satisfied with the types of issues discovered.

I’ll walk through a high-level overview of how these linters work and share some tips on configuring each to catch more errors. Even if you don’t change your linter setup after reading this, I hope it helps you develop an intuition for what kinds of bugs each tool can catch.


Before we start, note that linters are written to pinpoint lines of code that fail their checks. As such, the validation errors they report are typically formatted as $file_name:$line:$column Error Message, which I’ve shortened to $line:$column Error Message so the output is easier to read on smaller screens.
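If you ever want to post-process linter output yourself, the format above is easy to parse. This is only a minimal sketch: LintError and parse_lint_line are names I made up for illustration, and the pattern assumes the full $file_name:$line:$column form, optionally followed by a colon as pycodestyle prints it.

```python
import re
from typing import NamedTuple, Optional


class LintError(NamedTuple):
    file_name: str
    line: int
    column: int
    message: str


# Matches "$file_name:$line:$column Error Message". A colon after the
# column is tolerated because some tools print one.
_PATTERN = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+):?\s+(?P<msg>.*)$"
)


def parse_lint_line(text: str) -> Optional[LintError]:
    """Return a LintError for one output line, or None if it does not match."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    return LintError(
        match["file"], int(match["line"]), int(match["col"]), match["msg"]
    )
```

Lines that do not follow the format, such as summary lines, simply come back as None.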

Flake8

Flake8 is not itself a linter but a plugin engine that runs pycodestyle, pyflakes, and the mccabe complexity checker. It efficiently reuses a generated AST and hooks into the internals of each linter to parallelize checking across several files.

Flake8 has a library of plugins that can validate method and attribute order or catch stray TODO comments. If there isn’t already a plugin for whatever is grinding your gears, flake8 is designed with plugins in mind and has a great guide on hooking up your plugin once you’ve wrangled the AST.
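To give a feel for what an AST-based flake8 plugin looks like, here is a rough sketch. The plugin name, the XNP100 code, and the choice to flag print() calls are all hypothetical; the only parts taken from flake8’s plugin conventions are the tree argument to the constructor and a run() method that yields (line, column, message, type) tuples.

```python
import ast


class NoPrintChecker:
    """Hypothetical flake8 plugin that flags calls to print()."""

    name = "flake8-no-print"  # hypothetical plugin name
    version = "0.1.0"

    def __init__(self, tree):
        # flake8 hands the plugin the AST it has already parsed for the file.
        self.tree = tree

    def run(self):
        for node in ast.walk(self.tree):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"
            ):
                # flake8 expects (line, column, message, checker type).
                yield (
                    node.lineno,
                    node.col_offset,
                    "XNP100 print() call found",
                    type(self),
                )
```

To have flake8 pick up a class like this, you would package it and declare it under the flake8.extension entry point group, keyed by the code prefix (XNP in this sketch).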

Pycodestyle

Pycodestyle is one of the simplest linters and is included within flake8. It validates your code with substring checks and regular expressions, searching mostly for bad style.

$ pycodestyle pycodestyle_test.py
1:10: E401 multiple imports on one line
5:1: E303 too many blank lines (3)
6:1: W191 indentation contains tabs
6:2: E117 over-indented
8:1: E101 indentation contains mixed spaces and tabs
10:4: E714 test for object identity should be 'is not'

The pycodestyle documentation lists all of its error codes.

Internally, pycodestyle is a collection of checker functions that operate on “physical lines” or “logical lines”. “Physical lines” are exactly as they appear in your code but “logical lines” have had their comments stripped and all strings replaced with text like xxx to prevent strings and comments from being checked.

The two hooks also have slightly different interfaces: a checker that takes physical_line must return either a single error or nothing, while a checker that takes logical_line must yield its errors.
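The difference between the two interfaces is easier to see side by side. The two functions below are simplified stand-ins that I wrote for illustration, not pycodestyle’s own checks, and the X191 and X714 codes are made up. They are plain functions, so you can call them directly without pycodestyle installed.

```python
import re


def tabs_in_indent(physical_line):
    """Physical-line style: return one (offset, message) tuple, or None."""
    indent_width = len(physical_line) - len(physical_line.lstrip())
    indent = physical_line[:indent_width]
    if "\t" in indent:
        return indent.index("\t"), "X191 indentation contains tabs"
    return None


def negated_identity(logical_line):
    """Logical-line style: yield zero or more (offset, message) tuples."""
    # Simplified pattern for "not x is y"; the real check is more careful.
    for match in re.finditer(r"\bnot\s+\S+\s+is\s", logical_line):
        yield match.start(), "X714 test for object identity should be 'is not'"


print(tabs_in_indent("\tx = 1"))
print(list(negated_identity("if not x is y:")))
```

Because a logical-line checker is a generator, it can report several problems on the same line, while a physical-line checker reports at most one.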

Customization

Because pycodestyle has little context outside of individual lines, it’s a good tool for adding a check that ensures code is never indented too far but is too simple for much beyond that.

Though implementing new checkers is easy, pycodestyle does not have a way to register plugins, so you’ll have to wrap its main() in a custom script to add a new checker.

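# mycodestyle.py
import pycodestyle


# Registering the function adds it to pycodestyle's built-in checks.
# Naming the first parameter logical_line makes it a logical-line check.
@pycodestyle.register_check
def too_many_indents(logical_line, indent_level):
    if indent_level > 20:
        # Yield (offset within the logical line, message).
        yield 0, "CUSTOM1 Too much indentation"


if __name__ == "__main__":
    # Run the normal pycodestyle command line with the extra check loaded.
    pycodestyle._main()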

Now you can validate your files by running python mycodestyle.py instead of pycodestyle. Pycodestyle treats the first four characters of a message as its error code, so you can select only these errors with --select CUST or exclude them with --ignore CUST.

Unfortunately requiring an extra file means that there is no easy way to add a custom pycodestyle check to flake8, which runs pycodestyle as well as pyflakes and mccabe.

My Opinion

My opinion on pycodestyle is that it can feel like a pain and seems to complain about every conceivable way of formatting multi-line arrays. Pair it with an autoformatter like black and you’ll avoid most of the nits and be left with real stylistic improvements like replacing not x is y with x is not y.

autopep8 is a tool to automatically fix errors reported by pycodestyle. It executes pycodestyle (formerly named pep8), and attempts to fix each error that is reported. It will also attempt to apply some lib2to3 fixes like importing reduce from functools.

Pyflakes

Pyflakes is an AST-based checker that catches issues related to code structure but generally not line-by-line style (though it has some style checks). Like pycodestyle, pyflakes is bundled into flake8.

$ pyflakes pyflakes_test.py
1: 're' imported but unused
5: local variable 'accumulator' is assigned to but never used

Pyflakes does not publish a list of error codes, but the flake8 documentation lists them.

As pyflakes traverses the AST for your code, it keeps track of information like which variables have been used so it can catch subtle bugs like values that are calculated and ignored or imports that are no longer needed.
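To make that bookkeeping concrete, here is a toy version of the idea built on the standard library’s ast module. It is not how pyflakes is implemented: it ignores scopes, __all__, and star imports, and UnusedImportFinder and unused_imports are names I invented for this sketch.

```python
import ast


class UnusedImportFinder(ast.NodeVisitor):
    """Record imported names and every name that is later read."""

    def __init__(self):
        self.imported = {}  # bound name -> line number of the import
        self.used = set()

    def visit_Import(self, node):
        for alias in node.names:
            # "import a.b" binds "a"; "import a as x" binds "x".
            bound = alias.asname or alias.name.split(".")[0]
            self.imported[bound] = node.lineno

    def visit_ImportFrom(self, node):
        for alias in node.names:
            self.imported[alias.asname or alias.name] = node.lineno

    def visit_Name(self, node):
        # Only reads count as a use; assignments have a Store context.
        if isinstance(node.ctx, ast.Load):
            self.used.add(node.id)


def unused_imports(source):
    """Return sorted (line, name) pairs for imports that are never read."""
    finder = UnusedImportFinder()
    finder.visit(ast.parse(source))
    return sorted(
        (line, name)
        for name, line in finder.imported.items()
        if name not in finder.used
    )


print(unused_imports("import re\nimport os\n\nprint(os.getcwd())\n"))
```

The important part is that the check needs the whole tree before it can decide anything, which is why a line-by-line tool like pycodestyle cannot catch this kind of issue.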

Pyflakes is not architected to be extended but despite that, it is one of my favorite linters because the errors it catches are so valuable.

autoflake tries to automatically fix some of the issues flagged by pyflakes but should be used with caution:

  • it resolves unused variable warnings by deleting the variable assignment, which can mask a value that should be used or can leave around an expensive extra calculation
  • it resolves unused import warnings by deleting the import statement which can skip essential side-effects like registering classes
  • it resolves * import warnings by inlining all of the undefined variables into the import statement which at least will make any errors visible quickly

Pylint

Pylint is a validation engine that lives on top of astroid, PyCQA’s ast wrapper that supports limited type inference. By default, pylint reports on a long list of minor style issues like variable names and missing docstrings and scores code out of 10 points.

I have not found that output useful and I only run pylint to catch errors with the -E flag or by configuring it with a .pylintrc file.

$ pylint -E pylint_test.py
5:0: E1136: Value 'y' is unsubscriptable (unsubscriptable-object)
7:0: E0102: function already defined line 1 (function-redefined)
16:0: E1101: Instance of 'Person' has no 'float' member (no-member)
20:0: E1126: Sequence index is not an int, slice, or instance with __index__ (invalid-sequence-index)
24:0: E1136: Value 'total' is unsubscriptable (unsubscriptable-object)

Pylint’s brain, astroid, parses your code into an AST and then extrapolates from literal values, python core functions, and class instantiation to figure out possible return values and types within your code. Astroid does not use python type annotations or support any way of annotating code with hints.

Pylint processes this augmented AST to test whether each operation is valid on the inferred values, such as checking whether an instance of a custom class Person has a method bark.
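The following toy checker illustrates the general idea of inferring a type from literals and then validating an operation against it. It is a deliberately tiny sketch using only the standard library, not astroid’s algorithm, and infer_type and unsubscriptable are hypothetical helpers. Anything it cannot infer is treated as unknown and skipped.

```python
import ast

SUBSCRIPTABLE = (str, bytes, list, tuple, dict)
ADDABLE = (int, float, str, list, tuple)


def infer_type(node):
    """Return the type a simple literal expression evaluates to, or None."""
    if isinstance(node, ast.Constant):
        return type(node.value)
    if isinstance(node, ast.List):
        return list
    if isinstance(node, ast.Tuple):
        return tuple
    if isinstance(node, ast.Dict):
        return dict
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = infer_type(node.left), infer_type(node.right)
        # Only trust the result when both sides share one known type.
        if left is right and left in ADDABLE:
            return left
    return None  # unknown: the caller should not report anything


def unsubscriptable(source):
    """Yield (line, message) for subscripts on values known not to support them."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Subscript):
            inferred = infer_type(node.value)
            if inferred is not None and not issubclass(inferred, SUBSCRIPTABLE):
                yield node.lineno, f"value of type {inferred.__name__} is unsubscriptable"


print(list(unsubscriptable("x = (1 + 2)['a']\ny = 'abc'[0]\nz = foo()['a']\n")))
```

Only the first line is reported: the string is subscriptable, and the result of foo() is unknown to this sketch, so it stays silent rather than guess.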

When there are multiple possible values returned by a function, even of the same type, pylint cautiously skips type checking.

def custom_max(a, b):
    if a > b:
        return a
    else:
        return b


def add(a, b):
    return a + b


custom_max(1, 2)['a']  # no error
add(1, 2)['a']  # unsubscriptable error raised

Customization

In my experience, pylint requires more configuration than other linters. Other linters rarely have false-positives and can be silenced each place an unexpected pattern is used.

Pylint’s inference means false-positives crop up where functions are called or class instances are used, and silencing them can require extensive # pylint: disable comments or customization.
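For reference, pylint’s inline pragma is spelled # pylint: disable=<message-name>. The snippet below is a made-up example (the Settings class is hypothetical) of the kind of dynamic code where I would expect to need one.

```python
class Settings:
    """Hypothetical class whose attributes are created dynamically."""

    def __init__(self, **values):
        for key, value in values.items():
            setattr(self, key, value)


settings = Settings(timeout=30)

# Without the trailing comment, pylint would likely report E1101 (no-member)
# here, because it cannot see attributes that are created with setattr.
print(settings.timeout)  # pylint: disable=no-member
```

The comment only silences the message on that one line, so every other use of a dynamic attribute needs its own comment unless you teach pylint about the class.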

Pylint errors can be resolved in two main ways: either by patching astroid.brain or by writing pylint extensions that handle your function or type. For example, if you have a decorator that returns a property, but astroid does not detect that it is a property, you can customize astroid with:

astroid.bases.POSSIBLE_PROPERTIES.add(
  'custom_property_decorator'
)

If it is not easy to patch the astroid brain to resolve your issue, writing a transform plugin may be your best option. Transforms tell astroid to either alter a node or completely replace it in the AST.

For example, if you have a class that defines properties with setattr and you can’t add a class attribute for each, you can write a transform plugin which will tell astroid that the attribute is present.

import astroid
from astroid import MANAGER


def register(linter):
    # Needed for registering the plugin.
    pass


def transform(cls):
    if cls.name == 'ClassWithSetAttr':
        cls.locals['dynamic_attribute'] = [astroid.ClassDef(int, None)]


MANAGER.register_transform(astroid.ClassDef, transform)

Or to handle the custom_max function from before (which had multiple return values so pylint was not validating any return values) we can tell astroid to replace each AST node representing a place where the function is called with the literal 0.

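# myplugin.py
import astroid
from astroid import MANAGER


def register(linter):
    # Needed for registering the plugin.
    pass


def looks_like_custom_max(node):
    # Match only direct calls such as custom_max(...).
    return isinstance(node.func, astroid.Name) \
        and node.func.name == 'custom_max'


def transform_custom_max(call, context=None):
    # Tell astroid that every matching call infers to the literal 0.
    return iter([astroid.Const(0)])


MANAGER.register_transform(
    astroid.Call,
    astroid.inference_tip(transform_custom_max),
    looks_like_custom_max)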

Now when we run pylint --load-plugins myplugin, pylint realizes that the result of the calculation is not subscriptable.

17:0: E1136: Value 'custom_max(1, 2)' is unsubscriptable (unsubscriptable-object)

Pylint also comes with useful tools like symilar for finding repeated code and pyreverse for plotting a dependency diagram of your code.

My Opinion

Despite sometimes being a pain to configure, pylint can be an amazing tool for wrangling under-tested codebases and for enhancing your development environment. It catches types of errors that are fundamentally beyond what any of the previous linters can catch.
