Python Linters, Fixers, and Other Static Checkers
This post is primarily for people who have tried out linters like flake8 (not really a linter) or pylint to catch simple bugs and programmatically enforce coding standards but who aren’t perfectly satisfied with the types of issues discovered.
I’ll walk through a high-level overview of how these linters work and some tips on configuring each to catch more errors. Even if you don’t change your linter setup after reading this, I hope it helps to develop your intuition for what kind of bugs each tool can catch.
Before we start, linters are written to pinpoint lines of code that fail their
checks. As such, the validation errors they report are typically formatted as
$file_name:$line:$column Error Message
which I’ve shortened to $line:$column Error Message to be clearer on smaller screens.
Flake8
Flake8 is not itself a linter but a plugin engine that runs pycodestyle, pyflakes, and the mccabe complexity checker. It efficiently reuses a generated AST and hooks into the internals of each linter to parallelize checking across several files.
Flake8 has a library of plugins that can validate method and attribute order or catch stray TODO comments. If there isn’t already a plugin for whatever is grinding your gears, flake8 is designed with plugins in mind and has a great guide on hooking up your plugin once you’ve wrangled the AST.
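To give a feel for what a plugin looks like once the AST is wrangled, here is a minimal sketch of an AST-based flake8 plugin. The class name, plugin name, and the XTD100 code are all made up for illustration. It assumes the usual plugin shape: a class that receives the parsed tree and whose run() method yields (line, column, message, type) tuples.

```python
import ast


class TodoNameChecker:
    """Flags functions whose name starts with 'todo_' (hypothetical check)."""

    name = "flake8-todo-names"  # hypothetical plugin name
    version = "0.1.0"

    def __init__(self, tree):
        # flake8 hands the plugin the parsed module when the parameter is named `tree`.
        self.tree = tree

    def run(self):
        # Walk every node and report each offending function definition.
        for node in ast.walk(self.tree):
            if isinstance(node, ast.FunctionDef) and node.name.startswith("todo_"):
                yield (
                    node.lineno,
                    node.col_offset,
                    "XTD100 function name suggests unfinished work",
                    type(self),
                )
```

To hook it up, the class would still need to be declared under the flake8.extension entry point in your package metadata, which is what the flake8 plugin guide covers.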
Pycodestyle
Pycodestyle is one of the simplest linters and is included within flake8. It validates your code with substring checks and regular expressions, searching mostly for bad style.
$ pycodestyle pycodestyle_test.py
1:10: E401 multiple imports on one line
5:1: E303 too many blank lines (3)
6:1: W191 indentation contains tabs
6:2: E117 over-indented
8:1: E101 indentation contains mixed spaces and tabs
10:4: E714 test for object identity should be 'is not'
Internally, pycodestyle is a collection of checker functions that operate on
“physical lines” or “logical lines”. “Physical lines” are exactly as they
appear in your code but “logical lines” have had their comments stripped and
all strings replaced with text like xxx to prevent strings and comments from
being checked.
The two hooks also have slightly different interfaces: if your checker uses
physical_line, it must return either a single error or nothing, but if your
checker uses logical_line, it must yield its errors.
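The difference between the two interfaces is easiest to see side by side. This is a sketch with hypothetical checker names and X9xx codes. The functions are plain Python so they can be called directly, and pycodestyle decides which kind of check each one is by looking at the name of its first parameter.

```python
def trailing_fixme(physical_line):
    """Physical-line check: return one (offset, message) tuple, or None."""
    offset = physical_line.find("FIXME")
    if offset != -1:
        return offset, "X901 FIXME left in code"
    return None


def too_many_indents(logical_line, indent_level):
    """Logical-line check: a generator that yields (offset, message) tuples."""
    if indent_level > 20:
        yield 0, "X902 too much indentation"
```

Note that a physical-line check sees comments and string contents, which is why it is the right hook for something like a FIXME search, while a logical-line check never will.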
Customization
Because pycodestyle has little context outside of individual lines, it’s a good tool for adding a check that ensures code is never indented too far but is too simple for much beyond that.
Though implementing new checkers is easy, pycodestyle does not have a way to
register plugins so you’ll have to wrap its entry point, _main(), in a custom
script to add a new checker.
# mycodestyle.py
import pycodestyle

@pycodestyle.register_check
def too_many_indents(logical_line, indent_level):
    if indent_level > 20:
        yield 0, "CUSTOM1 Too much indentation"

if __name__ == "__main__":
    pycodestyle._main()
Now you can validate your files by running python mycodestyle.py instead
of pycodestyle. If you want to select only these errors, you can target them
with --select CUSTOM or you can exclude them similarly with --ignore CUSTOM.
Unfortunately requiring an extra file means that there is no easy way to add a custom pycodestyle check to flake8, which runs pycodestyle as well as pyflakes and mccabe.
My Opinion
My opinion on pycodestyle is that it can feel like a pain and seems to complain
about every conceivable way of formatting multi-line arrays. Pair it with an
autoformatter like black and you’ll avoid most of the nits and be left with
real stylistic improvements like replacing not x is y with x is not y.
Related Tools
autopep8 is a tool to automatically fix errors reported by pycodestyle. It
executes pycodestyle (formerly named pep8), and attempts to fix each error that
is reported. It will also attempt to apply some lib2to3 fixes like importing
reduce from functools.
Pyflakes
Pyflakes is an AST-based checker that catches issues related to code structure but generally not line-by-line style (though it has some style checks). Like pycodestyle, pyflakes is bundled into flake8.
$ pyflakes pyflakes_test.py
1: 're' imported but unused
5: local variable 'accumulator' is assigned to but never used
pyflakes does not publish a list of error codes, but flake8 lists the codes it assigns to them in its documentation.
As pyflakes traverses the AST for your code, it keeps track of information like which variables have been used so it can catch subtle bugs like values that are calculated and ignored or imports that are no longer needed.
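The bookkeeping idea can be sketched with the standard library’s ast module. This is a toy under heavy assumptions, not how pyflakes is implemented: the unused_imports helper is hypothetical, and it ignores scopes, __all__, and re-exports, all of which a real checker has to handle.

```python
import ast


def unused_imports(source):
    """Return sorted (line, name) pairs for imported names that are never read."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import statement
    used = set()   # every name that is read anywhere in the module
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as b` binds `b`.
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)
    return sorted((line, name) for name, line in imported.items() if name not in used)
```

Calling it on a module that imports both re and os but only ever uses os returns a single entry for re, much like the first warning in the pyflakes output above.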
Pyflakes is not architected to be extended but despite that, it is one of my favorite linters because the errors it catches are so valuable.
Related Tools
autoflake tries to automatically fix some of the issues flagged by pyflakes but should be used with caution:
- it resolves unused variable warnings by deleting the variable assignment, which can mask a value that should be used or can leave around an expensive extra calculation
- it resolves unused import warnings by deleting the import statement which can skip essential side-effects like registering classes
- it resolves * import warnings by inlining all of the undefined variables into the import statement, which at least will make any errors visible quickly
Pylint
Pylint is a validation engine that lives on top of astroid, PyCQA’s ast
wrapper that supports limited type inference. By default, pylint reports
on a long list of minor style issues like variable names and missing docstrings
and scores code out of 10 points.
I have not found that output useful and I only run pylint to catch errors with
the -E flag or by configuring it with a .pylintrc file.
$ pylint -E pylint_test.py
5:0: E1136: Value 'y' is unsubscriptable (unsubscriptable-object)
7:0: E0102: function already defined line 1 (function-redefined)
16:0: E1101: Instance of 'Person' has no 'float' member (no-member)
20:0: E1126: Sequence index is not an int, slice, or instance with __index__ (invalid-sequence-index)
24:0: E1136: Value 'total' is unsubscriptable (unsubscriptable-object)
Pylint’s brain, astroid, parses your code into an AST and then extrapolates from literal values, python core functions, and class instantiation to figure out possible return values and types within your code. Astroid does not use python type annotations or support any way of annotating code with hints.
Pylint processes this augmented AST to test if each operation is valid on the
inferred values like testing if an instance of a custom class Person has a
method bark.
When there are multiple possible values returned by a function, even of the same type, pylint cautiously skips type checking.
def custom_max(a, b):
    if a > b:
        return a
    else:
        return b

def add(a, b):
    return a + b

custom_max(1, 2)['a']  # no error
add(1, 2)['a']  # unsubscriptable error raised
Customization
In my experience, pylint requires more configuration than other linters. Other linters rarely produce false positives, and the few that occur can be silenced at each place an unexpected pattern is used.
Pylint’s inference means false positives crop up where functions are called or
class instances are used and can require extensive # pylint: disable=...
commenting or customization.
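For reference, this is roughly what that commenting looks like. The Settings class and describe function are hypothetical, and whether pylint actually flags these lines depends on what astroid manages to infer, so treat it as an illustration of the comment syntax only.

```python
class Settings:
    """Hypothetical class whose attributes are created dynamically."""

    def __init__(self, **values):
        for key, value in values.items():
            setattr(self, key, value)


settings = Settings(timeout=30)

# A trailing comment silences the named message on that line only.
timeout = settings.timeout  # pylint: disable=no-member


def describe(config):
    # A disable comment on its own line applies to the rest of the enclosing block.
    # pylint: disable=no-member
    return f"timeout={config.timeout}"
```

Sprinkling these around works, but every one of them also hides real errors on that line, which is why the customization options below are often the better fix.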
Pylint errors can be resolved in two main ways: either by patching
astroid.brain or by writing pylint extensions that handle your function or
type. For example, if you have a decorator that returns a property, but astroid
does not detect that it is a property, you can customize astroid with
astroid.bases.POSSIBLE_PROPERTIES.add(
    'custom_property_decorator'
)
If it is not easy to patch the astroid brain to resolve your issue, writing a transform plugin may be your best option. Transforms tell astroid to either alter a node or completely replace it in the AST.
For example, if you have a class that defines properties with setattr and you
can’t add a class attribute for each, you can write a transform plugin which
will tell astroid that the attribute is present.
import astroid
from astroid import MANAGER

def register(linter):
    # Needed for registering the plugin.
    pass

def transform(cls):
    if cls.name == 'ClassWithSetAttr':
        cls.locals['dynamic_attribute'] = [astroid.ClassDef(int, None)]

MANAGER.register_transform(astroid.ClassDef, transform)
Or to handle the custom_max function from before (which had multiple return
values so pylint was not validating any return values) we can tell astroid to
replace each AST node representing a place where the function is called with
the literal 0.
# myplugin.py
import astroid
from astroid import MANAGER

def register(linter):
    pass  # Needed for registering the plugin.

def looks_like_custom_max(node):
    return isinstance(node.func, astroid.Name) \
        and node.func.name == 'custom_max'

def transform_custom_max(call, context=None):
    return iter([astroid.Const(0)])

MANAGER.register_transform(
    astroid.Call,
    astroid.inference_tip(transform_custom_max),
    looks_like_custom_max)
Now when we run pylint --load-plugins=myplugin, pylint realizes that the result
of the calculation is not subscriptable.
17:0: E1136: Value 'custom_max(1, 2)' is unsubscriptable (unsubscriptable-object)
Pylint also comes with useful tools like symilar for finding repeated code and
pyreverse for plotting a dependency diagram of your code.
My Opinion
Despite sometimes being a pain to configure, pylint can be an amazing tool for wrangling under-tested codebases and for enhancing your development environment. It catches types of errors that are fundamentally beyond what any of the previous linters can catch.