This post is primarily for people who have tried out linters like flake8 (not really a linter) or pylint to catch simple bugs and programmatically enforce coding standards but who aren’t perfectly satisfied with the types of issues discovered.
I’ll walk through a high-level overview of how these linters work and some tips on configuring each to catch more errors. Even if you don’t change your linter setup after reading this, I hope it helps to develop your intuition for what kind of bugs each tool can catch.
Before we start, linters are written to pinpoint lines of code that fail their
checks. As such, the validation errors they report are typically formatted as
$file_name:$line:$column Error Message which I’ve shortened to
Error Message to be clearer on smaller screens.
Flake8 is not itself a linter but a plugin engine that runs pycodestyle, pyflakes, and the mccabe complexity checker. It efficiently reuses a generated AST and hooks into the internals of each linter to parallelize checking across several files.
Flake8 has a library of plugins that can validate method and attribute order or catch stray TODO comments. If there isn’t already a plugin for whatever is grinding your gears, flake8 is designed with plugins in mind and has a great guide on hooking up your plugin once you’ve wrangled the AST.
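To give a feel for what "wrangling the AST" looks like, here is a minimal sketch of a flake8-style AST plugin. It assumes the usual plugin shape (a class that receives the parsed tree and has a run() generator yielding line, column, message, and checker type). The plugin name, the XBE100 code, and the bare-except rule itself are hypothetical choices for illustration, and the sketch is exercised directly with the standard library rather than through flake8.

```python
import ast


class BareExceptChecker:
    """Toy flake8-style AST plugin that flags bare ``except:`` clauses."""

    # Plugin metadata; both values are hypothetical.
    name = "flake8-bare-except-sketch"
    version = "0.0.1"

    def __init__(self, tree):
        # The engine hands the plugin an AST it has already parsed.
        self.tree = tree

    def run(self):
        # Yield (line, column, message, checker type) for each problem found.
        for node in ast.walk(self.tree):
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                yield (
                    node.lineno,
                    node.col_offset,
                    "XBE100 bare except hides unexpected errors",  # hypothetical code
                    type(self),
                )


SOURCE = """\
try:
    risky()
except:
    pass
"""

if __name__ == "__main__":
    for line, col, message, _ in BareExceptChecker(ast.parse(SOURCE)).run():
        print(f"{line}:{col}: {message}")
```

To run something like this under flake8 itself, the class would still need to be registered as an entry point in the package metadata, which is the part the plugin guide covers.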
Pycodestyle is one of the simplest linters and is included within flake8. It validates your code with substring checks and regular expressions, searching mostly for bad style.
    $ pycodestyle pycodestyle_test.py
    1:10: E401 multiple imports on one line
    5:1: E303 too many blank lines (3)
    6:1: W191 indentation contains tabs
    6:2: E117 over-indented
    8:1: E101 indentation contains mixed spaces and tabs
    10:4: E714 test for object identity should be 'is not'
Internally, pycodestyle is a collection of checker functions that operate on
“physical lines” or “logical lines”. “Physical lines” are exactly as they
appear in your code but “logical lines” have had their comments stripped and
all strings replaced with text like
xxx to prevent strings and comments from triggering false positives.
The two hooks also have slightly different interfaces: a checker that uses
physical_line must return its error (or nothing at all), while a checker that uses
logical_line must
yield its errors.
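The difference between the two kinds of lines and the two checker shapes can be sketched with only the standard library. This is not pycodestyle's code: to_logical_line, check_line_length, check_not_is, and the X501/X714 codes are hypothetical names, and the masking only handles plain single-line strings (it also keeps the string length, where pycodestyle substitutes a short placeholder).

```python
import io
import re
import tokenize

MAX_LENGTH = 79
NOT_IS = re.compile(r"\bnot\s+[\w.]+\s+is\s")


def to_logical_line(physical_line):
    """Approximate a logical line: mask string bodies and drop comments."""
    chars = list(physical_line.rstrip("\n"))
    cut = len(chars)
    for tok in tokenize.generate_tokens(io.StringIO(physical_line).readline):
        start, end = tok.start[1], tok.end[1]
        if tok.type == tokenize.STRING:
            # Keep the quotes, replace everything between them with x's.
            chars[start + 1:end - 1] = "x" * (end - start - 2)
        elif tok.type == tokenize.COMMENT:
            cut = start
    return "".join(chars[:cut]).rstrip()


def check_line_length(physical_line):
    """Physical-line shape: return one (offset, message) tuple or None."""
    length = len(physical_line.rstrip("\n"))
    if length > MAX_LENGTH:
        return MAX_LENGTH, f"X501 line too long ({length} > {MAX_LENGTH})"
    return None


def check_not_is(logical_line):
    """Logical-line shape: a generator that yields (offset, message) tuples."""
    for match in NOT_IS.finditer(logical_line):
        yield match.start(), "X714 use 'is not' instead of 'not ... is'"


if __name__ == "__main__":
    lines = [
        'flag = not value is None\n',
        'text = "not a is b"  # not c is d\n',
    ]
    for number, line in enumerate(lines, start=1):
        error = check_line_length(line)
        if error:
            print(f"{number}:{error[0]}: {error[1]}")
        for offset, message in check_not_is(to_logical_line(line)):
            print(f"{number}:{offset}: {message}")
```

Running the regular expression on the raw second line would match inside both the string and the comment; on the logical line it matches nothing, which is the point of masking.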
Because pycodestyle has little context outside of individual lines, it’s a good tool for adding a check that ensures code is never indented too far but is too simple for much beyond that.
Though implementing new checkers is easy, pycodestyle does not have a way to
register plugins, so you’ll have to wrap its
_main() in a custom script to add
a new checker.
    # mycodestyle.py
    import pycodestyle


    @pycodestyle.register_check
    def too_many_indents(logical_line, indent_level):
        if indent_level > 20:
            yield 0, "CUSTOM1 Too much indentation"


    if __name__ == "__main__":
        pycodestyle._main()
Now you can validate your files by running
python mycodestyle.py instead of
pycodestyle. If you want to select only these errors, you can target them with
--select CUSTOM or you can exclude them similarly with
--ignore CUSTOM.
Unfortunately requiring an extra file means that there is no easy way to add a custom pycodestyle check to flake8, which runs pycodestyle as well as pyflakes and mccabe.
My opinion on pycodestyle is that it can feel like a pain and seems to complain
about every conceivable way of formatting multi-line arrays. Pair it with an
auto-formatter like black and you’ll avoid most of the nits and be left with
real stylistic improvements like replacing
not x is y with
x is not y.
autopep8 is a tool to automatically fix errors reported by pycodestyle. It
executes pycodestyle (formerly named pep8), and attempts to fix each error that
is reported. It will also attempt to apply some
lib2to3 fixes like importing
reduce from functools.
Pyflakes is an AST-based checker that catches issues related to code structure but generally not line-by-line style (though it has some style checks). Like pycodestyle, pyflakes is bundled into flake8.
    $ pyflakes pyflakes_test.py
    1: 're' imported but unused
    5: local variable 'accumulator' is assigned to but never used
Pyflakes does not publish a list of error codes, but flake8 documents the codes it assigns to pyflakes messages.
As pyflakes traverses the AST for your code, it keeps track of information like which variables have been used so it can catch subtle bugs like values that are calculated and ignored or imports that are no longer needed.
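The bookkeeping idea can be sketched with the standard ast module. This is a deliberately simplified stand-in, not pyflakes' implementation: find_unused_imports is a hypothetical helper that ignores scopes, __all__, and re-exports, all of which a real checker has to handle.

```python
import ast


def find_unused_imports(source):
    """Return sorted (line, name) pairs for imports that are never read."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import
    used = set()   # every name that is read somewhere in the module

    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skip them
                # "import a.b" binds "a"; "import a as b" binds "b".
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)

    return sorted(
        (line, name) for name, line in imported.items() if name not in used
    )


SOURCE = """\
import re
import os

def cwd_name():
    return os.path.basename(os.getcwd())
"""

if __name__ == "__main__":
    for line, name in find_unused_imports(SOURCE):
        print(f"{line}: '{name}' imported but unused")
```

Because the walk records what is bound and what is read, the check needs no knowledge of what re or os actually contain, which is why this style of linter is fast and rarely wrong.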
Pyflakes is not architected to be extended but despite that, it is one of my favorite linters because the errors it catches are so valuable.
autoflake tries to automatically fix some of the issues flagged by pyflakes but should be used with caution:
- it resolves unused variable warnings by deleting the variable assignment, which can mask a value that should be used or can leave around an expensive extra calculation
- it resolves unused import warnings by deleting the import statement which can skip essential side-effects like registering classes
- it resolves import * warnings by inlining all of the undefined names into the import statement, which at least will make any errors visible quickly
Pylint is a validation engine that lives on top of astroid, PyCQA’s
ast wrapper that supports limited type inference. By default, pylint reports
on a long list of minor style issues like variable names and missing docstrings
and scores code out of 10 points.
I have not found that output useful and I only run pylint to catch errors with
the -E flag or by configuring it with a pylintrc file.
    $ pylint -E pylint_test.py
    5:0: E1136: Value 'y' is unsubscriptable (unsubscriptable-object)
    7:0: E0102: function already defined line 1 (function-redefined)
    16:0: E1101: Instance of 'Person' has no 'float' member (no-member)
    20:0: E1126: Sequence index is not an int, slice, or instance with __index__ (invalid-sequence-index)
    24:0: E1136: Value 'total' is unsubscriptable (unsubscriptable-object)
Pylint’s brain, astroid, parses your code into an AST and then extrapolates from literal values, Python core functions, and class instantiation to figure out possible return values and types within your code. Astroid does not use Python type annotations or support any way of annotating code with hints.
Pylint processes this augmented AST to test whether each operation is valid on the
inferred values, like testing whether an instance of a custom class
Person has a given attribute.
When there are multiple possible values returned by a function, even of the same type, pylint cautiously skips type checking.
    def custom_max(a, b):
        if a > b:
            return a
        else:
            return b

    def add(a, b):
        return a + b

    custom_max(1, 2)['a']  # no error
    add(1, 2)['a']  # unsubscriptable error raised
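To build intuition for why add is checked and custom_max is not, here is a toy inference pass written with the standard ast module. It is only a sketch of the general idea and not how astroid works internally: infer and find_unsubscriptable are hypothetical helpers, they only understand constants, simple arithmetic, and calls to module-level functions with positional arguments, and they give up whenever a function has more than one return statement.

```python
import ast

UNSUBSCRIPTABLE = (int, float, bool, type(None))


def infer(expr, env, functions, depth=0):
    """Return the inferred Python type of ``expr``, or None when unsure."""
    if depth > 10:
        return None  # guard against runaway recursion in the toy
    if isinstance(expr, ast.Constant):
        return type(expr.value)
    if isinstance(expr, ast.Name):
        return env.get(expr.id)
    if isinstance(expr, ast.BinOp) and isinstance(
        expr.op, (ast.Add, ast.Sub, ast.Mult)
    ):
        left = infer(expr.left, env, functions, depth + 1)
        right = infer(expr.right, env, functions, depth + 1)
        # Only the easy case: both operands share one known type.
        return left if left is not None and left is right else None
    if isinstance(expr, ast.Call) and isinstance(expr.func, ast.Name):
        func = functions.get(expr.func.id)
        if func is None:
            return None
        returns = [n for n in ast.walk(func) if isinstance(n, ast.Return)]
        if len(returns) != 1 or returns[0].value is None:
            return None  # several possible results: stay silent
        params = [arg.arg for arg in func.args.args]
        arg_types = [infer(a, env, functions, depth + 1) for a in expr.args]
        local_env = dict(zip(params, arg_types))
        return infer(returns[0].value, local_env, functions, depth + 1)
    return None


def find_unsubscriptable(source):
    """Return sorted (line, expression) pairs for subscripts on plain values."""
    tree = ast.parse(source)
    functions = {
        node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)
    }
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Subscript):
            if infer(node.value, {}, functions) in UNSUBSCRIPTABLE:
                problems.append((node.lineno, ast.unparse(node.value)))
    return sorted(problems)


SOURCE = """\
def custom_max(a, b):
    if a > b:
        return a
    else:
        return b

def add(a, b):
    return a + b

custom_max(1, 2)['a']
add(1, 2)['a']
"""

if __name__ == "__main__":
    for line, text in find_unsubscriptable(SOURCE):
        print(f"{line}:0: Value '{text}' is unsubscriptable")
```

The toy reports only the add call. The custom_max call is skipped for the same reason described above: with two return statements the sketch cannot commit to a single result, so it stays quiet rather than risk a false positive.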
In my experience, pylint requires more configuration than other linters. Other linters rarely have false-positives and can be silenced each place an unexpected pattern is used.
Pylint’s inference means false-positives crop up where functions are called or
class instances are used and can require extensive
# pylint: disable
commenting or customization.
Pylint errors can be resolved in two main ways: either by patching
astroid.brain or by writing pylint extensions that handle your function or
type. For example, if you have a decorator that returns a property, but astroid
does not detect that it is a property, you can customize astroid with

    astroid.bases.POSSIBLE_PROPERTIES.add('custom_property_decorator')
If it is not easy to patch the astroid brain to resolve your issue, writing a transform plugin may be your best option. Transforms tell astroid to either alter a node or completely replace it in the AST.
For example, if you have a class that defines properties with
setattr and you
can’t add a class attribute for each, you can write a transform plugin which
will tell astroid that the attribute is present.
    import astroid
    from astroid import MANAGER


    def register(linter):
        # Needed for registering the plugin.
        pass


    def transform(cls):
        if cls.name == 'ClassWithSetAttr':
            cls.locals['dynamic_attribute'] = [astroid.ClassDef(int, None)]


    MANAGER.register_transform(astroid.ClassDef, transform)
Or to handle the
custom_max function from before (which had multiple return
values so pylint was not validating any return values) we can tell astroid to
replace each AST node representing a place where the function is called with
an inferred constant.
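    # myplugin.py
    import astroid
    from astroid import MANAGER


    def register(linter):
        # Needed so pylint can load this module as a plugin.
        pass


    def looks_like_custom_max(node):
        # Only match direct calls such as custom_max(...).
        return isinstance(node.func, astroid.Name) \
            and node.func.name == 'custom_max'


    def transform_custom_max(call, context=None):
        # Tell astroid that every matching call evaluates to the constant 0.
        return iter([astroid.Const(0)])


    MANAGER.register_transform(
        astroid.Call,
        astroid.inference_tip(transform_custom_max),
        looks_like_custom_max)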
Now when we run
pylint --load-plugins myplugin, pylint realizes that the result
of the calculation is not subscriptable.
17:0: E1136: Value 'custom_max(1, 2)' is unsubscriptable (unsubscriptable-object)
Pylint also comes with useful tools like
symilar for finding repeated code and
pyreverse for plotting a dependency diagram of your code.
Despite sometimes being a pain to configure, pylint can be an amazing tool for wrangling under-tested codebases and for enhancing your development environment. It catches types of errors that are fundamentally beyond what any of the previous linters can catch.