2007-08-15 11:28:01 -03:00
|
|
|
************************************
|
2009-01-03 16:55:06 -04:00
|
|
|
Idioms and Anti-Idioms in Python
|
2007-08-15 11:28:01 -03:00
|
|
|
************************************
|
|
|
|
|
|
|
|
:Author: Moshe Zadka
|
|
|
|
|
2008-04-18 13:53:09 -03:00
|
|
|
This document is placed in the public domain.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
|
|
|
|
.. topic:: Abstract
|
|
|
|
|
|
|
|
This document can be considered a companion to the tutorial. It shows how to use
|
|
|
|
Python, and even more importantly, how *not* to use Python.
|
|
|
|
|
|
|
|
|
|
|
|
Language Constructs You Should Not Use
|
|
|
|
======================================
|
|
|
|
|
|
|
|
While Python has relatively few gotchas compared to other languages, it still
|
|
|
|
has some constructs which are only useful in corner cases, or are plain
|
|
|
|
dangerous.
|
|
|
|
|
|
|
|
|
|
|
|
from module import \*
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
|
|
|
|
Inside Function Definitions
|
|
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
``from module import *`` is *invalid* inside function definitions. While many
|
|
|
|
versions of Python do not check for the invalidity, it does not make it more
|
2009-05-22 07:40:00 -03:00
|
|
|
valid, no more than having a smart lawyer makes a man innocent. Do not use it
|
2007-08-15 11:28:01 -03:00
|
|
|
like that ever. Even in versions where it was accepted, it made the function
|
2011-08-18 10:36:15 -03:00
|
|
|
execution slower, because the compiler could not be certain which names were
|
|
|
|
local and which were global. In Python 2.1 this construct causes warnings, and
|
2007-08-15 11:28:01 -03:00
|
|
|
sometimes even errors.
|
|
|
|
|
|
|
|
|
|
|
|
At Module Level
|
|
|
|
^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
While it is valid to use ``from module import *`` at module level it is usually
|
|
|
|
a bad idea. For one, this loses an important property Python otherwise has ---
|
|
|
|
you can know where each toplevel name is defined by a simple "search" function
|
|
|
|
in your favourite editor. You also open yourself to trouble in the future, if
|
|
|
|
some module grows additional functions or classes.
|
|
|
|
|
2011-08-18 10:36:15 -03:00
|
|
|
One of the most awful questions asked on the newsgroup is why this code::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
f = open("www")
|
|
|
|
f.read()
|
|
|
|
|
|
|
|
does not work. Of course, it works just fine (assuming you have a file called
|
2010-02-06 14:44:44 -04:00
|
|
|
"www".) But it does not work if somewhere in the module, the statement ``from
|
|
|
|
os import *`` is present. The :mod:`os` module has a function called
|
|
|
|
:func:`open` which returns an integer. While it is very useful, shadowing a
|
|
|
|
builtin is one of its least useful properties.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
Remember, you can never know for sure what names a module exports, so either
|
|
|
|
take what you need --- ``from module import name1, name2``, or keep them in the
|
|
|
|
module and access on a per-need basis --- ``import module;print module.name``.
|
|
|
|
|
|
|
|
|
|
|
|
When It Is Just Fine
|
|
|
|
^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
|
|
There are situations in which ``from module import *`` is just fine:
|
|
|
|
|
|
|
|
* The interactive prompt. For example, ``from math import *`` makes Python an
|
|
|
|
amazing scientific calculator.
|
|
|
|
|
|
|
|
* When extending a module in C with a module in Python.
|
|
|
|
|
|
|
|
* When the module advertises itself as ``from import *`` safe.
|
|
|
|
|
|
|
|
|
|
|
|
Unadorned :keyword:`exec`, :func:`execfile` and friends
|
|
|
|
-------------------------------------------------------
|
|
|
|
|
|
|
|
The word "unadorned" refers to the use without an explicit dictionary, in which
|
|
|
|
case those constructs evaluate code in the *current* environment. This is
|
|
|
|
dangerous for the same reasons ``from import *`` is dangerous --- it might step
|
|
|
|
over variables you are counting on and mess up things for the rest of your code.
|
|
|
|
Simply do not do that.
|
|
|
|
|
|
|
|
Bad examples::
|
|
|
|
|
|
|
|
>>> for name in sys.argv[1:]:
|
|
|
|
>>> exec "%s=1" % name
|
|
|
|
>>> def func(s, **kw):
|
|
|
|
>>> for var, val in kw.items():
|
|
|
|
>>> exec "s.%s=val" % var # invalid!
|
|
|
|
>>> execfile("handler.py")
|
|
|
|
>>> handle()
|
|
|
|
|
|
|
|
Good examples::
|
|
|
|
|
|
|
|
>>> d = {}
|
|
|
|
>>> for name in sys.argv[1:]:
|
|
|
|
>>> d[name] = 1
|
|
|
|
>>> def func(s, **kw):
|
|
|
|
>>> for var, val in kw.items():
|
|
|
|
>>> setattr(s, var, val)
|
|
|
|
>>> d={}
|
|
|
|
>>> execfile("handle.py", d, d)
|
|
|
|
>>> handle = d['handle']
|
|
|
|
>>> handle()
|
|
|
|
|
|
|
|
|
|
|
|
from module import name1, name2
|
|
|
|
-------------------------------
|
|
|
|
|
2009-05-22 07:40:00 -03:00
|
|
|
This is a "don't" which is much weaker than the previous "don't"s but is still
|
2007-08-15 11:28:01 -03:00
|
|
|
something you should not do if you don't have good reasons to do that. The
|
2011-07-18 05:39:55 -03:00
|
|
|
reason it is usually a bad idea is because you suddenly have an object which lives
|
2008-02-22 08:31:45 -04:00
|
|
|
in two separate namespaces. When the binding in one namespace changes, the
|
2007-08-15 11:28:01 -03:00
|
|
|
binding in the other will not, so there will be a discrepancy between them. This
|
|
|
|
happens when, for example, one module is reloaded, or changes the definition of
|
|
|
|
a function at runtime.
|
|
|
|
|
|
|
|
Bad example::
|
|
|
|
|
|
|
|
# foo.py
|
|
|
|
a = 1
|
|
|
|
|
|
|
|
# bar.py
|
|
|
|
from foo import a
|
|
|
|
if something():
|
2009-01-03 16:55:06 -04:00
|
|
|
a = 2 # danger: foo.a != a
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
Good example::
|
|
|
|
|
|
|
|
# foo.py
|
|
|
|
a = 1
|
|
|
|
|
|
|
|
# bar.py
|
|
|
|
import foo
|
|
|
|
if something():
|
|
|
|
foo.a = 2
|
|
|
|
|
|
|
|
|
|
|
|
except:
|
|
|
|
-------
|
|
|
|
|
|
|
|
Python has the ``except:`` clause, which catches all exceptions. Since *every*
|
2010-09-11 15:57:20 -03:00
|
|
|
error in Python raises an exception, using ``except:`` can make many
|
|
|
|
programming errors look like runtime problems, which hinders the debugging
|
|
|
|
process.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
The following code shows a great example of why this is bad::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
try:
|
|
|
|
foo = opne("file") # misspelled "open"
|
|
|
|
except:
|
|
|
|
sys.exit("could not open file!")
|
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
The second line triggers a :exc:`NameError`, which is caught by the except
|
|
|
|
clause. The program will exit, and the error message the program prints will
|
|
|
|
make you think the problem is the readability of ``"file"`` when in fact
|
|
|
|
the real error has nothing to do with ``"file"``.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
A better way to write the above is ::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
try:
|
2010-09-11 15:57:20 -03:00
|
|
|
foo = opne("file")
|
2007-08-15 11:28:01 -03:00
|
|
|
except IOError:
|
|
|
|
sys.exit("could not open file")
|
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
When this is run, Python will produce a traceback showing the :exc:`NameError`,
|
|
|
|
and it will be immediately apparent what needs to be fixed.
|
|
|
|
|
|
|
|
.. index:: bare except, except; bare
|
|
|
|
|
|
|
|
Because ``except:`` catches *all* exceptions, including :exc:`SystemExit`,
|
|
|
|
:exc:`KeyboardInterrupt`, and :exc:`GeneratorExit` (which is not an error and
|
|
|
|
should not normally be caught by user code), using a bare ``except:`` is almost
|
|
|
|
never a good idea. In situations where you need to catch all "normal" errors,
|
|
|
|
such as in a framework that runs callbacks, you can catch the base class for
|
2010-09-11 16:15:40 -03:00
|
|
|
all normal exceptions, :exc:`Exception`. Unfortunately in Python 2.x it is
|
2010-09-11 15:57:20 -03:00
|
|
|
possible for third-party code to raise exceptions that do not inherit from
|
|
|
|
:exc:`Exception`, so in Python 2.x there are some cases where you may have to
|
|
|
|
use a bare ``except:`` and manually re-raise the exceptions you don't want
|
|
|
|
to catch.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
|
|
|
|
Exceptions
|
|
|
|
==========
|
|
|
|
|
|
|
|
Exceptions are a useful feature of Python. You should learn to raise them
|
|
|
|
whenever something unexpected occurs, and catch them only where you can do
|
|
|
|
something about them.
|
|
|
|
|
|
|
|
The following is a very popular anti-idiom ::
|
|
|
|
|
|
|
|
def get_status(file):
|
|
|
|
if not os.path.exists(file):
|
|
|
|
print "file not found"
|
|
|
|
sys.exit(1)
|
|
|
|
return open(file).readline()
|
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
Consider the case where the file gets deleted between the time the call to
|
|
|
|
:func:`os.path.exists` is made and the time :func:`open` is called. In that
|
|
|
|
case the last line will raise an :exc:`IOError`. The same thing would happen
|
|
|
|
if *file* exists but has no read permission. Since testing this on a normal
|
|
|
|
machine on existent and non-existent files makes it seem bugless, the test
|
|
|
|
results will seem fine, and the code will get shipped. Later an unhandled
|
|
|
|
:exc:`IOError` (or perhaps some other :exc:`EnvironmentError`) escapes to the
|
|
|
|
user, who gets to watch the ugly traceback.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
Here is a somewhat better way to do it. ::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
def get_status(file):
|
|
|
|
try:
|
|
|
|
return open(file).readline()
|
2010-09-11 15:57:20 -03:00
|
|
|
except EnvironmentError as err:
|
|
|
|
print "Unable to open file: {}".format(err)
|
2007-08-15 11:28:01 -03:00
|
|
|
sys.exit(1)
|
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
In this version, *either* the file gets opened and the line is read (so it
|
|
|
|
works even on flaky NFS or SMB connections), or an error message is printed
|
|
|
|
that provides all the available information on why the open failed, and the
|
|
|
|
application is aborted.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
However, even this version of :func:`get_status` makes too many assumptions ---
|
|
|
|
that it will only be used in a short running script, and not, say, in a long
|
|
|
|
running server. Sure, the caller could do something like ::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
try:
|
|
|
|
status = get_status(log)
|
|
|
|
except SystemExit:
|
|
|
|
status = None
|
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
But there is a better way. You should try to use as few ``except`` clauses in
|
|
|
|
your code as you can --- the ones you do use will usually be inside calls which
|
|
|
|
should always succeed, or a catch-all in a main function.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
So, an even better version of :func:`get_status()` is probably ::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
def get_status(file):
|
|
|
|
return open(file).readline()
|
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
The caller can deal with the exception if it wants (for example, if it tries
|
2007-08-15 11:28:01 -03:00
|
|
|
several files in a loop), or just let the exception filter upwards to *its*
|
|
|
|
caller.
|
|
|
|
|
2010-09-11 15:57:20 -03:00
|
|
|
But the last version still has a serious problem --- due to implementation
|
|
|
|
details in CPython, the file would not be closed when an exception is raised
|
|
|
|
until the exception handler finishes; and, worse, in other implementations
|
|
|
|
(e.g., Jython) it might not be closed at all regardless of whether or not
|
|
|
|
an exception is raised.
|
|
|
|
|
|
|
|
The best version of this function uses the ``open()`` call as a context
|
|
|
|
manager, which will ensure that the file gets closed as soon as the
|
|
|
|
function returns::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
def get_status(file):
|
2010-04-25 07:17:27 -03:00
|
|
|
with open(file) as fp:
|
2007-08-15 11:28:01 -03:00
|
|
|
return fp.readline()
|
|
|
|
|
|
|
|
|
|
|
|
Using the Batteries
|
|
|
|
===================
|
|
|
|
|
|
|
|
Every so often, people seem to be writing stuff in the Python library again,
|
|
|
|
usually poorly. While the occasional module has a poor interface, it is usually
|
|
|
|
much better to use the rich standard library and data types that come with
|
2009-05-22 07:40:00 -03:00
|
|
|
Python than inventing your own.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
A useful module very few people know about is :mod:`os.path`. It always has the
|
|
|
|
correct path arithmetic for your operating system, and will usually be much
|
2009-05-22 07:40:00 -03:00
|
|
|
better than whatever you come up with yourself.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
Compare::
|
|
|
|
|
|
|
|
# ugh!
|
|
|
|
return dir+"/"+file
|
|
|
|
# better
|
|
|
|
return os.path.join(dir, file)
|
|
|
|
|
|
|
|
More useful functions in :mod:`os.path`: :func:`basename`, :func:`dirname` and
|
|
|
|
:func:`splitext`.
|
|
|
|
|
2010-10-31 19:00:27 -03:00
|
|
|
There are also many useful built-in functions people seem not to be aware of
|
|
|
|
for some reason: :func:`min` and :func:`max` can find the minimum/maximum of
|
|
|
|
any sequence with comparable semantics, for example, yet many people write
|
|
|
|
their own :func:`max`/:func:`min`. Another highly useful function is
|
|
|
|
:func:`reduce` which can be used to repeatly apply a binary operation to a
|
|
|
|
sequence, reducing it to a single value. For example, compute a factorial
|
|
|
|
with a series of multiply operations::
|
|
|
|
|
|
|
|
>>> n = 4
|
|
|
|
>>> import operator
|
|
|
|
>>> reduce(operator.mul, range(1, n+1))
|
|
|
|
24
|
|
|
|
|
|
|
|
When it comes to parsing numbers, note that :func:`float`, :func:`int` and
|
|
|
|
:func:`long` all accept string arguments and will reject ill-formed strings
|
|
|
|
by raising an :exc:`ValueError`.
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
|
|
|
|
Using Backslash to Continue Statements
|
|
|
|
======================================
|
|
|
|
|
|
|
|
Since Python treats a newline as a statement terminator, and since statements
|
2009-05-22 07:40:00 -03:00
|
|
|
are often more than is comfortable to put in one line, many people do::
|
2007-08-15 11:28:01 -03:00
|
|
|
|
|
|
|
if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
|
|
|
|
calculate_number(10, 20) != forbulate(500, 360):
|
|
|
|
pass
|
|
|
|
|
2007-12-29 06:57:00 -04:00
|
|
|
You should realize that this is dangerous: a stray space after the ``\`` would
|
2007-08-15 11:28:01 -03:00
|
|
|
make this line wrong, and stray spaces are notoriously hard to see in editors.
|
|
|
|
In this case, at least it would be a syntax error, but if the code was::
|
|
|
|
|
|
|
|
value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
|
|
|
|
+ calculate_number(10, 20)*forbulate(500, 360)
|
|
|
|
|
|
|
|
then it would just be subtly wrong.
|
|
|
|
|
|
|
|
It is usually much better to use the implicit continuation inside parenthesis:
|
|
|
|
|
|
|
|
This version is bulletproof::
|
|
|
|
|
2009-01-03 16:55:06 -04:00
|
|
|
value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
|
2007-08-15 11:28:01 -03:00
|
|
|
+ calculate_number(10, 20)*forbulate(500, 360))
|
|
|
|
|