\documentclass{howto}

\title{Idioms and Anti-Idioms in Python}

\release{0.00}

\author{Moshe Zadka}
\authoraddress{howto@zadka.site.co.il}

\begin{document}
\maketitle

This document is placed in the public domain.

\begin{abstract}
\noindent
This document can be considered a companion to the tutorial. It
shows how to use Python, and even more importantly, how {\em not}
to use Python.
\end{abstract}

\tableofcontents

\section{Language Constructs You Should Not Use}

While Python has relatively few gotchas compared to other languages, it
still has some constructs which are only useful in corner cases, or are
plain dangerous.

\subsection{from module import *}

\subsubsection{Inside Function Definitions}

\code{from module import *} is {\em invalid} inside function definitions.
While many versions of Python do not check for the invalidity, that does not
make it more valid, any more than having a smart lawyer makes a man innocent.
Do not use it like that, ever. Even in versions where it was accepted, it made
the function execution slower, because the compiler could not be certain
which names are local and which are global. In Python 2.1 this construct
causes warnings, and sometimes even errors.

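As a quick check on a current interpreter, the sketch below assumes Python 3,
which rejects the construct outright when the code is compiled. It uses the
built-in \function{compile()} so the error can be observed without defining
the function; the source string is only an illustration.

```python
# Assumes Python 3, where "import *" inside a function is a compile-time error.
source = (
    "def f():\n"
    "    from math import *\n"
    "    return sqrt(2)\n"
)

try:
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError:
    # The compiler refuses the function body before any code runs.
    rejected = True

print(rejected)
```
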
\subsubsection{At Module Level}

While it is valid to use \code{from module import *} at module level, it
is usually a bad idea. For one, this loses an important property Python
otherwise has --- you can know where each toplevel name is defined by
a simple ``search'' function in your favourite editor. You also open yourself
to trouble in the future, if some module grows additional functions or
classes.

One of the most awful questions asked on the newsgroup is why this code:

\begin{verbatim}
f = open("www")
f.read()
\end{verbatim}

does not work. Of course, it works just fine (assuming you have a file
called ``www''). But it does not work if somewhere in the module, the
statement \code{from os import *} is present. The \module{os} module
has a function called \function{open()} which returns an integer. While
it is very useful, shadowing builtins is one of its least useful properties.

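The shadowing can be observed without polluting a real module. The sketch
below assumes Python 3 (where \function{exec()} is a function) and runs the
star import inside a scratch dictionary; the variable names are illustrative
only.

```python
# Assumes Python 3: exec() is a function and takes an explicit namespace dict.
import builtins
import os

namespace = {}
exec("from os import *", namespace)

# After the star import, the name "open" refers to os.open, not the builtin.
shadowed = namespace["open"] is os.open
is_builtin = namespace["open"] is builtins.open

print(shadowed, is_builtin)
```
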
Remember, you can never know for sure what names a module exports, so either
take what you need --- \code{from module import name1, name2}, or keep them in
the module and access on a per-need basis ---
\code{import module; print module.name}.

\subsubsection{When It Is Just Fine}

There are situations in which \code{from module import *} is just fine:

\begin{itemize}

\item The interactive prompt. For example, \code{from math import *} makes
Python an amazing scientific calculator.

\item When extending a module in C with a module in Python.

\item When the module advertises itself as \code{from import *} safe.

\end{itemize}

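One way a module can limit what a star import copies is by defining
\code{\_\_all\_\_}; whether that makes a particular module ``safe'' is still
for its documentation to say. The sketch below assumes Python 3, and the
module \module{shapes} with its functions is hypothetical, built in memory
only for the demonstration.

```python
# Assumes Python 3. "shapes" is a hypothetical module built in memory.
import sys
import types

module_source = (
    "__all__ = ['area']\n"
    "def area(width, height):\n"
    "    return width * height\n"
    "def scale(value):\n"
    "    return value * 2\n"
)

shapes = types.ModuleType("shapes")
exec(module_source, shapes.__dict__)
sys.modules["shapes"] = shapes

namespace = {}
exec("from shapes import *", namespace)

# Only the names listed in __all__ are copied by the star import.
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```
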
\subsection{Unadorned \keyword{exec}, \function{execfile} and friends}

The word ``unadorned'' refers to the use without an explicit dictionary,
in which case those constructs evaluate code in the {\em current} environment.
This is dangerous for the same reasons \code{from import *} is dangerous ---
it might step over variables you are counting on and mess up things for
the rest of your code. Simply do not do that.

Bad examples:

\begin{verbatim}
>>> for name in sys.argv[1:]:
>>>     exec "%s=1" % name
>>> def func(s, **kw):
>>>     for var, val in kw.items():
>>>         exec "s.%s=val" % var  # invalid!
>>> execfile("handler.py")
>>> handle()
\end{verbatim}

Good examples:

\begin{verbatim}
>>> d = {}
>>> for name in sys.argv[1:]:
>>>     d[name] = 1
>>> def func(s, **kw):
>>>     for var, val in kw.items():
>>>         setattr(s, var, val)
>>> d = {}
>>> execfile("handler.py", d, d)
>>> handle = d['handle']
>>> handle()
\end{verbatim}

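The same idea carries over to Python 3, where \function{exec()} is a function
and \function{execfile()} no longer exists. The sketch below is written under
that assumption; the source string stands in for the contents of a
hypothetical \file{handler.py}.

```python
# Assumes Python 3, where exec() is a function and execfile() no longer exists.
# handler_source stands in for the contents of a hypothetical handler.py.
handler_source = (
    "def handle():\n"
    "    return 'handled'\n"
)

# Every name the executed code defines lands in this dictionary,
# not in the namespace of the code that calls exec().
namespace = {}
exec(compile(handler_source, "handler.py", "exec"), namespace)

run_handler = namespace["handle"]
result = run_handler()
print(result)
```
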
\subsection{from module import name1, name2}

This is a ``don't'' which is much weaker than the previous ``don't''s
but is still something you should not do if you don't have good reasons
to do that. The reason it is usually a bad idea is that you suddenly
have an object which lives in two separate namespaces. When the binding
in one namespace changes, the binding in the other will not, so there
will be a discrepancy between them. This happens when, for example,
one module is reloaded, or changes the definition of a function at runtime.

Bad example:

\begin{verbatim}
# foo.py
a = 1

# bar.py
from foo import a
if something():
    a = 2 # danger: foo.a != a
\end{verbatim}

Good example:

\begin{verbatim}
# foo.py
a = 1

# bar.py
import foo
if something():
    foo.a = 2
\end{verbatim}

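The discrepancy between the two bindings can be seen directly. The sketch
below assumes Python 3 and creates a hypothetical module \module{foo} in
memory so that it runs as a single file.

```python
# Assumes Python 3. "foo" is a hypothetical module created in memory.
import sys
import types

foo = types.ModuleType("foo")
foo.a = 1
sys.modules["foo"] = foo

from foo import a   # copies the current binding into this namespace

foo.a = 2           # rebinding the name inside the module...

stale = a           # ...does not change the copy made by the import
fresh = foo.a
print(stale, fresh)
```
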
\subsection{except:}

Python has the \code{except:} clause, which catches all exceptions.
Since {\em every} error in Python raises an exception, this makes many
programming errors look like runtime problems, and hinders
the debugging process.

The following code shows a great example of the problem:

\begin{verbatim}
try:
    foo = opne("file") # misspelled "open"
except:
    sys.exit("could not open file!")
\end{verbatim}

The second line triggers a \exception{NameError} which is caught by the
except clause. The program will exit, and you will have no idea that
this has nothing to do with the readability of \code{"file"}.

The example above is better written as:

\begin{verbatim}
try:
    foo = opne("file") # will be changed to "open" as soon as we run it
except IOError:
    sys.exit("could not open file")
\end{verbatim}

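A short check of that claim, assuming Python 3 (where \exception{IOError} is
an alias of \exception{OSError}): with a narrow \code{except} clause the
misspelling is no longer swallowed and surfaces as a \exception{NameError}.
The function name is illustrative only.

```python
# Assumes Python 3, where IOError is an alias of OSError.
def open_with_typo(path):
    try:
        return opne(path)  # deliberately misspelled "open"
    except OSError:
        return None

try:
    open_with_typo("file")
    surfaced = None
except NameError as exc:
    # The narrow clause above did not hide the programming error.
    surfaced = type(exc).__name__

print(surfaced)
```
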
There are some situations in which the \code{except:} clause is useful:
for example, in a framework when running callbacks, it is good not to
let any callback disturb the framework.

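A minimal sketch of such a callback runner follows, assuming Python 3. It
deliberately uses \code{except Exception} rather than a bare \code{except:},
which is a design choice of this sketch: \exception{KeyboardInterrupt} and
\exception{SystemExit} still propagate. The function and callback names are
hypothetical.

```python
import traceback

def run_callbacks(callbacks):
    """Call every callback; collect tracebacks instead of stopping."""
    failures = []
    for callback in callbacks:
        try:
            callback()
        except Exception:
            # Keep the traceback so the failure is recorded, not hidden.
            failures.append(traceback.format_exc())
    return failures

calls = []

def good():
    calls.append("good")

def broken():
    raise ValueError("callback failed")

# The broken callback does not prevent the next one from running.
failures = run_callbacks([broken, good])
print(calls, len(failures))
```
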
\section{Exceptions}

Exceptions are a useful feature of Python. You should learn to raise
them whenever something unexpected occurs, and catch them only where
you can do something about them.

The following is a very popular anti-idiom:

\begin{verbatim}
def get_status(file):
    if not os.path.exists(file):
        print "file not found"
        sys.exit(1)
    return open(file).readline()
\end{verbatim}

Consider the case where the file gets deleted between the time the call to
\function{os.path.exists} is made and the time \function{open} is called.
That means the last line will raise an \exception{IOError}. The same would
happen if \var{file} exists but has no read permission. Since testing this
on a normal machine on existing and non-existing files makes it seem bugless,
the test results will seem fine, and the code will get
shipped. Then an unhandled \exception{IOError} escapes to the user, who
has to watch the ugly traceback.

Here is a better way to do it:

\begin{verbatim}
def get_status(file):
    try:
        return open(file).readline()
    except (IOError, OSError):
        print "file not found"
        sys.exit(1)
\end{verbatim}

In this version, {\em either} the file gets opened and the line is read
(so it works even on flaky NFS or SMB connections), or the message
is printed and the application aborted.

Still, \function{get_status} makes too many assumptions --- that it
will only be used in a short running script, and not, say, in a long
running server. Sure, the caller could do something like:

\begin{verbatim}
try:
    status = get_status(log)
except SystemExit:
    status = None
\end{verbatim}

So, try to have as few \code{except} clauses in your code as possible ---
those will usually be a catch-all in \function{main}, or inside calls which
should always succeed.

So, the best version is probably:

\begin{verbatim}
def get_status(file):
    return open(file).readline()
\end{verbatim}

The caller can deal with the exception if it wants (for example, if it
tries several files in a loop), or just let the exception filter upwards
to {\em its} caller.

The last version is not very good either --- due to implementation details,
the file would not be closed when an exception is raised until the handler
finishes, and perhaps not at all in non-C implementations (e.g., Jython).
It is better to close the file explicitly:

\begin{verbatim}
def get_status(file):
    fp = open(file)
    try:
        return fp.readline()
    finally:
        fp.close()
\end{verbatim}

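On interpreters that have the \keyword{with} statement, the same cleanup can
be written more compactly. The sketch below assumes Python 3 and also shows a
caller that tries several files in a loop; \function{first_status} and the
file names are hypothetical.

```python
import os
import tempfile

def get_status(path):
    # The with statement closes the file even if readline() raises.
    with open(path) as fp:
        return fp.readline()

def first_status(paths):
    """Return the status line of the first readable file, or None."""
    for path in paths:
        try:
            return get_status(path)
        except OSError:
            continue  # the caller decides that a missing file is not fatal
    return None

with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "status.log")
    with open(log, "w") as out:
        out.write("ready\nsecond line\n")
    missing = os.path.join(tmp, "missing.log")
    status = first_status([missing, log])
    nothing = first_status([missing])

print(repr(status), nothing)
```
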
\section{Using the Batteries}

Every so often, people rewrite functionality that is already in the Python
library, usually poorly. While the occasional module has a poor interface,
it is usually much better to use the rich standard library and data
types that come with Python than to invent your own.

A useful module very few people know about is \module{os.path}. It
always has the correct path arithmetic for your operating system, and
will usually be much better than whatever you come up with yourself.

Compare:

\begin{verbatim}
# ugh!
return dir+"/"+file
# better
return os.path.join(dir, file)
\end{verbatim}

More useful functions in \module{os.path}: \function{basename},
\function{dirname} and \function{splitext}.

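A brief illustration of those functions follows. It uses \module{posixpath},
the implementation behind \module{os.path} on POSIX systems, only so that the
results are the same on every platform; the path itself is made up.

```python
import posixpath

path = posixpath.join("logs", "app.tar.gz")   # 'logs/app.tar.gz'
name = posixpath.basename(path)               # last component
folder = posixpath.dirname(path)              # everything before it
root, extension = posixpath.splitext(path)    # splits at the last dot

print(path, name, folder, root, extension)
```
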
There are also many useful builtin functions people seem not to be
aware of for some reason: \function{min()} and \function{max()} can
find the minimum/maximum of any sequence with comparable semantics,
for example, yet many people write their own max/min. Another highly
useful function is \function{reduce()}. A classical use of \function{reduce()}
is something like:

\begin{verbatim}
import sys, operator
nums = map(float, sys.argv[1:])
print reduce(operator.add, nums)/len(nums)
\end{verbatim}

This cute little script prints the average of all numbers given on the
command line. The \function{reduce()} adds up all the numbers, and
the rest is just some pre- and postprocessing.

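For readers on Python 3, where \function{reduce()} lives in the
\module{functools} module, the same computation might look like the sketch
below. The list of arguments is a stand-in for \code{sys.argv[1:]}, and the
built-in \function{sum()} would do the addition equally well.

```python
from functools import reduce
import operator

arguments = ["1", "2.5", "4"]  # stand-in for sys.argv[1:]
nums = [float(arg) for arg in arguments]

total = reduce(operator.add, nums)   # ((1.0 + 2.5) + 4.0)
average = total / len(nums)
smallest, largest = min(nums), max(nums)

print(average, smallest, largest)
```
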
Similarly, note that \function{float()}, \function{int()} and
\function{long()} all accept arguments of type string, and so are
suited to parsing --- assuming you are ready to deal with the
\exception{ValueError} they raise.

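A minimal sketch of such parsing, assuming Python 3 (which has no
\function{long()}); the helper \function{parse_numbers} is hypothetical and
simply separates the strings that convert from those that do not.

```python
def parse_numbers(words):
    """Convert each string with float(); collect the ones that fail."""
    numbers, rejected = [], []
    for word in words:
        try:
            numbers.append(float(word))
        except ValueError:
            rejected.append(word)
    return numbers, rejected

numbers, rejected = parse_numbers(["1", "2.5", "abc", ""])
print(numbers, rejected)
```
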
\section{Using Backslash to Continue Statements}

Since Python treats a newline as a statement terminator,
and since statements are often longer than is comfortable to put
on one line, many people do:

\begin{verbatim}
if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
   calculate_number(10, 20) != forbulate(500, 360):
      pass
\end{verbatim}

You should realize that this is dangerous: a stray space after the
\code{\textbackslash} would make this line wrong, and stray spaces are
notoriously hard to see in editors. A stray space is at least reported as a
syntax error. A backslash that goes missing altogether can be worse. If the
code was:

\begin{verbatim}
value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
+ calculate_number(10, 20)*forbulate(500, 360)
\end{verbatim}

then without the backslash it would still run, and would just be subtly
wrong: \code{value} would hold only the first product, and the second line
would be evaluated and thrown away.

It is usually much better to use the implicit continuation inside
parentheses. This version is bulletproof:

\begin{verbatim}
value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
         + calculate_number(10, 20)*forbulate(500, 360))
\end{verbatim}

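How the interpreter treats each variant can be checked by compiling small
source strings. The sketch below assumes Python 3; the arithmetic is a
simplified stand-in for the expressions above, and \function{run} is a
hypothetical helper that executes a string in its own dictionary.

```python
# Four spellings of the same two-line assignment, as source strings.
with_backslash = "value = 1 * 2 \\\n+ 3 * 4\n"
stray_space = "value = 1 * 2 \\ \n+ 3 * 4\n"    # space after the backslash
no_backslash = "value = 1 * 2\n+ 3 * 4\n"       # backslash lost
parenthesized = "value = (1 * 2\n         + 3 * 4)\n"

def run(source):
    """Execute source in a private namespace and return its 'value'."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["value"]

results = {
    "with_backslash": run(with_backslash),
    "parenthesized": run(parenthesized),
    # Runs without complaint, but the second line is a discarded expression.
    "no_backslash": run(no_backslash),
}

try:
    run(stray_space)
    stray_space_is_error = False
except SyntaxError:
    stray_space_is_error = True

print(results, stray_space_is_error)
```
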
\end{document}