mirror of https://github.com/python/cpython
869 lines
36 KiB
TeX
869 lines
36 KiB
TeX
\documentclass{howto}
|
|
|
|
\usepackage{distutils}
|
|
|
|
% $Id$
|
|
|
|
\title{What's New in Python 2.1}
|
|
\release{1.00}
|
|
\author{A.M. Kuchling}
|
|
\authoraddress{\email{amk@amk.ca}}
|
|
\begin{document}
|
|
\maketitle\tableofcontents
|
|
|
|
\section{Introduction}
|
|
|
|
It's that time again... time for a new Python release, Python 2.1.
|
|
One recent goal of the Python development team has been to accelerate
|
|
the pace of new releases, with a new release coming every 6 to 9
|
|
months. 2.1 is the first release to come out at this faster pace, with
|
|
the first alpha appearing in January, 3 months after the final version
|
|
of 2.0 was released.
|
|
|
|
This article explains the new features in 2.1. While there aren't as
|
|
many changes in 2.1 as there were in Python 2.0, there are still some
|
|
pleasant surprises in store. 2.1 is the first release to be steered
|
|
through the use of Python Enhancement Proposals, or PEPs, so most of
|
|
the sizable changes have accompanying PEPs that provide more complete
|
|
documentation and a design rationale for the change. This article
|
|
doesn't attempt to document the new features completely, but simply
|
|
provides an overview of the new features for Python programmers.
|
|
Refer to the Python 2.1 documentation, or to the specific PEP, for
|
|
more details about any new feature that particularly interests you.
|
|
|
|
The final release of Python 2.1 was made on April 17, 2001.
|
|
|
|
%======================================================================
|
|
\section{PEP 227: Nested Scopes}
|
|
|
|
The largest change in Python 2.1 is to Python's scoping rules. In
|
|
Python 2.0, at any given time there are at most three namespaces used
|
|
to look up variable names: local, module-level, and the built-in
|
|
namespace. This often surprised people because it didn't match their
|
|
intuitive expectations. For example, a nested recursive function
|
|
definition doesn't work:
|
|
|
|
\begin{verbatim}
|
|
def f():
|
|
...
|
|
def g(value):
|
|
...
|
|
return g(value-1) + 1
|
|
...
|
|
\end{verbatim}
|
|
|
|
The function \function{g()} will always raise a \exception{NameError}
|
|
exception, because the binding of the name \samp{g} isn't in either
|
|
its local namespace or in the module-level namespace. This isn't much
|
|
of a problem in practice (how often do you recursively define interior
|
|
functions like this?), but this also made using the \keyword{lambda}
|
|
statement clumsier, and this was a problem in practice. In code which
|
|
uses \keyword{lambda} you can often find local variables being copied
|
|
by passing them as the default values of arguments.
|
|
|
|
\begin{verbatim}
|
|
def find(self, name):
|
|
"Return list of any entries equal to 'name'"
|
|
L = filter(lambda x, name=name: x == name,
|
|
self.list_attribute)
|
|
return L
|
|
\end{verbatim}
|
|
|
|
The readability of Python code written in a strongly functional style
|
|
suffers greatly as a result.
|
|
|
|
The most significant change to Python 2.1 is that static scoping has
|
|
been added to the language to fix this problem. As a first effect,
|
|
the \code{name=name} default argument is now unnecessary in the above
|
|
example. Put simply, when a given variable name is not assigned a
|
|
value within a function (by an assignment, or the \keyword{def},
|
|
\keyword{class}, or \keyword{import} statements), references to the
|
|
variable will be looked up in the local namespace of the enclosing
|
|
scope. A more detailed explanation of the rules, and a dissection of
|
|
the implementation, can be found in the PEP.
|
|
|
|
This change may cause some compatibility problems for code where the
|
|
same variable name is used both at the module level and as a local
|
|
variable within a function that contains further function definitions.
|
|
This seems rather unlikely though, since such code would have been
|
|
pretty confusing to read in the first place.
|
|
|
|
One side effect of the change is that the \code{from \var{module}
|
|
import *} and \keyword{exec} statements have been made illegal inside
|
|
a function scope under certain conditions. The Python reference
|
|
manual has said all along that \code{from \var{module} import *} is
|
|
only legal at the top level of a module, but the CPython interpreter
|
|
has never enforced this before. As part of the implementation of
|
|
nested scopes, the compiler which turns Python source into bytecodes
|
|
has to generate different code to access variables in a containing
|
|
scope. \code{from \var{module} import *} and \keyword{exec} make it
|
|
impossible for the compiler to figure this out, because they add names
|
|
to the local namespace that are unknowable at compile time.
|
|
Therefore, if a function contains function definitions or
|
|
\keyword{lambda} expressions with free variables, the compiler will
|
|
flag this by raising a \exception{SyntaxError} exception.
|
|
|
|
To make the preceding explanation a bit clearer, here's an example:
|
|
|
|
\begin{verbatim}
|
|
x = 1
|
|
def f():
|
|
# The next line is a syntax error
|
|
exec 'x=2'
|
|
def g():
|
|
return x
|
|
\end{verbatim}
|
|
|
|
Line 4 containing the \keyword{exec} statement is a syntax error,
|
|
since \keyword{exec} would define a new local variable named \samp{x}
|
|
whose value should be accessed by \function{g()}.
|
|
|
|
This shouldn't be much of a limitation, since \keyword{exec} is rarely
|
|
used in most Python code (and when it is used, it's often a sign of a
|
|
poor design anyway).
|
|
|
|
Compatibility concerns have led to nested scopes being introduced
|
|
gradually; in Python 2.1, they aren't enabled by default, but can be
|
|
turned on within a module by using a future statement as described in
|
|
PEP 236. (See the following section for further discussion of PEP
|
|
236.) In Python 2.2, nested scopes will become the default and there
|
|
will be no way to turn them off, but users will have had all of 2.1's
|
|
lifetime to fix any breakage resulting from their introduction.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{227}{Statically Nested Scopes}{Written and implemented by
|
|
Jeremy Hylton.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 236: __future__ Directives}
|
|
|
|
The reaction to nested scopes was widespread concern about the dangers
|
|
of breaking code with the 2.1 release, and it was strong enough to
|
|
make the Pythoneers take a more conservative approach. This approach
|
|
consists of introducing a convention for enabling optional
|
|
functionality in release N that will become compulsory in release N+1.
|
|
|
|
The syntax uses a \code{from...import} statement using the reserved
|
|
module name \module{__future__}. Nested scopes can be enabled by the
|
|
following statement:
|
|
|
|
\begin{verbatim}
|
|
from __future__ import nested_scopes
|
|
\end{verbatim}
|
|
|
|
While it looks like a normal \keyword{import} statement, it's not;
|
|
there are strict rules on where such a future statement can be put.
|
|
They can only be at the top of a module, and must precede any Python
|
|
code or regular \keyword{import} statements. This is because such
|
|
statements can affect how the Python bytecode compiler parses code and
|
|
generates bytecode, so they must precede any statement that will
|
|
result in bytecodes being produced.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{236}{Back to the \module{__future__}}{Written by Tim Peters,
|
|
and primarily implemented by Jeremy Hylton.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 207: Rich Comparisons}
|
|
|
|
In earlier versions, Python's support for implementing comparisons on
|
|
user-defined classes and extension types was quite simple. Classes
|
|
could implement a \method{__cmp__} method that was given two instances
|
|
of a class, and could only return 0 if they were equal or +1 or -1 if
|
|
they weren't; the method couldn't raise an exception or return
|
|
anything other than a Boolean value. Users of Numeric Python often
|
|
found this model too weak and restrictive, because in the
|
|
number-crunching programs that numeric Python is used for, it would be
|
|
more useful to be able to perform elementwise comparisons of two
|
|
matrices, returning a matrix containing the results of a given
|
|
comparison for each element. If the two matrices are of different
|
|
sizes, then the compare has to be able to raise an exception to signal
|
|
the error.
|
|
|
|
In Python 2.1, rich comparisons were added in order to support this
|
|
need. Python classes can now individually overload each of the
|
|
\code{<}, \code{<=}, \code{>}, \code{>=}, \code{==}, and \code{!=}
|
|
operations. The new magic method names are:
|
|
|
|
\begin{tableii}{c|l}{code}{Operation}{Method name}
|
|
\lineii{<}{\method{__lt__}} \lineii{<=}{\method{__le__}}
|
|
\lineii{>}{\method{__gt__}} \lineii{>=}{\method{__ge__}}
|
|
\lineii{==}{\method{__eq__}} \lineii{!=}{\method{__ne__}}
|
|
\end{tableii}
|
|
|
|
(The magic methods are named after the corresponding Fortran operators
|
|
\code{.LT.}. \code{.LE.}, \&c. Numeric programmers are almost
|
|
certainly quite familar with these names and will find them easy to
|
|
remember.)
|
|
|
|
Each of these magic methods is of the form \code{\var{method}(self,
|
|
other)}, where \code{self} will be the object on the left-hand side of
|
|
the operator, while \code{other} will be the object on the right-hand
|
|
side. For example, the expression \code{A < B} will cause
|
|
\code{A.__lt__(B)} to be called.
|
|
|
|
Each of these magic methods can return anything at all: a Boolean, a
|
|
matrix, a list, or any other Python object. Alternatively they can
|
|
raise an exception if the comparison is impossible, inconsistent, or
|
|
otherwise meaningless.
|
|
|
|
The built-in \function{cmp(A,B)} function can use the rich comparison
|
|
machinery, and now accepts an optional argument specifying which
|
|
comparison operation to use; this is given as one of the strings
|
|
\code{"<"}, \code{"<="}, \code{">"}, \code{">="}, \code{"=="}, or
|
|
\code{"!="}. If called without the optional third argument,
|
|
\function{cmp()} will only return -1, 0, or +1 as in previous versions
|
|
of Python; otherwise it will call the appropriate method and can
|
|
return any Python object.
|
|
|
|
There are also corresponding changes of interest to C programmers;
|
|
there's a new slot \code{tp_richcmp} in type objects and an API for
|
|
performing a given rich comparison. I won't cover the C API here, but
|
|
will refer you to PEP 207, or to 2.1's C API documentation, for the
|
|
full list of related functions.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{207}{Rich Comparisions}{Written by Guido van Rossum, heavily
|
|
based on earlier work by David Ascher, and implemented by Guido van
|
|
Rossum.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 230: Warning Framework}
|
|
|
|
Over its 10 years of existence, Python has accumulated a certain
|
|
number of obsolete modules and features along the way. It's difficult
|
|
to know when a feature is safe to remove, since there's no way of
|
|
knowing how much code uses it --- perhaps no programs depend on the
|
|
feature, or perhaps many do. To enable removing old features in a
|
|
more structured way, a warning framework was added. When the Python
|
|
developers want to get rid of a feature, it will first trigger a
|
|
warning in the next version of Python. The following Python version
|
|
can then drop the feature, and users will have had a full release
|
|
cycle to remove uses of the old feature.
|
|
|
|
Python 2.1 adds the warning framework to be used in this scheme. It
|
|
adds a \module{warnings} module that provide functions to issue
|
|
warnings, and to filter out warnings that you don't want to be
|
|
displayed. Third-party modules can also use this framework to
|
|
deprecate old features that they no longer wish to support.
|
|
|
|
For example, in Python 2.1 the \module{regex} module is deprecated, so
|
|
importing it causes a warning to be printed:
|
|
|
|
\begin{verbatim}
|
|
>>> import regex
|
|
__main__:1: DeprecationWarning: the regex module
|
|
is deprecated; please use the re module
|
|
>>>
|
|
\end{verbatim}
|
|
|
|
Warnings can be issued by calling the \function{warnings.warn}
|
|
function:
|
|
|
|
\begin{verbatim}
|
|
warnings.warn("feature X no longer supported")
|
|
\end{verbatim}
|
|
|
|
The first parameter is the warning message; an additional optional
|
|
parameters can be used to specify a particular warning category.
|
|
|
|
Filters can be added to disable certain warnings; a regular expression
|
|
pattern can be applied to the message or to the module name in order
|
|
to suppress a warning. For example, you may have a program that uses
|
|
the \module{regex} module and not want to spare the time to convert it
|
|
to use the \module{re} module right now. The warning can be
|
|
suppressed by calling
|
|
|
|
\begin{verbatim}
|
|
import warnings
|
|
warnings.filterwarnings(action = 'ignore',
|
|
message='.*regex module is deprecated',
|
|
category=DeprecationWarning,
|
|
module = '__main__')
|
|
\end{verbatim}
|
|
|
|
This adds a filter that will apply only to warnings of the class
|
|
\class{DeprecationWarning} triggered in the \module{__main__} module,
|
|
and applies a regular expression to only match the message about the
|
|
\module{regex} module being deprecated, and will cause such warnings
|
|
to be ignored. Warnings can also be printed only once, printed every
|
|
time the offending code is executed, or turned into exceptions that
|
|
will cause the program to stop (unless the exceptions are caught in
|
|
the usual way, of course).
|
|
|
|
Functions were also added to Python's C API for issuing warnings;
|
|
refer to PEP 230 or to Python's API documentation for the details.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{5}{Guidelines for Language Evolution}{Written
|
|
by Paul Prescod, to specify procedures to be followed when removing
|
|
old features from Python. The policy described in this PEP hasn't
|
|
been officially adopted, but the eventual policy probably won't be too
|
|
different from Prescod's proposal.}
|
|
|
|
\seepep{230}{Warning Framework}{Written and implemented by Guido van
|
|
Rossum.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 229: New Build System}
|
|
|
|
When compiling Python, the user had to go in and edit the
|
|
\file{Modules/Setup} file in order to enable various additional
|
|
modules; the default set is relatively small and limited to modules
|
|
that compile on most Unix platforms. This means that on Unix
|
|
platforms with many more features, most notably Linux, Python
|
|
installations often don't contain all useful modules they could.
|
|
|
|
Python 2.0 added the Distutils, a set of modules for distributing and
|
|
installing extensions. In Python 2.1, the Distutils are used to
|
|
compile much of the standard library of extension modules,
|
|
autodetecting which ones are supported on the current machine. It's
|
|
hoped that this will make Python installations easier and more
|
|
featureful.
|
|
|
|
Instead of having to edit the \file{Modules/Setup} file in order to
|
|
enable modules, a \file{setup.py} script in the top directory of the
|
|
Python source distribution is run at build time, and attempts to
|
|
discover which modules can be enabled by examining the modules and
|
|
header files on the system. If a module is configured in
|
|
\file{Modules/Setup}, the \file{setup.py} script won't attempt to
|
|
compile that module and will defer to the \file{Modules/Setup} file's
|
|
contents. This provides a way to specific any strange command-line
|
|
flags or libraries that are required for a specific platform.
|
|
|
|
In another far-reaching change to the build mechanism, Neil
|
|
Schemenauer restructured things so Python now uses a single makefile
|
|
that isn't recursive, instead of makefiles in the top directory and in
|
|
each of the \file{Python/}, \file{Parser/}, \file{Objects/}, and
|
|
\file{Modules/} subdirectories. This makes building Python faster
|
|
and also makes hacking the Makefiles clearer and simpler.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{229}{Using Distutils to Build Python}{Written
|
|
and implemented by A.M. Kuchling.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 205: Weak References}
|
|
|
|
Weak references, available through the \module{weakref} module, are a
|
|
minor but useful new data type in the Python programmer's toolbox.
|
|
|
|
Storing a reference to an object (say, in a dictionary or a list) has
|
|
the side effect of keeping that object alive forever. There are a few
|
|
specific cases where this behaviour is undesirable, object caches
|
|
being the most common one, and another being circular references in
|
|
data structures such as trees.
|
|
|
|
For example, consider a memoizing function that caches the results of
|
|
another function \function{f(\var{x})} by storing the function's
|
|
argument and its result in a dictionary:
|
|
|
|
\begin{verbatim}
|
|
_cache = {}
|
|
def memoize(x):
|
|
if _cache.has_key(x):
|
|
return _cache[x]
|
|
|
|
retval = f(x)
|
|
|
|
# Cache the returned object
|
|
_cache[x] = retval
|
|
|
|
return retval
|
|
\end{verbatim}
|
|
|
|
This version works for simple things such as integers, but it has a
|
|
side effect; the \code{_cache} dictionary holds a reference to the
|
|
return values, so they'll never be deallocated until the Python
|
|
process exits and cleans up This isn't very noticeable for integers,
|
|
but if \function{f()} returns an object, or a data structure that
|
|
takes up a lot of memory, this can be a problem.
|
|
|
|
Weak references provide a way to implement a cache that won't keep
|
|
objects alive beyond their time. If an object is only accessible
|
|
through weak references, the object will be deallocated and the weak
|
|
references will now indicate that the object it referred to no longer
|
|
exists. A weak reference to an object \var{obj} is created by calling
|
|
\code{wr = weakref.ref(\var{obj})}. The object being referred to is
|
|
returned by calling the weak reference as if it were a function:
|
|
\code{wr()}. It will return the referenced object, or \code{None} if
|
|
the object no longer exists.
|
|
|
|
This makes it possible to write a \function{memoize()} function whose
|
|
cache doesn't keep objects alive, by storing weak references in the
|
|
cache.
|
|
|
|
\begin{verbatim}
|
|
_cache = {}
|
|
def memoize(x):
|
|
if _cache.has_key(x):
|
|
obj = _cache[x]()
|
|
# If weak reference object still exists,
|
|
# return it
|
|
if obj is not None: return obj
|
|
|
|
retval = f(x)
|
|
|
|
# Cache a weak reference
|
|
_cache[x] = weakref.ref(retval)
|
|
|
|
return retval
|
|
\end{verbatim}
|
|
|
|
The \module{weakref} module also allows creating proxy objects which
|
|
behave like weak references --- an object referenced only by proxy
|
|
objects is deallocated -- but instead of requiring an explicit call to
|
|
retrieve the object, the proxy transparently forwards all operations
|
|
to the object as long as the object still exists. If the object is
|
|
deallocated, attempting to use a proxy will cause a
|
|
\exception{weakref.ReferenceError} exception to be raised.
|
|
|
|
\begin{verbatim}
|
|
proxy = weakref.proxy(obj)
|
|
proxy.attr # Equivalent to obj.attr
|
|
proxy.meth() # Equivalent to obj.meth()
|
|
del obj
|
|
proxy.attr # raises weakref.ReferenceError
|
|
\end{verbatim}
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{205}{Weak References}{Written and implemented by
|
|
Fred~L. Drake,~Jr.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 232: Function Attributes}
|
|
|
|
In Python 2.1, functions can now have arbitrary information attached
|
|
to them. People were often using docstrings to hold information about
|
|
functions and methods, because the \code{__doc__} attribute was the
|
|
only way of attaching any information to a function. For example, in
|
|
the Zope Web application server, functions are marked as safe for
|
|
public access by having a docstring, and in John Aycock's SPARK
|
|
parsing framework, docstrings hold parts of the BNF grammar to be
|
|
parsed. This overloading is unfortunate, since docstrings are really
|
|
intended to hold a function's documentation; for example, it means you
|
|
can't properly document functions intended for private use in Zope.
|
|
|
|
Arbitrary attributes can now be set and retrieved on functions using the
|
|
regular Python syntax:
|
|
|
|
\begin{verbatim}
|
|
def f(): pass
|
|
|
|
f.publish = 1
|
|
f.secure = 1
|
|
f.grammar = "A ::= B (C D)*"
|
|
\end{verbatim}
|
|
|
|
The dictionary containing attributes can be accessed as the function's
|
|
\member{__dict__}. Unlike the \member{__dict__} attribute of class
|
|
instances, in functions you can actually assign a new dictionary to
|
|
\member{__dict__}, though the new value is restricted to a regular
|
|
Python dictionary; you \emph{can't} be tricky and set it to a
|
|
\class{UserDict} instance, or any other random object that behaves
|
|
like a mapping.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{232}{Function Attributes}{Written and implemented by Barry
|
|
Warsaw.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
|
|
\section{PEP 235: Importing Modules on Case-Insensitive Platforms}
|
|
|
|
Some operating systems have filesystems that are case-insensitive,
|
|
MacOS and Windows being the primary examples; on these systems, it's
|
|
impossible to distinguish the filenames \samp{FILE.PY} and
|
|
\samp{file.py}, even though they do store the file's name
|
|
in its original case (they're case-preserving, too).
|
|
|
|
In Python 2.1, the \keyword{import} statement will work to simulate
|
|
case-sensitivity on case-insensitive platforms. Python will now
|
|
search for the first case-sensitive match by default, raising an
|
|
\exception{ImportError} if no such file is found, so \code{import file}
|
|
will not import a module named \samp{FILE.PY}. Case-insensitive
|
|
matching can be requested by setting the \envvar{PYTHONCASEOK} environment
|
|
variable before starting the Python interpreter.
|
|
|
|
%======================================================================
|
|
\section{PEP 217: Interactive Display Hook}
|
|
|
|
When using the Python interpreter interactively, the output of
|
|
commands is displayed using the built-in \function{repr()} function.
|
|
In Python 2.1, the variable \function{sys.displayhook} can be set to a
|
|
callable object which will be called instead of \function{repr()}.
|
|
For example, you can set it to a special pretty-printing function:
|
|
|
|
\begin{verbatim}
|
|
>>> # Create a recursive data structure
|
|
... L = [1,2,3]
|
|
>>> L.append(L)
|
|
>>> L # Show Python's default output
|
|
[1, 2, 3, [...]]
|
|
>>> # Use pprint.pprint() as the display function
|
|
... import sys, pprint
|
|
>>> sys.displayhook = pprint.pprint
|
|
>>> L
|
|
[1, 2, 3, <Recursion on list with id=135143996>]
|
|
>>>
|
|
\end{verbatim}
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{217}{Display Hook for Interactive Use}{Written and implemented
|
|
by Moshe Zadka.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 208: New Coercion Model}
|
|
|
|
How numeric coercion is done at the C level was significantly
|
|
modified. This will only affect the authors of C extensions to
|
|
Python, allowing them more flexibility in writing extension types that
|
|
support numeric operations.
|
|
|
|
Extension types can now set the type flag \code{Py_TPFLAGS_CHECKTYPES}
|
|
in their \code{PyTypeObject} structure to indicate that they support
|
|
the new coercion model. In such extension types, the numeric slot
|
|
functions can no longer assume that they'll be passed two arguments of
|
|
the same type; instead they may be passed two arguments of differing
|
|
types, and can then perform their own internal coercion. If the slot
|
|
function is passed a type it can't handle, it can indicate the failure
|
|
by returning a reference to the \code{Py_NotImplemented} singleton
|
|
value. The numeric functions of the other type will then be tried,
|
|
and perhaps they can handle the operation; if the other type also
|
|
returns \code{Py_NotImplemented}, then a \exception{TypeError} will be
|
|
raised. Numeric methods written in Python can also return
|
|
\code{Py_NotImplemented}, causing the interpreter to act as if the
|
|
method did not exist (perhaps raising a \exception{TypeError}, perhaps
|
|
trying another object's numeric methods).
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{208}{Reworking the Coercion Model}{Written and implemented by
|
|
Neil Schemenauer, heavily based upon earlier work by Marc-Andr\'e
|
|
Lemburg. Read this to understand the fine points of how numeric
|
|
operations will now be processed at the C level.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 241: Metadata in Python Packages}
|
|
|
|
A common complaint from Python users is that there's no single catalog
|
|
of all the Python modules in existence. T.~Middleton's Vaults of
|
|
Parnassus at \url{http://www.vex.net/parnassus/} are the largest
|
|
catalog of Python modules, but registering software at the Vaults is
|
|
optional, and many people don't bother.
|
|
|
|
As a first small step toward fixing the problem, Python software
|
|
packaged using the Distutils \command{sdist} command will include a
|
|
file named \file{PKG-INFO} containing information about the package
|
|
such as its name, version, and author (metadata, in cataloguing
|
|
terminology). PEP 241 contains the full list of fields that can be
|
|
present in the \file{PKG-INFO} file. As people began to package their
|
|
software using Python 2.1, more and more packages will include
|
|
metadata, making it possible to build automated cataloguing systems
|
|
and experiment with them. With the result experience, perhaps it'll
|
|
be possible to design a really good catalog and then build support for
|
|
it into Python 2.2. For example, the Distutils \command{sdist}
|
|
and \command{bdist_*} commands could support a \option{upload} option
|
|
that would automatically upload your package to a catalog server.
|
|
|
|
You can start creating packages containing \file{PKG-INFO} even if
|
|
you're not using Python 2.1, since a new release of the Distutils will
|
|
be made for users of earlier Python versions. Version 1.0.2 of the
|
|
Distutils includes the changes described in PEP 241, as well as
|
|
various bugfixes and enhancements. It will be available from
|
|
the Distutils SIG at \url{http://www.python.org/sigs/distutils-sig/}.
|
|
|
|
% XXX update when I actually release 1.0.2
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{241}{Metadata for Python Software Packages}{Written and
|
|
implemented by A.M. Kuchling.}
|
|
|
|
\seepep{243}{Module Repository Upload Mechanism}{Written by Sean
|
|
Reifschneider, this draft PEP describes a proposed mechanism for uploading
|
|
Python packages to a central server.
|
|
}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{New and Improved Modules}
|
|
|
|
\begin{itemize}
|
|
|
|
\item Ka-Ping Yee contributed two new modules: \module{inspect.py}, a
|
|
module for getting information about live Python code, and
|
|
\module{pydoc.py}, a module for interactively converting docstrings to
|
|
HTML or text. As a bonus, \file{Tools/scripts/pydoc}, which is now
|
|
automatically installed, uses \module{pydoc.py} to display
|
|
documentation given a Python module, package, or class name. For
|
|
example, \samp{pydoc xml.dom} displays the following:
|
|
|
|
\begin{verbatim}
|
|
Python Library Documentation: package xml.dom in xml
|
|
|
|
NAME
|
|
xml.dom - W3C Document Object Model implementation for Python.
|
|
|
|
FILE
|
|
/usr/local/lib/python2.1/xml/dom/__init__.pyc
|
|
|
|
DESCRIPTION
|
|
The Python mapping of the Document Object Model is documented in the
|
|
Python Library Reference in the section on the xml.dom package.
|
|
|
|
This package contains the following modules:
|
|
...
|
|
\end{verbatim}
|
|
|
|
\file{pydoc} also includes a Tk-based interactive help browser.
|
|
\file{pydoc} quickly becomes addictive; try it out!
|
|
|
|
\item Two different modules for unit testing were added to the
|
|
standard library. The \module{doctest} module, contributed by Tim
|
|
Peters, provides a testing framework based on running embedded
|
|
examples in docstrings and comparing the results against the expected
|
|
output. PyUnit, contributed by Steve Purcell, is a unit testing
|
|
framework inspired by JUnit, which was in turn an adaptation of Kent
|
|
Beck's Smalltalk testing framework. See
|
|
\url{http://pyunit.sourceforge.net/} for more information about
|
|
PyUnit.
|
|
|
|
\item The \module{difflib} module contains a class,
|
|
\class{SequenceMatcher}, which compares two sequences and computes the
|
|
changes required to transform one sequence into the other. For
|
|
example, this module can be used to write a tool similar to the Unix
|
|
\program{diff} program, and in fact the sample program
|
|
\file{Tools/scripts/ndiff.py} demonstrates how to write such a script.
|
|
|
|
\item \module{curses.panel}, a wrapper for the panel library, part of
|
|
ncurses and of SYSV curses, was contributed by Thomas Gellekum. The
|
|
panel library provides windows with the additional feature of depth.
|
|
Windows can be moved higher or lower in the depth ordering, and the
|
|
panel library figures out where panels overlap and which sections are
|
|
visible.
|
|
|
|
\item The PyXML package has gone through a few releases since Python
|
|
2.0, and Python 2.1 includes an updated version of the \module{xml}
|
|
package. Some of the noteworthy changes include support for Expat 1.2
|
|
and later versions, the ability for Expat parsers to handle files in
|
|
any encoding supported by Python, and various bugfixes for SAX, DOM,
|
|
and the \module{minidom} module.
|
|
|
|
\item Ping also contributed another hook for handling uncaught
|
|
exceptions. \function{sys.excepthook} can be set to a callable
|
|
object. When an exception isn't caught by any
|
|
\keyword{try}...\keyword{except} blocks, the exception will be passed
|
|
to \function{sys.excepthook}, which can then do whatever it likes. At
|
|
the Ninth Python Conference, Ping demonstrated an application for this
|
|
hook: printing an extended traceback that not only lists the stack
|
|
frames, but also lists the function arguments and the local variables
|
|
for each frame.
|
|
|
|
\item Various functions in the \module{time} module, such as
|
|
\function{asctime()} and \function{localtime()}, require a floating
|
|
point argument containing the time in seconds since the epoch. The
|
|
most common use of these functions is to work with the current time,
|
|
so the floating point argument has been made optional; when a value
|
|
isn't provided, the current time will be used. For example, log file
|
|
entries usually need a string containing the current time; in Python
|
|
2.1, \code{time.asctime()} can be used, instead of the lengthier
|
|
\code{time.asctime(time.localtime(time.time()))} that was previously
|
|
required.
|
|
|
|
This change was proposed and implemented by Thomas Wouters.
|
|
|
|
\item The \module{ftplib} module now defaults to retrieving files in
|
|
passive mode, because passive mode is more likely to work from behind
|
|
a firewall. This request came from the Debian bug tracking system,
|
|
since other Debian packages use \module{ftplib} to retrieve files and
|
|
then don't work from behind a firewall. It's deemed unlikely that
|
|
this will cause problems for anyone, because Netscape defaults to
|
|
passive mode and few people complain, but if passive mode is
|
|
unsuitable for your application or network setup, call
|
|
\method{set_pasv(0)} on FTP objects to disable passive mode.
|
|
|
|
\item Support for raw socket access has been added to the
|
|
\module{socket} module, contributed by Grant Edwards.
|
|
|
|
\item The \module{pstats} module now contains a simple interactive
|
|
statistics browser for displaying timing profiles for Python programs,
|
|
invoked when the module is run as a script. Contributed by
|
|
Eric S.\ Raymond.
|
|
|
|
\item A new implementation-dependent function, \function{sys._getframe(\optional{depth})},
|
|
has been added to return a given frame object from the current call stack.
|
|
\function{sys._getframe()} returns the frame at the top of the call stack;
|
|
if the optional integer argument \var{depth} is supplied, the function returns the frame
|
|
that is \var{depth} calls below the top of the stack. For example, \code{sys._getframe(1)}
|
|
returns the caller's frame object.
|
|
|
|
This function is only present in CPython, not in Jython or the .NET
|
|
implementation. Use it for debugging, and resist the temptation to
|
|
put it into production code.
|
|
|
|
|
|
|
|
\end{itemize}
|
|
|
|
%======================================================================
|
|
\section{Other Changes and Fixes}
|
|
|
|
There were relatively few smaller changes made in Python 2.1 due to
|
|
the shorter release cycle. A search through the CVS change logs turns
|
|
up 117 patches applied, and 136 bugs fixed; both figures are likely to
|
|
be underestimates. Some of the more notable changes are:
|
|
|
|
\begin{itemize}
|
|
|
|
|
|
\item A specialized object allocator is now optionally available, that
|
|
should be faster than the system \function{malloc()} and have less
|
|
memory overhead. The allocator uses C's \function{malloc()} function
|
|
to get large pools of memory, and then fulfills smaller memory
|
|
requests from these pools. It can be enabled by providing the
|
|
\longprogramopt{with-pymalloc} option to the \program{configure} script; see
|
|
\file{Objects/obmalloc.c} for the implementation details.
|
|
|
|
Authors of C extension modules should test their code with the object
|
|
allocator enabled, because some incorrect code may break, causing core
|
|
dumps at runtime. There are a bunch of memory allocation functions in
|
|
Python's C API that have previously been just aliases for the C
|
|
library's \function{malloc()} and \function{free()}, meaning that if
|
|
you accidentally called mismatched functions, the error wouldn't be
|
|
noticeable. When the object allocator is enabled, these functions
|
|
aren't aliases of \function{malloc()} and \function{free()} any more,
|
|
and calling the wrong function to free memory will get you a core
|
|
dump. For example, if memory was allocated using
|
|
\function{PyMem_New()}, it has to be freed using
|
|
\function{PyMem_Del()}, not \function{free()}. A few modules included
|
|
with Python fell afoul of this and had to be fixed; doubtless there
|
|
are more third-party modules that will have the same problem.
|
|
|
|
The object allocator was contributed by Vladimir Marangozov.
|
|
|
|
\item The speed of line-oriented file I/O has been improved because
|
|
people often complain about its lack of speed, and because it's often
|
|
been used as a na\"ive benchmark. The \method{readline()} method of
|
|
file objects has therefore been rewritten to be much faster. The
|
|
exact amount of the speedup will vary from platform to platform
|
|
depending on how slow the C library's \function{getc()} was, but is
|
|
around 66\%, and potentially much faster on some particular operating
|
|
systems. Tim Peters did much of the benchmarking and coding for this
|
|
change, motivated by a discussion in comp.lang.python.
|
|
|
|
A new module and method for file objects was also added, contributed
|
|
by Jeff Epler. The new method, \method{xreadlines()}, is similar to
|
|
the existing \function{xrange()} built-in. \function{xreadlines()}
|
|
returns an opaque sequence object that only supports being iterated
|
|
over, reading a line on every iteration but not reading the entire
|
|
file into memory as the existing \method{readlines()} method does.
|
|
You'd use it like this:
|
|
|
|
\begin{verbatim}
|
|
for line in sys.stdin.xreadlines():
|
|
# ... do something for each line ...
|
|
...
|
|
\end{verbatim}
|
|
|
|
For a fuller discussion of the line I/O changes, see the python-dev
|
|
summary for January 1-15, 2001 at
|
|
\url{http://www.python.org/dev/summary/2001-01-1.html}.
|
|
|
|
\item A new method, \method{popitem()}, was added to dictionaries to
|
|
enable destructively iterating through the contents of a dictionary;
|
|
this can be faster for large dictionaries because there's no need to
|
|
construct a list containing all the keys or values.
|
|
\code{D.popitem()} removes a random \code{(\var{key}, \var{value})}
|
|
pair from the dictionary~\code{D} and returns it as a 2-tuple. This
|
|
was implemented mostly by Tim Peters and Guido van Rossum, after a
|
|
suggestion and preliminary patch by Moshe Zadka.
|
|
|
|
\item Modules can now control which names are imported when \code{from
|
|
\var{module} import *} is used, by defining an \code{__all__}
|
|
attribute containing a list of names that will be imported. One
|
|
common complaint is that if the module imports other modules such as
|
|
\module{sys} or \module{string}, \code{from \var{module} import *}
|
|
will add them to the importing module's namespace. To fix this,
|
|
simply list the public names in \code{__all__}:
|
|
|
|
\begin{verbatim}
|
|
# List public names
|
|
__all__ = ['Database', 'open']
|
|
\end{verbatim}
|
|
|
|
A stricter version of this patch was first suggested and implemented
|
|
by Ben Wolfson, but after some python-dev discussion, a weaker final
|
|
version was checked in.
|
|
|
|
\item Applying \function{repr()} to strings previously used octal
|
|
escapes for non-printable characters; for example, a newline was
|
|
\code{'\e 012'}. This was a vestigial trace of Python's C ancestry, but
|
|
today octal is of very little practical use. Ka-Ping Yee suggested
|
|
using hex escapes instead of octal ones, and using the \code{\e n},
|
|
\code{\e t}, \code{\e r} escapes for the appropriate characters, and
|
|
implemented this new formatting.
|
|
|
|
\item Syntax errors detected at compile-time can now raise exceptions
|
|
containing the filename and line number of the error, a pleasant side
|
|
effect of the compiler reorganization done by Jeremy Hylton.
|
|
|
|
\item C extensions which import other modules have been changed to use
|
|
\function{PyImport_ImportModule()}, which means that they will use any
|
|
import hooks that have been installed. This is also encouraged for
|
|
third-party extensions that need to import some other module from C
|
|
code.
|
|
|
|
\item The size of the Unicode character database was shrunk by another
|
|
340K thanks to Fredrik Lundh.
|
|
|
|
\item Some new ports were contributed: MacOS X (by Steven Majewski),
|
|
Cygwin (by Jason Tishler); RISCOS (by Dietmar Schwertberger); Unixware~7
|
|
(by Billy G. Allie).
|
|
|
|
\end{itemize}
|
|
|
|
And there's the usual list of minor bugfixes, minor memory leaks,
|
|
docstring edits, and other tweaks, too lengthy to be worth itemizing;
|
|
see the CVS logs for the full details if you want them.
|
|
|
|
|
|
%======================================================================
|
|
\section{Acknowledgements}
|
|
|
|
The author would like to thank the following people for offering
|
|
suggestions on various drafts of this article: Graeme Cross, David
|
|
Goodger, Jay Graves, Michael Hudson, Marc-Andr\'e Lemburg, Fredrik
|
|
Lundh, Neil Schemenauer, Thomas Wouters.
|
|
|
|
\end{document}
|