\documentclass{howto}
|
|
|
|
% $Id$
|
|
|
|
\title{What's New in Python 2.2}
|
|
\release{0.05}
|
|
\author{A.M. Kuchling}
|
|
\authoraddress{\email{akuchlin@mems-exchange.org}}
|
|
\begin{document}
|
|
\maketitle\tableofcontents
|
|
|
|
\section{Introduction}
|
|
|
|
{\large This document is a draft, and is subject to change until the
|
|
final version of Python 2.2 is released. Currently it's up to date
|
|
for Python 2.2 alpha 1. Please send any comments, bug reports, or
|
|
questions, no matter how minor, to \email{akuchlin@mems-exchange.org}.
|
|
}
|
|
|
|
This article explains the new features in Python 2.2.
|
|
|
|
Python 2.2 can be thought of as the ``cleanup release''. There are some
|
|
features such as generators and iterators that are completely new, but
|
|
most of the changes, significant and far-reaching though they may be,
|
|
are aimed at cleaning up irregularities and dark corners of the
|
|
language design.
|
|
|
|
This article doesn't attempt to provide a complete specification of
|
|
the new features, but instead provides a convenient overview. For
|
|
full details, you should refer to the documentation for Python 2.2,
|
|
such as the
|
|
\citetitle[http://python.sourceforge.net/devel-docs/lib/lib.html]{Python
|
|
Library Reference} and the
|
|
\citetitle[http://python.sourceforge.net/devel-docs/ref/ref.html]{Python
|
|
Reference Manual}.
|
|
% XXX These \citetitle marks should get the python.org URLs for the final
|
|
% release, just as soon as the docs are published there.
|
|
If you want to understand the complete implementation and design
|
|
rationale for a change, refer to the PEP for a particular new feature.
|
|
|
|
|
|
The final release of Python 2.2 is planned for October 2001.
|
|
|
|
\begin{seealso}
|
|
|
|
\seeurl{http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm}
|
|
{``What's So Special About Python 2.2?'' is also about the new 2.2
|
|
features, and was written by Cameron Laird and Kathryn Soraiz.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 252: Type and Class Changes}
|
|
|
|
XXX I need to read and digest the relevant PEPs.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{252}{Making Types Look More Like Classes}{Written and implemented
|
|
by Guido van Rossum.}
|
|
|
|
\seeurl{http://www.python.org/2.2/descrintro.html}{A tutorial
|
|
on the type/class changes in 2.2.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 234: Iterators}
|
|
|
|
A significant addition to 2.2 is an iteration interface at both the C
|
|
and Python levels. Objects can define how they can be looped over by
|
|
callers.
|
|
|
|
In Python versions up to 2.1, the usual way to make \code{for item in
|
|
obj} work is to define a \method{__getitem__()} method that looks
|
|
something like this:
|
|
|
|
\begin{verbatim}
    def __getitem__(self, index):
        return <next item>
\end{verbatim}
|
|
|
|
\method{__getitem__()} is more properly used to define an indexing
|
|
operation on an object so that you can write \code{obj[5]} to retrieve
|
|
the sixth element. It's a bit misleading when you're using this only
|
|
to support \keyword{for} loops. Consider some file-like object that
|
|
wants to be looped over; the \var{index} parameter is essentially
|
|
meaningless, as the class probably assumes that a series of
|
|
\method{__getitem__()} calls will be made, with \var{index}
|
|
incrementing by one each time. In other words, the presence of the
|
|
\method{__getitem__()} method doesn't mean that \code{file[5]} will
|
|
work, though it really should.
|
|
|
|
In Python 2.2, iteration can be implemented separately, and
|
|
\method{__getitem__()} methods can be limited to classes that really
|
|
do support random access. The basic idea of iterators is quite
|
|
simple. A new built-in function, \function{iter(obj)}, returns an
|
|
iterator for the object \var{obj}. (It can also take two arguments:
\code{iter(\var{C}, \var{sentinel})} will call the callable \var{C}
repeatedly until it returns \var{sentinel}, which signals that the
iterator is done. This form probably won't be used very often.)
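
As a small sketch of the two-argument form, a callable can be polled
until it returns the sentinel. The names \code{values} and
\code{read_value} are invented for this illustration and are not part
of any library:

```python
# Hypothetical data source: each call hands back the next stored value.
values = [5, 0, 1, 2, 3]

def read_value():
    # pop() removes and returns the last element of the list
    return values.pop()

# iter(callable, sentinel) keeps calling read_value() until it returns 0
collected = []
for v in iter(read_value, 0):
    collected.append(v)

# collected now holds 3, 2, 1; the sentinel itself is never produced
```

The loop stops as soon as the sentinel is seen, so anything stored
after it (here the value 5) is left untouched.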
|
|
|
|
Python classes can define an \method{__iter__()} method, which should
|
|
create and return a new iterator for the object; if the object is its
|
|
own iterator, this method can just return \code{self}. In particular,
|
|
iterators will usually be their own iterators. Extension types
|
|
implemented in C can implement a \code{tp_iter} function in order to
|
|
return an iterator, and extension types that want to behave as
|
|
iterators can define a \code{tp_iternext} function.
|
|
|
|
So what do iterators do? They have one required method,
|
|
\method{next()}, which takes no arguments and returns the next value.
|
|
When there are no more values to be returned, calling \method{next()}
|
|
should raise the \exception{StopIteration} exception.
|
|
|
|
\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
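
A class written in Python can take part in the same protocol. The
following is only a minimal sketch: the \class{Countdown} class is
hypothetical, and the \code{__next__} alias is an assumption added so
that the example also runs on later Python versions, which use that
spelling instead of \method{next()}.

```python
class Countdown:
    """Hypothetical example class: counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # The object is its own iterator, so it just returns itself.
        return self

    def next(self):
        if self.current <= 0:
            # No more values: signal the end of the iteration.
            raise StopIteration
        value = self.current
        self.current = self.current - 1
        return value

    # Assumption for portability: later Python versions look for
    # __next__() instead of next(), so provide both names.
    __next__ = next

result = []
for n in Countdown(3):
    result.append(n)

# Sequence unpacking goes through the same protocol
first, second = Countdown(2)
```

Because \method{__iter__()} returns \code{self}, a \class{Countdown}
instance can only be looped over once; a container that should be
traversed repeatedly would return a fresh iterator object instead.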
|
|
|
|
In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects an object for which \function{iter()} will return an iterator.
|
|
For backward compatibility, and convenience, an iterator is
|
|
automatically constructed for sequences that don't implement
|
|
\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in
|
|
[1,2,3]} will still work. Wherever the Python interpreter loops over
|
|
a sequence, it's been changed to use the iterator protocol. This
|
|
means you can do things like this:
|
|
|
|
\begin{verbatim}
|
|
>>> i = iter(L)
|
|
>>> a,b,c = i
|
|
>>> a,b,c
|
|
(1, 2, 3)
|
|
>>>
|
|
\end{verbatim}
|
|
|
|
Iterator support has been added to some of Python's basic types.
|
|
Calling \function{iter()} on a dictionary will return an iterator
|
|
which loops over its keys:
|
|
|
|
\begin{verbatim}
|
|
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
|
|
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
|
|
>>> for key in m: print key, m[key]
|
|
...
|
|
Mar 3
|
|
Feb 2
|
|
Aug 8
|
|
Sep 9
|
|
May 5
|
|
Jun 6
|
|
Jul 7
|
|
Jan 1
|
|
Apr 4
|
|
Nov 11
|
|
Dec 12
|
|
Oct 10
|
|
>>>
|
|
\end{verbatim}
|
|
|
|
That's just the default behaviour. If you want to iterate over keys,
|
|
values, or key/value pairs, you can explicitly call the
|
|
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
|
|
methods to get an appropriate iterator. In a minor related change,
|
|
the \keyword{in} operator now works on dictionaries, so
|
|
\code{\var{key} in dict} is now equivalent to
|
|
\code{dict.has_key(\var{key})}.
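
A short sketch of the default dictionary iterator and the \keyword{in}
operator; the dictionary contents and variable names are made up for
the example:

```python
days = {'Jan': 31, 'Feb': 28, 'Mar': 31}

# Looping over a dictionary visits its keys, in no particular order
keys = []
for key in days:
    keys.append(key)
keys.sort()

# 'in' tests for the presence of a key, like days.has_key(...)
has_feb = 'Feb' in days
has_apr = 'Apr' in days
```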
|
|
|
|
|
|
Files also provide an iterator, which calls the \method{readline()}
|
|
method until there are no more lines in the file. This means you can
|
|
now read each line of a file using code like this:
|
|
|
|
\begin{verbatim}
for line in file:
    # do something for each line
\end{verbatim}
|
|
|
|
Note that you can only go forward in an iterator; there's no way to
|
|
get the previous element, reset the iterator, or make a copy of it.
|
|
An iterator object could provide such additional capabilities, but the
|
|
iterator protocol only requires a \method{next()} method.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
|
|
by the Python Labs crew, mostly by GvR and Tim Peters.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 255: Simple Generators}
|
|
|
|
Generators are another new feature, one that interacts with the
|
|
introduction of iterators.
|
|
|
|
You're doubtless familiar with how function calls work in Python or
|
|
C. When you call a function, it gets a private area where its local
|
|
variables are created. When the function reaches a \keyword{return}
|
|
statement, the local variables are destroyed and the resulting value
|
|
is returned to the caller. A later call to the same function will get
|
|
a fresh new set of local variables. But what if the local variables
|
|
weren't destroyed on exiting a function? What if you could later
|
|
resume the function where it left off? This is what generators
|
|
provide; they can be thought of as resumable functions.
|
|
|
|
Here's the simplest example of a generator function:
|
|
|
|
\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}
|
|
|
|
A new keyword, \keyword{yield}, was introduced for generators. Any
|
|
function containing a \keyword{yield} statement is a generator
|
|
function; this is detected by Python's bytecode compiler, which
|
|
compiles the function specially. Because a new keyword was
|
|
introduced, generators must be explicitly enabled in a module by
|
|
including a \code{from __future__ import generators} statement near
|
|
the top of the module's source code. In Python 2.3 this statement
|
|
will become unnecessary.
|
|
|
|
When you call a generator function, it doesn't return a single value;
|
|
instead it returns a generator object that supports the iterator
|
|
interface. On executing the \keyword{yield} statement, the generator
|
|
outputs the value of \code{i}, similar to a \keyword{return}
|
|
statement. The big difference between \keyword{yield} and a
|
|
\keyword{return} statement is that, on reaching a \keyword{yield}, the
|
|
generator's state of execution is suspended and local variables are
|
|
preserved. On the next call to the generator's \code{.next()} method,
|
|
the function will resume executing immediately after the
|
|
\keyword{yield} statement. (For complicated reasons, the
|
|
\keyword{yield} statement isn't allowed inside the \keyword{try} block
|
|
of a \code{try...finally} statement; read PEP 255 for a full
|
|
explanation of the interaction between \keyword{yield} and
|
|
exceptions.)
|
|
|
|
Here's a sample usage of the \function{generate_ints} generator:
|
|
|
|
\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
>>>
\end{verbatim}
|
|
|
|
You could equally write \code{for i in generate_ints(5)}, or
|
|
\code{a,b,c = generate_ints(3)}.
|
|
|
|
Inside a generator function, the \keyword{return} statement can only
|
|
be used without a value, and signals the end of the procession of
|
|
values; afterwards the generator cannot return any further values.
|
|
\keyword{return} with a value, such as \code{return 5}, is a syntax
|
|
error inside a generator function. The end of the generator's results
|
|
can also be indicated by raising \exception{StopIteration} manually,
|
|
or by just letting the flow of execution fall off the bottom of the
|
|
function.
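
The last few paragraphs can be pulled together in one sketch: a
generator driven by a \keyword{for} loop, by sequence unpacking, and
ended early with a bare \keyword{return}. The helper
\function{first_below()} is hypothetical and exists only for this
illustration.

```python
# In Python 2.2 itself, "from __future__ import generators" must
# appear at the top of the module before yield can be used.

def generate_ints(N):
    for i in range(N):
        yield i

def first_below(limit, values):
    # Hypothetical helper: yields values until one reaches the limit;
    # the bare return ends the generator early.
    for v in values:
        if v >= limit:
            return
        yield v

squares = []
for i in generate_ints(5):
    squares.append(i * i)

a, b, c = generate_ints(3)
small = list(first_below(3, generate_ints(10)))
```

Note that \function{first_below()} consumes another generator lazily:
only four values are ever requested from \code{generate_ints(10)}.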
|
|
|
|
You could achieve the effect of generators manually by writing your
|
|
own class and storing all the local variables of the generator as
|
|
instance variables. For example, returning a list of integers could
|
|
be done by setting \code{self.count} to 0, and having the
|
|
\method{next()} method increment \code{self.count} and return it.
|
|
However, for a moderately complicated generator, writing a
|
|
corresponding class would be much messier.
|
|
\file{Lib/test/test_generators.py} contains a number of more
|
|
interesting examples. The simplest one implements an in-order
|
|
traversal of a tree using generators recursively.
|
|
|
|
\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
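
To see the traversal run, it needs a tree to walk. The \class{Tree}
class below is a hypothetical stand-in, assumed only to have
\code{label}, \code{left}, and \code{right} attributes; the class used
in the test suite may well differ.

```python
class Tree:
    """Hypothetical minimal binary tree node for illustration."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    # An empty subtree is represented by None, which is false
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#       2
#      / \
#     1   3
root = Tree(2, Tree(1), Tree(3))
labels = list(inorder(root))
```

Each recursive call creates its own generator object, and the inner
\keyword{for} loops pass the values up one level at a time.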
|
|
|
|
Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chessboard so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).
|
|
|
|
The idea of generators comes from other programming languages,
|
|
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
|
|
idea of generators is central to the language. In Icon, every
|
|
expression and function call behaves like a generator. One example
|
|
from ``An Overview of the Icon Programming Language'' at
|
|
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
|
|
what this looks like:
|
|
|
|
\begin{verbatim}
|
|
sentence := "Store it in the neighboring harbor"
|
|
if (i := find("or", sentence)) > 5 then write(i)
|
|
\end{verbatim}
|
|
|
|
The \function{find()} function returns the indexes at which the
|
|
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
|
|
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
|
|
comparison fails, and Icon retries it with the second value of 23. 23
|
|
is greater than 5, so the comparison now succeeds, and the code prints
|
|
the value 23 to the screen.
|
|
|
|
Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
that can be passed around to other functions or stored in a data
structure.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
|
|
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
|
|
and Tim Peters, with other fixes from the Python Labs crew.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 237: Unifying Long Integers and Integers}
|
|
|
|
In recent versions, the distinction between regular integers, which
|
|
are 32-bit values on most machines, and long integers, which can be of
|
|
arbitrary size, was becoming an annoyance. For example, on platforms
|
|
that support large files (files larger than \code{2**32} bytes), the
|
|
\method{tell()} method of file objects has to return a long integer.
|
|
However, there were various bits of Python that expected plain
|
|
integers and would raise an error if a long integer was provided
|
|
instead. For example, in Python 1.5, only regular integers
|
|
could be used as a slice index, and \code{'abc'[1L:]} would raise a
|
|
\exception{TypeError} exception with the message ``slice index must be
int''.
|
|
|
|
Python 2.2 will shift values from short to long integers as required.
The \samp{L} suffix is no longer needed to indicate a long integer literal,
as now the compiler will choose the appropriate type. (Using the \samp{L}
suffix will be discouraged in future 2.x versions of Python,
triggering a warning in Python 2.4, and the suffix will probably be
dropped in Python 3.0.) Many operations that used to raise an
\exception{OverflowError} will now return a long integer as their result.
For example:
|
|
|
|
\begin{verbatim}
|
|
>>> 1234567890123
|
|
1234567890123L
|
|
>>> 2 ** 64
|
|
18446744073709551616L
|
|
\end{verbatim}
|
|
|
|
In most cases, integers and long integers will now be treated
|
|
identically. You can still distinguish them with the
|
|
\function{type()} built-in function, but that's rarely needed. The
|
|
\function{int()} function will now return a long integer if the value
|
|
is large enough.
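
A few more cases, written as a sketch with arbitrary variable names.
The comments describe the behaviour on a machine with 32-bit regular
integers:

```python
# One past the largest 32-bit signed integer: the addition no longer
# raises OverflowError, it simply produces a bigger integer.
big = 2147483647 + 1

# Exponentiation is promoted in the same way
product = 2 ** 64

# int() accepts a string that doesn't fit in a regular integer
converted = int('123456789012345678901234567890')
```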
|
|
|
|
% XXX is there a warning-enabling command-line option for this?
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{237}{Unifying Long Integers and Integers}{Written by
|
|
Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van Rossum.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 238: Changing the Division Operator}
|
|
|
|
The most controversial change in Python 2.2 is the start of an effort
|
|
to fix an old design flaw that's been in Python from the beginning.
|
|
Currently Python's division operator, \code{/}, behaves like C's
division operator when presented with two integer arguments. It
returns an integer result that's truncated down when there would be
a fractional part. For example, \code{3/2} is 1, not 1.5, and
\code{(-1)/2} is -1, not -0.5. This means that the results of division
can vary unexpectedly depending on the type of the two operands, and
because Python is dynamically typed, it can be difficult to determine
the possible types of the operands.
|
|
|
|
(The controversy is over whether this is \emph{really} a design flaw,
|
|
and whether it's worth breaking existing code to fix this. It's
|
|
caused endless discussions on python-dev and in July erupted into a
storm of acidly sarcastic postings on \newsgroup{comp.lang.python}. I
|
|
won't argue for either side here; read PEP 238 for a summary of
|
|
arguments and counter-arguments.)
|
|
|
|
Because this change might break code, it's being introduced very
|
|
gradually. Python 2.2 begins the transition, but the switch won't be
|
|
complete until Python 3.0.
|
|
|
|
First, some terminology from PEP 238. ``True division'' is the
|
|
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4
|
|
is 0.25, and so forth. ``Floor division'' is what Python's \code{/}
|
|
operator currently does when given integer operands; the result is the
|
|
floor of the value returned by true division. ``Classic division'' is
|
|
the current mixed behaviour of \code{/}; it returns the result of
|
|
floor division when the operands are integers, and returns the result
|
|
of true division when one of the operands is a floating-point number.
|
|
|
|
Here are the changes 2.2 introduces:
|
|
|
|
\begin{itemize}
|
|
|
|
\item A new operator, \code{//}, is the floor division operator.
(Yes, we know it looks like \Cpp's comment symbol.) \code{//}
\emph{always} performs floor division no matter what the types of
its operands are, so \code{1 // 2} is 0 and \code{1.0 // 2.0} is
0.0.
|
|
|
|
\code{//} is always available in Python 2.2; you don't need to enable
|
|
it using a \code{__future__} statement.
|
|
|
|
\item By including a \code{from __future__ import division} in a
|
|
module, the \code{/} operator will be changed to return the result of
|
|
true division, so \code{1/2} is 0.5. Without the \code{__future__}
|
|
statement, \code{/} still means classic division. The default meaning
|
|
of \code{/} will not change until Python 3.0.
|
|
|
|
\item Classes can define methods called \method{__truediv__} and
|
|
\method{__floordiv__} to overload the two division operators. At the
|
|
C level, there are also slots in the \code{PyNumberMethods} structure
|
|
so extension types can define the two operators.
|
|
|
|
% XXX a warning someday?
|
|
|
|
\end{itemize}
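
The following sketch exercises floor division and the two new special
methods. The \class{Length} class is hypothetical. To keep the result
independent of whether the \code{__future__} statement is in effect,
true division is requested through \function{operator.truediv()}
rather than through \code{/}; this is a choice made for the example,
not a requirement.

```python
import operator

class Length:
    """Hypothetical numeric wrapper used only for this illustration."""

    def __init__(self, metres):
        self.metres = metres

    def __truediv__(self, divisor):
        # Invoked for true division
        return Length(operator.truediv(self.metres, divisor))

    def __floordiv__(self, divisor):
        # Invoked for the // operator
        return Length(self.metres // divisor)

floor_int = 7 // 2             # integer operands, integer result
floor_neg = (-7) // 2          # rounds toward negative infinity
floor_float = 7.0 // 2.0       # float operands, float result
true_quot = operator.truediv(7, 2)

half = operator.truediv(Length(7), 2).metres
whole = (Length(7) // 2).metres
```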
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{238}{Changing the Division Operator}{Written by Moshe Zadka and
|
|
Guido van Rossum. Implemented by Guido van Rossum.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{Unicode Changes}
|
|
|
|
Python's Unicode support has been enhanced a bit in 2.2. Unicode
|
|
strings are usually stored as UCS-2, as 16-bit unsigned integers.
|
|
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
|
|
integers, as its internal encoding by supplying
|
|
\longprogramopt{enable-unicode=ucs4} to the configure script. When
|
|
built to use UCS-4 (a ``wide Python''), the interpreter can natively
|
|
handle Unicode characters from U+000000 to U+10FFFF, so the range of
|
|
legal values for the \function{unichr()} function is expanded
|
|
accordingly. Using an interpreter compiled to use UCS-2 (a ``narrow
|
|
Python''), values greater than 65535 will still cause
|
|
\function{unichr()} to raise a \exception{ValueError} exception.
|
|
|
|
All this is the province of the still-unimplemented PEP 261, ``Support
|
|
for `wide' Unicode characters''; consult it for further details, and
|
|
please offer comments on the PEP and on your experiences with the
|
|
2.2 alpha releases.
|
|
% XXX update previous line once 2.2 reaches beta.
|
|
|
|
Another change is much simpler to explain. Since their introduction,
|
|
Unicode strings have supported an \method{encode()} method to convert
|
|
the string to a selected encoding such as UTF-8 or Latin-1. A
|
|
symmetric \method{decode(\optional{\var{encoding}})} method has been
|
|
added to 8-bit strings (though not to Unicode strings) in 2.2.
|
|
\method{decode()} assumes that the string is in the specified encoding
|
|
and decodes it, returning whatever is returned by the codec.
|
|
|
|
Using this new feature, codecs have been added for tasks not directly
|
|
related to Unicode. For example, codecs have been added for
|
|
uu-encoding, MIME's base64 encoding, and compression with the
|
|
\module{zlib} module:
|
|
|
|
\begin{verbatim}
|
|
>>> s = """Here is a lengthy piece of redundant, overly verbose,
|
|
... and repetitive text.
|
|
... """
|
|
>>> data = s.encode('zlib')
|
|
>>> data
|
|
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
|
|
>>> data.decode('zlib')
|
|
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
|
|
>>> print s.encode('uu')
|
|
begin 666 <data>
|
|
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
|
|
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*
|
|
|
|
end
|
|
>>> "sheesh".encode('rot-13')
|
|
'furrfu'
|
|
\end{verbatim}
|
|
|
|
\method{encode()} and \method{decode()} were implemented by
|
|
Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
|
|
were implemented by Fredrik Lundh and Martin von L\"owis.
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{261}{Support for `wide' Unicode characters}{PEP written by
|
|
Paul Prescod. Not yet accepted or fully implemented.}
|
|
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 227: Nested Scopes}
|
|
|
|
In Python 2.1, statically nested scopes were added as an optional
|
|
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled, and are now always present. The rest of this section
|
|
is a copy of the description of nested scopes from my ``What's New in
|
|
Python 2.1'' document; if you read it when 2.1 came out, you can skip
|
|
the rest of this section.
|
|
|
|
The largest change introduced in Python 2.1, and made complete in 2.2,
|
|
is to Python's scoping rules. In Python 2.0, at any given time there
|
|
are at most three namespaces used to look up variable names: local,
|
|
module-level, and the built-in namespace. This often surprised people
|
|
because it didn't match their intuitive expectations. For example, a
|
|
nested recursive function definition doesn't work:
|
|
|
|
\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}
|
|
|
|
The function \function{g()} will always raise a \exception{NameError}
|
|
exception, because the binding of the name \samp{g} isn't in either
|
|
its local namespace or in the module-level namespace. This isn't much
|
|
of a problem in practice (how often do you recursively define interior
|
|
functions like this?), but this also made using \keyword{lambda}
expressions clumsier, and this was a problem in practice. In code which
|
|
uses \keyword{lambda} you can often find local variables being copied
|
|
by passing them as the default values of arguments.
|
|
|
|
\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}
|
|
|
|
The readability of Python code written in a strongly functional style
|
|
suffers greatly as a result.
|
|
|
|
The most significant change to Python 2.2 is that static scoping has
|
|
been added to the language to fix this problem. As a first effect,
|
|
the \code{name=name} default argument is now unnecessary in the above
|
|
example. Put simply, when a given variable name is not assigned a
|
|
value within a function (by an assignment, or the \keyword{def},
|
|
\keyword{class}, or \keyword{import} statements), references to the
|
|
variable will be looked up in the local namespace of the enclosing
|
|
scope. A more detailed explanation of the rules, and a dissection of
|
|
the implementation, can be found in the PEP.
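
With nested scopes, both earlier examples can be written naturally.
This is a sketch only: the \class{Directory} class wrapped around
\method{find()} is hypothetical, and \function{g()} has been given a
base case so that the recursion terminates.

```python
class Directory:
    """Hypothetical container class for illustration."""

    def __init__(self, entries):
        self.list_attribute = entries

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # 'name' is looked up in the enclosing function's scope, so
        # the name=name default argument is no longer needed.
        return list(filter(lambda x: x == name, self.list_attribute))

def f():
    def g(value):
        # g can refer to itself through f's local namespace
        if value <= 0:
            return 0
        return g(value - 1) + 1
    return g(3)

matches = Directory(['a', 'b', 'a']).find('a')
depth = f()
```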
|
|
|
|
This change may cause some compatibility problems for code where the
|
|
same variable name is used both at the module level and as a local
|
|
variable within a function that contains further function definitions.
|
|
This seems rather unlikely though, since such code would have been
|
|
pretty confusing to read in the first place.
|
|
|
|
One side effect of the change is that the \code{from \var{module}
|
|
import *} and \keyword{exec} statements have been made illegal inside
|
|
a function scope under certain conditions. The Python reference
|
|
manual has said all along that \code{from \var{module} import *} is
|
|
only legal at the top level of a module, but the CPython interpreter
|
|
has never enforced this before. As part of the implementation of
|
|
nested scopes, the compiler which turns Python source into bytecodes
|
|
has to generate different code to access variables in a containing
|
|
scope. \code{from \var{module} import *} and \keyword{exec} make it
|
|
impossible for the compiler to figure this out, because they add names
|
|
to the local namespace that are unknowable at compile time.
|
|
Therefore, if a function contains function definitions or
|
|
\keyword{lambda} expressions with free variables, the compiler will
|
|
flag this by raising a \exception{SyntaxError} exception.
|
|
|
|
To make the preceding explanation a bit clearer, here's an example:
|
|
|
|
\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}
|
|
|
|
Line 4 containing the \keyword{exec} statement is a syntax error,
|
|
since \keyword{exec} would define a new local variable named \samp{x}
|
|
whose value should be accessed by \function{g()}.
|
|
|
|
This shouldn't be much of a limitation, since \keyword{exec} is rarely
|
|
used in most Python code (and when it is used, it's often a sign of a
|
|
poor design anyway).
|
|
|
|
\begin{seealso}
|
|
|
|
\seepep{227}{Statically Nested Scopes}{Written and implemented by
|
|
Jeremy Hylton.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{New and Improved Modules}
|
|
|
|
\begin{itemize}
|
|
|
|
\item The \module{xmlrpclib} module was contributed to the standard
|
|
library by Fredrik Lundh. It provides support for writing XML-RPC
|
|
clients; XML-RPC is a simple remote procedure call protocol built on
|
|
top of HTTP and XML. For example, the following snippet retrieves a
|
|
list of RSS channels from the O'Reilly Network, and then retrieves a
|
|
list of the recent headlines for one channel:
|
|
|
|
\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
      'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}
|
|
|
|
See \url{http://www.xmlrpc.com/} for more information about XML-RPC.
|
|
|
|
\item The \module{socket} module can be compiled to support IPv6;
|
|
specify the \longprogramopt{enable-ipv6} option to Python's configure
|
|
script. (Contributed by Jun-ichiro ``itojun'' Hagino.)
|
|
|
|
\item Two new format characters were added to the \module{struct}
|
|
module for 64-bit integers on platforms that support the C
|
|
\ctype{long long} type. \samp{q} is for a signed 64-bit integer,
|
|
and \samp{Q} is for an unsigned one. The value is returned in
|
|
Python's long integer type. (Contributed by Tim Peters.)
|
|
|
|
\item In the interpreter's interactive mode, there's a new built-in
function \function{help()} that uses the \module{pydoc} module
introduced in Python 2.1 to provide interactive help.
\code{help(\var{object})} displays any available help text about
\var{object}. \code{help()} with no argument puts you in an online
help utility, where you can enter the names of functions, classes,
or modules to read their help text.
(Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)
|
|
|
|
\item Various bugfixes and performance improvements have been made
|
|
to the SRE engine underlying the \module{re} module. For example,
|
|
\function{re.sub()} will now use \function{string.replace()}
|
|
automatically when the pattern and its replacement are both just
|
|
literal strings without regex metacharacters. Another contributed
|
|
patch speeds up certain Unicode character ranges by a factor of
|
|
two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was
|
|
contributed by Martin von L\"owis.)
|
|
|
|
\item The \module{smtplib} module now supports \rfc{2487}, ``Secure
|
|
SMTP over TLS'', so it's now possible to encrypt the SMTP traffic
|
|
between a Python program and the mail transport agent being handed a
|
|
message. (Contributed by Gerhard H\"aring.)
|
|
|
|
\item The \module{imaplib} module, maintained by Piers Lauder, has
|
|
support for several new extensions: the NAMESPACE extension defined
|
|
in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony
|
|
Baxter and Michel Pelletier.)
|
|
|
|
\item The \module{rfc822} module's parsing of email addresses is
|
|
now compliant with \rfc{2822}, an update to \rfc{822}. The module's
|
|
name is \emph{not} going to be changed to \samp{rfc2822}.
|
|
(Contributed by Barry Warsaw.)
|
|
|
|
\item New constants \constant{ascii_letters},
|
|
\constant{ascii_lowercase}, and \constant{ascii_uppercase} were
|
|
added to the \module{string} module. There were several modules in
|
|
the standard library that used \constant{string.letters} to mean the
|
|
ranges A-Za-z, but that assumption is incorrect when locales are in
|
|
use, because \constant{string.letters} varies depending on the set
|
|
of legal characters defined by the current locale. The buggy
|
|
modules have all been fixed to use \constant{ascii_letters} instead.
|
|
(Reported by an unknown person; fixed by Fred L. Drake, Jr.)
|
|
|
|
\item The \module{mimetypes} module now makes it easier to use
|
|
alternative MIME-type databases by the addition of a
|
|
\class{MimeTypes} class, which takes a list of filenames to be
|
|
parsed. (Contributed by Fred L. Drake, Jr.)
|
|
|
|
\item A \class{Timer} class was added to the \module{threading}
|
|
module that allows scheduling an activity to happen at some future
|
|
time. (Contributed by Itamar Shtull-Trauring.)
|
|
|
|
\end{itemize}
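
Two of the items above lend themselves to a quick sketch: the new
\module{struct} format characters and the \class{Timer} class. The
values, the 0.05 second delay, and the \function{record()} callback
are arbitrary choices for the example, and the \samp{<} prefix is used
so that the packed size doesn't depend on the platform.

```python
import struct
import threading

# 'q' packs a signed 64-bit integer, 'Q' an unsigned one;
# '<' selects little-endian, standard-size packing.
packed = struct.pack('<qQ', -2 ** 40, 2 ** 63)
signed, unsigned = struct.unpack('<qQ', packed)

# threading.Timer runs a callable after a delay given in seconds.
fired = []

def record():
    # Hypothetical callback for the illustration
    fired.append('done')

t = threading.Timer(0.05, record)
t.start()
t.join()    # wait for the timer thread to finish
```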
|
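
As a rough illustration of the \class{MimeTypes} class mentioned in the
list above, the following sketch creates a private database and queries
it. No extra filenames are passed here, so the instance is assumed to
start from the module's built-in table; the file name
\file{index.html} is just an example value:

\begin{verbatim}
import mimetypes

# A separate database object; a list of mime.types-style files could
# be passed to the constructor to extend the built-in table.
db = mimetypes.MimeTypes()

# guess_type() returns a (type, encoding) tuple.
guess = db.guess_type("index.html")
print(guess[0])
\end{verbatim}

Keeping the database in its own object means an application can load
an alternative set of mappings without changing what the module-level
functions return.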
|
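
The \class{Timer} class from the list above might be used along the
following lines. This is only a sketch: the \function{remind()}
function and the very short delay are made up for the example.

\begin{verbatim}
import threading

results = []

def remind():
    # Called in a separate thread once the delay has elapsed.
    results.append("time is up")

# Arguments: the delay in seconds, then the callable to run.
t = threading.Timer(0.1, remind)
t.start()
t.join()          # wait until the timer thread has finished
print(results)
\end{verbatim}

A timer that hasn't fired yet can be stopped by calling its
\method{cancel()} method.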
|
|
|
|
%======================================================================
|
|
\section{Interpreter Changes and Fixes}
|
|
|
|
Some of the changes only affect people who deal with the Python
|
|
interpreter at the C level, writing Python extension modules,
|
|
embedding the interpreter, or just hacking on the interpreter itself.
|
|
If you only write Python code, none of the changes described here will
|
|
affect you very much.
|
|
|
|
\begin{itemize}
|
|
|
|
\item Profiling and tracing functions can now be implemented in C,
|
|
which can operate at much higher speeds than Python-based functions
|
|
and should reduce the overhead of enabling profiling and tracing, so
|
|
this will be of interest to authors of development environments for
|
|
Python. Two new C functions were added to Python's API,
|
|
\cfunction{PyEval_SetProfile()} and \cfunction{PyEval_SetTrace()}.
|
|
The existing \function{sys.setprofile()} and
|
|
\function{sys.settrace()} functions still exist, and have simply
|
|
been changed to use the new C-level interface. (Contributed by Fred
|
|
L. Drake, Jr.)
|
|
|
|
\item Another low-level API, primarily of interest to implementors
|
|
of Python debuggers and development tools, was added.
|
|
\cfunction{PyInterpreterState_Head()} and
|
|
\cfunction{PyInterpreterState_Next()} let a caller walk through all
|
|
the existing interpreter objects;
|
|
\cfunction{PyInterpreterState_ThreadHead()} and
|
|
\cfunction{PyThreadState_Next()} allow looping over all the thread
|
|
states for a given interpreter. (Contributed by David Beazley.)
|
|
|
|
\item A new \samp{et} format sequence was added to
|
|
\cfunction{PyArg_ParseTuple()}; \samp{et} takes both a parameter and
|
|
an encoding name, and converts the parameter to the given encoding
|
|
if the parameter turns out to be a Unicode string, or leaves it
|
|
alone if it's an 8-bit string, assuming it to already be in the
|
|
desired encoding. This differs from the \samp{es} format character,
|
|
which assumes that 8-bit strings are in Python's default ASCII
|
|
encoding and converts them to the specified new encoding.
|
|
(Contributed by M.-A. Lemburg.)
|
|
|
|
\item Two new flags \constant{METH_NOARGS} and \constant{METH_O} are
|
|
available in method definition tables to simplify implementation of
|
|
methods with no arguments or a single untyped argument. Calling
|
|
such methods is more efficient than calling a corresponding method
|
|
that uses \constant{METH_VARARGS}.
|
|
Also, the old \constant{METH_OLDARGS} style of writing C methods is
|
|
now officially deprecated.
|
|
|
|
\item
|
|
Two new wrapper functions, \cfunction{PyOS_snprintf()} and
|
|
\cfunction{PyOS_vsnprintf()}, were added, which provide
|
|
cross-platform implementations for the relatively new
|
|
\cfunction{snprintf()} and \cfunction{vsnprintf()} C lib APIs. In
|
|
contrast to the standard \cfunction{sprintf()} and
|
|
\cfunction{vsprintf()} functions, the Python versions check the
|
|
bounds of the buffer used to protect against buffer overruns.
|
|
(Contributed by M.-A. Lemburg.)
|
|
|
|
\end{itemize}
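
The new C-level profiling hook is invisible from Python code, since
\function{sys.setprofile()} keeps its existing interface. As a
reminder of what that interface looks like, here is a minimal sketch
of a Python-level profiler; the function names \function{profiler()},
\function{helper()} and \function{work()} are invented for the
example:

\begin{verbatim}
import sys

calls = []

def profiler(frame, event, arg):
    # Record the name of each Python function as it is entered.
    if event == "call":
        calls.append(frame.f_code.co_name)

def helper():
    return 42

def work():
    return helper()

sys.setprofile(profiler)
try:
    work()
finally:
    sys.setprofile(None)    # always remove the profiler again

print(calls)
\end{verbatim}

A profiler written in C and installed with
\cfunction{PyEval_SetProfile()} receives the same kinds of events
without the cost of calling a Python function for each one.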
|
|
|
|
|
|
%======================================================================
|
|
\section{Other Changes and Fixes}
|
|
|
|
% XXX update the patch and bug figures as we go
|
|
As usual there were a bunch of other improvements and bugfixes
|
|
scattered throughout the source tree. A search through the CVS change
|
|
logs finds there were 119 patches applied, and 179 bugs fixed; both
|
|
figures are likely to be underestimates. Some of the more notable
|
|
changes are:
|
|
|
|
\begin{itemize}
|
|
|
|
\item The code for the MacOS port for Python, maintained by Jack
|
|
Jansen, is now kept in the main Python CVS tree, and many changes
|
|
have been made to support MacOS X.
|
|
|
|
The most significant change is the ability to build Python as a
|
|
framework, enabled by supplying the \longprogramopt{enable-framework}
|
|
option to the configure script when compiling Python. According to
|
|
Jack Jansen, ``This installs a self-contained Python installation plus
|
|
the OSX framework "glue" into
|
|
\file{/Library/Frameworks/Python.framework} (or another location of
|
|
choice). For now there is little immediate added benefit to this
|
|
(actually, there is the disadvantage that you have to change your PATH
|
|
to be able to find Python), but it is the basis for creating a
|
|
full-blown Python application, porting the MacPython IDE, possibly
|
|
using Python as a standard OSA scripting language and much more.''
|
|
|
|
Most of the MacPython toolbox modules, which interface to MacOS APIs
|
|
such as windowing, QuickTime, scripting, etc., have been ported to OS
|
|
X, but they've been left commented out in \file{setup.py}. People who want
|
|
to experiment with these modules can uncomment them manually.
|
|
|
|
% Jack's original comments:
|
|
%The main change is the possibility to build Python as a
|
|
%framework. This installs a self-contained Python installation plus the
|
|
%OSX framework "glue" into /Library/Frameworks/Python.framework (or
|
|
%another location of choice). For now there is little immedeate added
|
|
%benefit to this (actually, there is the disadvantage that you have to
|
|
%change your PATH to be able to find Python), but it is the basis for
|
|
%creating a fullblown Python application, porting the MacPython IDE,
|
|
%possibly using Python as a standard OSA scripting language and much
|
|
%more. You enable this with "configure --enable-framework".
|
|
|
|
%The other change is that most MacPython toolbox modules, which
|
|
%interface to all the MacOS APIs such as windowing, quicktime,
|
|
%scripting, etc. have been ported. Again, most of these are not of
|
|
%immedeate use, as they need a full application to be really useful, so
|
|
%they have been commented out in setup.py. People wanting to experiment
|
|
%can uncomment them. Gestalt and Internet Config modules are enabled by
|
|
%default.
|
|
|
|
|
|
\item Keyword arguments passed to builtin functions that don't take them
|
|
now cause a \exception{TypeError} exception to be raised, with the
|
|
message ``\var{function} takes no keyword arguments''.
|
|
|
|
\item A new script, \file{Tools/scripts/cleanfuture.py} by Tim
|
|
Peters, automatically removes obsolete \code{__future__} statements
|
|
from Python source code.
|
|
|
|
\item The new license introduced with Python 1.6 wasn't
|
|
GPL-compatible. This is fixed by some minor textual changes to the
|
|
2.2 license, so Python can now be embedded inside a GPLed program
|
|
again. The license changes were also applied to the Python 2.0.1
|
|
and 2.1.1 releases.
|
|
|
|
\item When presented with a Unicode filename on Windows, Python will
|
|
now convert it to an MBCS encoded string, as used by the Microsoft
|
|
file APIs. As MBCS is explicitly used by the file APIs, Python's
|
|
choice of ASCII as the default encoding turns out to be an
|
|
annoyance.
|
|
(Contributed by Mark Hammond with assistance from Marc-Andr\'e
|
|
Lemburg.)
|
|
|
|
\item Large file support is now enabled on Windows. (Contributed by
|
|
Tim Peters.)
|
|
|
|
\item The \file{Tools/scripts/ftpmirror.py} script
|
|
now parses a \file{.netrc} file, if you have one.
|
|
(Contributed by Mike Romberg.)
|
|
|
|
\item Some features of the object returned by the
|
|
\function{xrange()} function are now deprecated, and trigger
|
|
warnings when they're accessed; they'll disappear in Python 2.3.
|
|
\class{xrange} objects tried to pretend they were full sequence
|
|
types by supporting slicing, sequence multiplication, and the
|
|
\keyword{in} operator, but these features were rarely used and
|
|
therefore buggy. The \method{tolist()} method and the
|
|
\member{start}, \member{stop}, and \member{step} attributes are also
|
|
being deprecated. At the C level, the fourth argument to the
|
|
\cfunction{PyRange_New()} function, \samp{repeat}, has also been
|
|
deprecated.
|
|
|
|
\item There were a bunch of patches to the dictionary
|
|
implementation, mostly to fix potential core dumps if a dictionary
|
|
contained objects that sneakily changed their hash value, or mutated
|
|
the dictionary they were contained in. For a while python-dev fell
|
|
into a gentle rhythm of Michael Hudson finding a case that dumped
|
|
core, Tim Peters fixing it, Michael finding another case, and round
|
|
and round it went.
|
|
|
|
\item On Windows, Python can now be compiled with Borland C thanks
|
|
to a number of patches contributed by Stephen Hansen, though the
|
|
result isn't fully functional yet. (But this \emph{is} progress...)
|
|
|
|
\item Another Windows enhancement: Wise Solutions generously offered
|
|
PythonLabs use of their InstallerMaster 8.1 system. Earlier
|
|
PythonLabs Windows installers used Wise 5.0a, which was beginning to
|
|
show its age. (Packaged up by Tim Peters.)
|
|
|
|
\item Files ending in \samp{.pyw} can now be imported on Windows.
|
|
\samp{.pyw} is a Windows-only thing, used to indicate that a script
|
|
needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to
|
|
prevent a DOS console from popping up to display the output. This
|
|
patch makes it possible to import such scripts, in case they're also
|
|
usable as modules. (Implemented by David Bolen.)
|
|
|
|
\item On platforms where Python uses the C \cfunction{dlopen()} function
|
|
to load extension modules, it's now possible to set the flags used
|
|
by \cfunction{dlopen()} using the \function{sys.getdlopenflags()} and
|
|
\function{sys.setdlopenflags()} functions. (Contributed by Bram Stolk.)
|
|
|
|
\item The \function{pow()} built-in function no longer supports 3
|
|
arguments when floating-point numbers are supplied.
|
|
\code{pow(\var{x}, \var{y}, \var{z})} returns \code{(x**y) \% z}, but
|
|
this is never useful for floating-point numbers, and the final
|
|
result varies unpredictably depending on the platform. A call such
|
|
as \code{pow(2.0, 8.0, 7.0)} will now raise a \exception{TypeError}
|
|
exception.
|
|
|
|
\end{itemize}
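
The change to \function{pow()} described in the list above can be seen
in a short session-style sketch. The variable name \code{rejected} is
only used here to record the outcome:

\begin{verbatim}
# Integer arguments: the three-argument form still works.
print(pow(2, 8, 7))          # computes (2**8) % 7

# Floating-point arguments: the three-argument form is refused.
try:
    pow(2.0, 8.0, 7.0)
except TypeError:
    rejected = 1
else:
    rejected = 0
print(rejected)
\end{verbatim}

Code that really needs the floating-point result can compute
\code{(x**y) \% z} explicitly.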
|
|
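
The stricter checking of keyword arguments to built-in functions,
also mentioned in the list above, can be sketched in the same way.
The keyword name \code{obj} is arbitrary and chosen only for this
illustration; \function{len()} is used as an example of a built-in
that takes no keyword arguments:

\begin{verbatim}
try:
    len(obj=[1, 2, 3])       # keyword argument is not accepted
except TypeError:
    caught = 1
else:
    caught = 0
print(caught)
\end{verbatim}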
|
|
|
|
%======================================================================
|
|
\section{Acknowledgements}
|
|
|
|
The author would like to thank the following people for offering
|
|
suggestions and corrections to various drafts of this article: Fred
|
|
Bremmer, Keith Briggs, Fred L. Drake, Jr., Carel Fellinger, Mark
|
|
Hammond, Stephen Hansen, Jack Jansen, Marc-Andr\'e Lemburg, Tim Peters, Neil
|
|
Schemenauer, and Guido van Rossum.
|
|
|
|
\end{document}
|