\documentclass{howto}

% $Id$

\title{What's New in Python 2.2}
\release{0.04}
\author{A.M. Kuchling}
\authoraddress{\email{akuchlin@mems-exchange.org}}
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This document is a draft, and is subject to change until the
final version of Python 2.2 is released. Currently it's not up to
date at all. Please send any comments, bug reports, or questions, no
matter how minor, to \email{akuchlin@mems-exchange.org}. }

This article explains the new features in Python 2.2. Python 2.2
includes some significant changes that go far toward cleaning up the
language's darkest corners, and some exciting new features.

This article doesn't attempt to provide a complete specification for
the new features, but instead provides a convenient overview of
them. For full details, you should refer to 2.2 documentation
such as the
\citetitle[http://python.sourceforge.net/devel-docs/lib/lib.html]{Python
Library Reference} and the
\citetitle[http://python.sourceforge.net/devel-docs/ref/ref.html]{Python
Reference Manual}, or to the PEP for a particular new feature.
% These \citetitle marks should get the python.org URLs for the final
% release, just as soon as the docs are published there.

The final release of Python 2.2 is planned for October 2001.

%======================================================================
% It looks like this set of changes will likely get into 2.2,
% so I need to read and digest the relevant PEPs.
%\section{PEP 252: Type and Class Changes}

%XXX

% GvR's description at http://www.python.org/2.2/descrintro.html

%\begin{seealso}

%\seepep{252}{Making Types Look More Like Classes}{Written and implemented
%by GvR.}

%\end{seealso}

%======================================================================
\section{PEP 234: Iterators}

A significant addition to 2.2 is an iteration interface at both the C
and Python levels. Objects can define how they can be looped over by
callers.

In Python versions up to 2.1, the usual way to make \code{for item in
obj} work is to define a \method{__getitem__()} method that looks
something like this:

\begin{verbatim}
    def __getitem__(self, index):
        return <next item>
\end{verbatim}

\method{__getitem__()} is more properly used to define an indexing
operation on an object so that you can write \code{obj[5]} to retrieve
the sixth element. It's a bit misleading when you're using this only
to support \keyword{for} loops. Consider some file-like object that
wants to be looped over; the \var{index} parameter is essentially
meaningless, as the class probably assumes that a series of
\method{__getitem__()} calls will be made, with \var{index}
incrementing by one each time. In other words, the presence of the
\method{__getitem__()} method doesn't mean that \code{file[5]} will
work, though it really should.

In Python 2.2, iteration can be implemented separately, and
\method{__getitem__()} methods can be limited to classes that really
do support random access. The basic idea of iterators is quite
simple. A new built-in function, \function{iter(obj)}, returns an
iterator for the object \var{obj}. (It can also take two arguments:
\code{iter(\var{C}, \var{sentinel})} will call the callable \var{C}
until it returns \var{sentinel}, which will signal that the iterator
is done. This form probably won't be used very often.)

Python classes can define an \method{__iter__()} method, which should
create and return a new iterator for the object; if the object is its
own iterator, this method can just return \code{self}. In particular,
iterators will usually be their own iterators. Extension types
implemented in C can implement a \code{tp_iter} function in order to
return an iterator, and extension types that want to behave as
iterators can define a \code{tp_iternext} function.

So what do iterators do? They have one required method,
\method{next()}, which takes no arguments and returns the next value.
When there are no more values to be returned, calling \method{next()}
should raise the \exception{StopIteration} exception.

\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
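
As an illustration of the protocol just described, here is a minimal
sketch of a class that acts as its own iterator, followed by the
two-argument form of \function{iter()}. The \class{Countdown} class
and the variable names are invented for this example. The
\code{__next__} alias is not part of the 2.2 protocol; it is included
only on the assumption that the sketch might be tried on a later
interpreter that looks the method up under that name.

\begin{verbatim}
class Countdown:
    """Iterator that counts down from start to 1."""
    def __init__(self, start):
        self.count = start

    def __iter__(self):
        # The object is its own iterator, so just return self.
        return self

    def next(self):
        if self.count <= 0:
            # No more values: signal the end of the iteration.
            raise StopIteration
        value = self.count
        self.count = self.count - 1
        return value

    # Assumption: alias for later interpreters that call __next__().
    __next__ = next

countdown_values = [n for n in Countdown(3)]          # [3, 2, 1]

# Two-argument form: call queue.pop() until it returns 'STOP'.
queue = ['unused', 'STOP', 'c', 'b', 'a']
popped = [item for item in iter(queue.pop, 'STOP')]   # ['a', 'b', 'c']
\end{verbatim}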

In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects something for which \function{iter()} will return an iterator.
For backward compatibility, and convenience, an iterator is
automatically constructed for sequences that don't implement
\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in
[1,2,3]} will still work. Wherever the Python interpreter loops over
a sequence, it's been changed to use the iterator protocol. This
means you can do things like this:

\begin{verbatim}
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
>>>
\end{verbatim}

Iterator support has been added to some of Python's basic types. The
\keyword{in} operator now works on dictionaries, so \code{\var{key} in
dict} is now equivalent to \code{dict.has_key(\var{key})}.
Calling \function{iter()} on a dictionary will return an iterator
which loops over its keys:

\begin{verbatim}
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
>>>
\end{verbatim}

That's just the default behaviour. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
methods to get an appropriate iterator.

Files also provide an iterator, which calls the file's \method{readline()}
method until there are no more lines in the file. This means you can
now read each line of a file using code like this:

\begin{verbatim}
for line in file:
    # do something for each line
\end{verbatim}

Note that you can only go forward in an iterator; there's no way to
get the previous element, reset the iterator, or make a copy of it.
An iterator object could provide such additional capabilities, but the
iterator protocol only requires a \method{next()} method.

\begin{seealso}

\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
by the Python Labs crew, mostly by GvR and Tim Peters.}

\end{seealso}

%======================================================================
\section{PEP 255: Simple Generators}

Generators are another new feature, one that interacts with the
introduction of iterators.

You're doubtless familiar with how function calls work in Python or
C. When you call a function, it gets a private area where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But, what if the local variables
weren't destroyed on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler, which
compiles the function specially. Because a new keyword was
introduced, generators must be explicitly enabled in a module by
including a \code{from __future__ import generators} statement near
the top of the module's source code. In Python 2.3 this statement
will become unnecessary.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
interface. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that, on reaching a \keyword{yield}, the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \code{try...finally} statement; read PEP 255 for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
>>>
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.

Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
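
For comparison, a hand-written class roughly equivalent to
\function{generate_ints()} might look like the following sketch. The
class name \class{IntSequence} is invented here, and the
\code{__next__} alias is an assumption that only matters on later
interpreters which use that spelling. To match the generator's output
of 0, 1, ..., N-1, this version returns the counter's value before
incrementing it.

\begin{verbatim}
class IntSequence:
    """Hand-written counterpart of generate_ints(N)."""
    def __init__(self, N):
        self.N = N
        self.count = 0          # plays the role of the local variable i

    def __iter__(self):
        return self

    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count = self.count + 1
        return value

    __next__ = next             # assumption: spelling used by later versions

ints = [i for i in IntSequence(3)]    # [0, 1, 2]
\end{verbatim}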
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
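
To try the traversal out, something like the following sketch can be
used. The \class{Tree} class here is a hypothetical stand-in written
for this example (the one in the test suite may well differ), and
\function{inorder()} is repeated so that the block is self-contained.
Under Python 2.2 the module would also need the \code{from __future__
import generators} statement described earlier.

\begin{verbatim}
class Tree:
    # Hypothetical minimal node type: a label plus optional subtrees.
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

# A three-node tree: 'b' at the root, 'a' on the left, 'c' on the right.
root = Tree('b', Tree('a'), Tree('c'))
labels = [label for label in inorder(root)]    # ['a', 'b', 'c']
\end{verbatim}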

Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chess board so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central to the language. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

The \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
This is different from Icon where the idea of generators is a basic
concept. One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
that can be passed around to other functions or stored in a data
structure.

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}

%======================================================================
\section{Unicode Changes}

Python's Unicode support has been enhanced a bit in 2.2. Unicode
strings are usually stored as UCS-2, as 16-bit unsigned integers.
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
integers, as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script. When
built to use UCS-4 (a ``wide Python''), the interpreter can natively
handle Unicode characters from U+000000 to U+10FFFF, so the range of
legal values for the \function{unichr()} function is expanded
accordingly. Using an interpreter compiled to use UCS-2 (a ``narrow
Python''), values greater than 65535 will still cause
\function{unichr()} to raise a \exception{ValueError} exception.

All this is the province of the still-unimplemented PEP 261, ``Support
for `wide' Unicode characters''; consult it for further details, and
please offer comments on the PEP and on your experiences with the
2.2 alpha releases.
% XXX update previous line once 2.2 reaches beta.

Another change is much simpler to explain. Since their introduction,
Unicode strings have supported an \method{encode()} method to convert
the string to a selected encoding such as UTF-8 or Latin-1. A
symmetric \method{decode(\optional{\var{encoding}})} method has been
added to 8-bit strings (though not to Unicode strings) in 2.2.
\method{decode()} assumes that the string is in the specified encoding
and decodes it, returning whatever is returned by the codec.

Using this new feature, codecs have been added for tasks not directly
related to Unicode. For example, codecs have been added for
uu-encoding, MIME's base64 encoding, and compression with the
\module{zlib} module:

\begin{verbatim}
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'
\end{verbatim}

\method{encode()} and \method{decode()} were implemented by
Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
were implemented by Fredrik Lundh and Martin von L\"owis.

\begin{seealso}

\seepep{261}{Support for `wide' Unicode characters}{PEP written by
Paul Prescod. Not yet accepted or fully implemented.}

\end{seealso}

%======================================================================
\section{PEP 227: Nested Scopes}

In Python 2.1, statically nested scopes were added as an optional
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled; they are always on. The rest of this section
is a copy of the description of nested scopes from my ``What's New in
Python 2.1'' document; if you read it when 2.1 came out, you can skip
the rest of this section.

The largest change introduced in Python 2.1, and made complete in 2.2,
is to Python's scoping rules. In Python 2.0, at any given time there
are at most three namespaces used to look up variable names: local,
module-level, and the built-in namespace. This often surprised people
because it didn't match their intuitive expectations. For example, a
nested recursive function definition doesn't work:

\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}

The function \function{g()} will always raise a \exception{NameError}
exception, because the binding of the name \samp{g} isn't in either
its local namespace or in the module-level namespace. This isn't much
of a problem in practice (how often do you recursively define interior
functions like this?), but this also made using the \keyword{lambda}
expression clumsier, and this was a problem in practice. In code which
uses \keyword{lambda} you can often find local variables being copied
by passing them as the default values of arguments.

\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}

The readability of Python code written in a strongly functional style
suffers greatly as a result.

The most significant change to Python 2.2 is that static scoping has
been added to the language to fix this problem. As a first effect,
the \code{name=name} default argument is now unnecessary in the above
example. Put simply, when a given variable name is not assigned a
value within a function (by an assignment, or the \keyword{def},
\keyword{class}, or \keyword{import} statements), references to the
variable will be looked up in the local namespace of the enclosing
scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
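
The rule can be seen in a small sketch. The \function{make_adder()}
function and the variable names are invented for this illustration,
and \function{find()} is rewritten here as a plain function taking the
list as an argument, without the \code{name=name} default. In both
cases the inner function refers to a name it never assigns, so the
name is looked up in the enclosing function's namespace.

\begin{verbatim}
def make_adder(n):
    def add(x):
        # n is not assigned in add(), so it is found in make_adder().
        return x + n
    return add

def find(entries, name):
    "Return list of any entries equal to 'name'"
    # No name=name default argument is needed any more.
    return [x for x in filter(lambda x: x == name, entries)]

add_five = make_adder(5)
total = add_five(10)                                # 15
matches = find(['spam', 'eggs', 'spam'], 'spam')    # ['spam', 'spam']
\end{verbatim}
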

This change may cause some compatibility problems for code where the
same variable name is used both at the module level and as a local
variable within a function that contains further function definitions.
This seems rather unlikely though, since such code would have been
pretty confusing to read in the first place.

One side effect of the change is that the \code{from \var{module}
import *} and \keyword{exec} statements have been made illegal inside
a function scope under certain conditions. The Python reference
manual has said all along that \code{from \var{module} import *} is
only legal at the top level of a module, but the CPython interpreter
has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes
has to generate different code to access variables in a containing
scope. \code{from \var{module} import *} and \keyword{exec} make it
impossible for the compiler to figure this out, because they add names
to the local namespace that are unknowable at compile time.
Therefore, if a function contains function definitions or
\keyword{lambda} expressions with free variables, the compiler will
flag this by raising a \exception{SyntaxError} exception.

To make the preceding explanation a bit clearer, here's an example:

\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}

Line 4 containing the \keyword{exec} statement is a syntax error,
since \keyword{exec} would define a new local variable named \samp{x}
whose value should be accessed by \function{g()}.

This shouldn't be much of a limitation, since \keyword{exec} is rarely
used in most Python code (and when it is used, it's often a sign of a
poor design anyway).

\begin{seealso}

\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}

\end{seealso}

%======================================================================
\section{New and Improved Modules}

\begin{itemize}

\item The \module{xmlrpclib} module was contributed to the standard
library by Fredrik Lundh. It provides support for writing XML-RPC
clients; XML-RPC is a simple remote procedure call protocol built on
top of HTTP and XML. For example, the following snippet retrieves a
list of RSS channels from the O'Reilly Network, and then retrieves a
list of the recent headlines for one channel:

\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
      'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}

See \url{http://www.xmlrpc.com/} for more information about XML-RPC.

\item The \module{socket} module can be compiled to support IPv6;
specify the \longprogramopt{enable-ipv6} option to Python's configure
script. (Contributed by Jun-ichiro ``itojun'' Hagino.)

\item Two new format characters were added to the \module{struct}
module for 64-bit integers on platforms that support the C
\ctype{long long} type. \samp{q} is for a signed 64-bit integer,
and \samp{Q} is for an unsigned one. The value is returned in
Python's long integer type. (Contributed by Tim Peters.)

\item In the interpreter's interactive mode, there's a new built-in
function \function{help()}, that uses the \module{pydoc} module
introduced in Python 2.1 to provide interactive help.
\code{help(\var{object})} displays any available help text about
\var{object}. \code{help()} with no argument puts you in an online
help utility, where you can enter the names of functions, classes,
or modules to read their help text.
(Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)

\item Various bugfixes and performance improvements have been made
to the SRE engine underlying the \module{re} module. For example,
\function{re.sub()} will now use \function{string.replace()}
automatically when the pattern and its replacement are both just
literal strings without regex metacharacters. Another contributed
patch speeds up certain Unicode character ranges by a factor of
two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was
contributed by Martin von L\"owis.)

\item The \module{imaplib} module, maintained by Piers Lauder, has
support for several new extensions: the NAMESPACE extension defined
in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony
Baxter and Michel Pelletier.)

\item The \module{rfc822} module's parsing of email addresses is
now compliant with \rfc{2822}, an update to \rfc{822}. The module's
name is \emph{not} going to be changed to \samp{rfc2822}.
(Contributed by Barry Warsaw.)

\end{itemize}
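
As a small illustration of the new \module{struct} format characters
mentioned in the list above, the following sketch packs and unpacks
two 64-bit values. The \samp{<} prefix (little-endian, standard
sizes) is a choice made for this example so that each field occupies
8 bytes regardless of platform; the variable names are likewise
invented.

\begin{verbatim}
import struct

# 'q' is a signed 64-bit integer, 'Q' an unsigned one.
smallest = -2**63
largest = 2**64 - 1

packed = struct.pack('<qQ', smallest, largest)
size = struct.calcsize('<qQ')      # 16 bytes: two 8-byte fields
signed_value, unsigned_value = struct.unpack('<qQ', packed)
\end{verbatim}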

%======================================================================
\section{Other Changes and Fixes}

% XXX update the patch and bug figures as we go
As usual there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were 43 patches applied, and 77 bugs fixed; both
figures are likely to be underestimates. Some of the more notable
changes are:

\begin{itemize}

\item Keyword arguments passed to builtin functions that don't take them
now cause a \exception{TypeError} exception to be raised, with the
message ``\var{function} takes no keyword arguments''.

\item The code for the Mac OS port for Python, maintained by Jack
Jansen, is now kept in the main Python CVS tree.

\item The new license introduced with Python 1.6 wasn't
GPL-compatible. This is fixed by some minor textual changes to the
2.2 license, so Python can now be embedded inside a GPLed program
again. The license changes were also applied to the Python 2.0.1
and 2.1.1 releases.

\item Profiling and tracing functions can now be implemented in C,
which can operate at much higher speeds than Python-based functions
and should reduce the overhead of enabling profiling and tracing, so
it will be of interest to authors of development environments for
Python. Two new C functions were added to Python's API,
\cfunction{PyEval_SetProfile()} and \cfunction{PyEval_SetTrace()}.
The existing \function{sys.setprofile()} and
\function{sys.settrace()} functions still exist, and have simply
been changed to use the new C-level interface. (Contributed by Fred
L. Drake, Jr.)

\item Another low-level API, primarily of interest to implementors
of Python debuggers and development tools, was added.
\cfunction{PyInterpreterState_Head()} and
\cfunction{PyInterpreterState_Next()} let a caller walk through all
the existing interpreter objects;
\cfunction{PyInterpreterState_ThreadHead()} and
\cfunction{PyThreadState_Next()} allow looping over all the thread
states for a given interpreter. (Contributed by David Beazley.)

% XXX is this explanation correct?
\item When presented with a Unicode filename on Windows, Python will
now convert it to an MBCS encoded string, as used by the Microsoft
file APIs. As MBCS is explicitly used by the file APIs, Python's
choice of ASCII as the default encoding turns out to be an
annoyance.

This patch also adds \samp{et} as a format sequence to
\cfunction{PyArg_ParseTuple}; \samp{et} takes both a parameter and
an encoding name, and converts it to the given encoding if the
parameter turns out to be a Unicode string, or leaves it alone if
it's an 8-bit string, assuming it to already be in the desired
encoding. (This differs from the \samp{es} format character, which
assumes that 8-bit strings are in Python's default ASCII encoding
and converts them to the specified new encoding.)

(Contributed by Mark Hammond with assistance from Marc-Andr\'e
Lemburg.)

\item The \file{Tools/scripts/ftpmirror.py} script
now parses a \file{.netrc} file, if you have one.
(Contributed by Mike Romberg.)

\item Some features of the object returned by the
\function{xrange()} function are now deprecated, and trigger
warnings when they're accessed; they'll disappear in Python 2.3.
\class{xrange} objects tried to pretend they were full sequence
types by supporting slicing, sequence multiplication, and the
\keyword{in} operator, but these features were rarely used and
therefore buggy. The \method{tolist()} method and the
\member{start}, \member{stop}, and \member{step} attributes are also
being deprecated. At the C level, the fourth argument to the
\cfunction{PyRange_New()} function, \samp{repeat}, has also been
deprecated.

\item There were a bunch of patches to the dictionary
implementation, mostly to fix potential core dumps if a dictionary
contains objects that sneakily changed their hash value, or mutated
the dictionary they were contained in. For a while python-dev fell
into a gentle rhythm of Michael Hudson finding a case that dumped
core, Tim Peters fixing it, Michael finding another case, and round
and round it went.

\item On Windows, Python can now be compiled with Borland C thanks
to a number of patches contributed by Stephen Hansen.

\item Another Windows enhancement: Wise Solutions generously offered
PythonLabs use of their InstallerMaster 8.1 system. Earlier
PythonLabs Windows installers used Wise 5.0a, which was beginning to
show its age. (Packaged up by Tim Peters.)

\item On platforms where Python uses the C \cfunction{dlopen()} function
to load extension modules, it's now possible to set the flags used
by \cfunction{dlopen()} using the \function{sys.getdlopenflags()} and
\function{sys.setdlopenflags()} functions. (Contributed by Bram Stolk.)

\end{itemize}

%======================================================================
\section{Acknowledgements}

The author would like to thank the following people for offering
suggestions and corrections to various drafts of this article: Fred
Bremmer, Fred L. Drake, Jr., Mark Hammond, Marc-Andr\'e Lemburg,
Tim Peters, Neil Schemenauer, Guido van Rossum.

\end{document}