\documentclass{howto}
|
|
|
|
\title{What's New in Python 2.0}
|
|
\release{0.05}
|
|
\author{A.M. Kuchling and Moshe Zadka}
|
|
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
|
|
\begin{document}
|
|
\maketitle\tableofcontents
|
|
|
|
\section{Introduction}
|
|
|
|
{\large This is a draft document; please report inaccuracies and
|
|
omissions to the authors. This document should not be treated as
|
|
definitive; features described here might be removed or changed during
|
|
the beta cycle before the final release of Python 2.0.
|
|
}
|
|
|
|
Python 2.0, a new release of the language, is due out some time this
summer. Beta versions are already available from
|
|
\url{http://www.pythonlabs.com/tech/python2.html}. This article
|
|
covers the exciting new features in 2.0, highlights some other useful
|
|
changes, and points out a few incompatible changes that may require
|
|
rewriting code.
|
|
|
|
Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
|
|
A host of minor fixes, a few optimizations, additional docstrings, and
|
|
better error messages went into 2.0; to list them all would be
|
|
impossible, but they're certainly significant. Consult the
|
|
publicly-available CVS logs if you want to see the full list.
|
|
|
|
% ======================================================================
|
|
\section{Unicode}
|
|
|
|
The largest new feature in Python 2.0 is a new fundamental data type:
|
|
Unicode strings. Unicode uses 16-bit numbers to represent characters
|
|
instead of the 8-bit number used by ASCII, meaning that 65,536
|
|
distinct characters can be supported.
|
|
|
|
The final interface for Unicode support was arrived at through
|
|
countless often-stormy discussions on the python-dev mailing list, and
|
|
mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
|
|
type implementation by Fredrik Lundh. A detailed explanation of the
|
|
interface is in the file \file{Misc/unicode.txt} in the Python source
|
|
distribution; it's also available on the Web at
|
|
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
|
|
This article will simply cover the most significant points from the
|
|
full interface.
|
|
|
|
In Python source code, Unicode strings are written as
|
|
\code{u"string"}. Arbitrary Unicode characters can be written using a
|
|
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
|
|
4-digit hexadecimal number from 0000 to FFFF. The existing
|
|
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
|
|
escapes can be used for characters up to U+01FF, which is represented
|
|
by \code{\e 777}.
|
|
|
|
Unicode strings, just like regular strings, are an immutable sequence
|
|
type. They can be indexed and sliced, but not modified in place.
|
|
Unicode strings have an \method{encode( \optional{encoding} )} method
|
|
that returns an 8-bit string in the desired encoding. Encodings are
|
|
named by strings, such as \code{'ascii'}, \code{'utf-8'},
|
|
\code{'iso-8859-1'}, or whatever. A codec API is defined for
|
|
implementing and registering new encodings that are then available
|
|
throughout a Python program. If an encoding isn't specified, the
|
|
default encoding is usually 7-bit ASCII, though it can be changed for
|
|
your Python installation by calling the
|
|
\function{sys.setdefaultencoding(\var{encoding})} function in a
|
|
customised version of \file{site.py}.
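
As a brief illustration of \method{encode()}, the following sketch
converts one Unicode string into two different encodings. The sample
string and the variable names are assumptions made for this example;
they are not taken from the interface specification.

\begin{verbatim}
# Illustrative sketch; the sample text and names are assumptions.
text = u'caf\u00e9'                  # 4 characters; the last is U+00E9

utf8 = text.encode('utf-8')          # U+00E9 becomes two bytes
latin1 = text.encode('iso-8859-1')   # U+00E9 becomes one byte

utf8_length = len(utf8)
latin1_length = len(latin1)
\end{verbatim}

Here \code{utf8_length} is 5 and \code{latin1_length} is 4, because
UTF-8 needs two bytes for U+00E9 while ISO-8859-1 needs only one.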
|
|
|
|
Combining 8-bit and Unicode strings always coerces to Unicode, using
|
|
the default ASCII encoding; the result of \code{'a' + u'bc'} is
|
|
\code{u'abc'}.
|
|
|
|
New built-in functions have been added, and existing built-ins
|
|
modified to support Unicode:
|
|
|
|
\begin{itemize}
|
|
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
|
|
long, containing the character \var{ch}.
|
|
|
|
\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or Unicode string, returns the number of the character as an integer.
|
|
|
|
\item \code{unicode(\var{string} \optional{, \var{encoding}}
|
|
\optional{, \var{errors}} ) } creates a Unicode string from an 8-bit
|
|
string. \code{encoding} is a string naming the encoding to use.
|
|
The \code{errors} parameter specifies the treatment of characters that
|
|
are invalid for the current encoding; passing \code{'strict'} as the
|
|
value causes an exception to be raised on any encoding error, while
|
|
\code{'ignore'} causes errors to be silently ignored and
|
|
\code{'replace'} uses U+FFFD, the official replacement character, in
|
|
case of any problems.
|
|
|
|
\end{itemize}
|
|
|
|
A new module, \module{unicodedata}, provides an interface to Unicode
|
|
character properties. For example, \code{unicodedata.category(u'A')}
|
|
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
|
|
and 'u' meaning that it's uppercase.
|
|
\code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning that U+0660 is
an Arabic number.
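
The two lookups described above can be tried directly. This is only a
small sketch; the variable names are assumptions chosen for the
example.

\begin{verbatim}
import unicodedata

# The characters and expected results are the ones quoted in the text.
category = unicodedata.category(u'A')               # 'Lu'
direction = unicodedata.bidirectional(u'\u0660')    # 'AN'

# The first letter of the category gives the general class.
is_letter = (category[0] == 'L')
\end{verbatim}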
|
|
|
|
The \module{codecs} module contains functions to look up existing encodings
|
|
and register new ones. Unless you want to implement a
|
|
new encoding, you'll most often use the
|
|
\function{codecs.lookup(\var{encoding})} function, which returns a
|
|
4-element tuple: \code{(\var{encode_func},
|
|
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.
|
|
|
|
\begin{itemize}
|
|
\item \var{encode_func} is a function that takes a Unicode string, and
|
|
returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
|
|
is an 8-bit string containing a portion (perhaps all) of the Unicode
|
|
string converted into the given encoding, and \var{length} tells you
|
|
how much of the Unicode string was converted.
|
|
|
|
\item \var{decode_func} is the mirror of \var{encode_func},
taking an 8-bit string and
returning a 2-tuple \code{(\var{ustring}, \var{length})}, where \var{ustring} is the resulting Unicode string
and \var{length} tells you how much of the 8-bit string was consumed.
|
|
|
|
\item \var{stream_reader} is a class that supports decoding input from
|
|
a stream. \var{stream_reader(\var{file_obj})} returns an object that
|
|
supports the \method{read()}, \method{readline()}, and
|
|
\method{readlines()} methods. These methods will all translate from
|
|
the given encoding and return Unicode strings.
|
|
|
|
\item \var{stream_writer}, similarly, is a class that supports
|
|
encoding output to a stream. \var{stream_writer(\var{file_obj})}
|
|
returns an object that supports the \method{write()} and
|
|
\method{writelines()} methods. These methods expect Unicode strings,
|
|
translating them to the given encoding on output.
|
|
\end{itemize}
|
|
|
|
For example, the following code writes a Unicode string into a file,
|
|
encoding it as UTF-8:
|
|
|
|
\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}
|
|
|
|
The following code would then read UTF-8 input from the file:
|
|
|
|
\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}
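
The encoding and decoding functions returned by
\function{codecs.lookup()} can also be called directly, without going
through a stream. The following is a minimal sketch of that use; the
sample string and the variable names are assumptions made for
illustration.

\begin{verbatim}
import codecs

# codecs.lookup() returns the 4-tuple described earlier; only the
# first two elements are used here.
encode, decode, reader_class, writer_class = codecs.lookup('utf-8')

# Each function returns a (result, length_consumed) pair.
encoded, chars_consumed = encode(u'\u0660ab')
decoded, bytes_consumed = decode(encoded)

round_trip_ok = (decoded == u'\u0660ab')
\end{verbatim}

In this sketch \code{chars_consumed} is 3, the number of Unicode
characters converted, while \code{bytes_consumed} is 4, because U+0660
occupies two bytes in UTF-8.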
|
|
|
|
Unicode-aware regular expressions are available through the
|
|
\module{re} module, which has a new underlying implementation called
|
|
SRE written by Fredrik Lundh of Secret Labs AB.
|
|
|
|
A \code{-U} command line option was added which causes the Python
|
|
compiler to interpret all string literals as Unicode string literals.
|
|
This is intended to be used in testing and future-proofing your Python
|
|
code, since some future version of Python may drop support for 8-bit
|
|
strings and provide only Unicode strings.
|
|
|
|
% ======================================================================
|
|
\section{List Comprehensions}
|
|
|
|
Lists are a workhorse data type in Python, and many programs
|
|
manipulate a list at some point. Two common operations on lists are
|
|
to loop over them, and either pick out the elements that meet a
|
|
certain criterion, or apply some function to each element. For
|
|
example, given a list of strings, you might want to pull out all the
|
|
strings containing a given substring, or strip off trailing whitespace
|
|
from each line.
|
|
|
|
The existing \function{map()} and \function{filter()} functions can be
|
|
used for this purpose, but they require a function as one of their
|
|
arguments. This is fine if there's an existing built-in function that
|
|
can be passed directly, but if there isn't, you have to create a
|
|
little function to do the required work, and Python's scoping rules
|
|
make the result ugly if the little function needs additional
|
|
information. Take the first example in the previous paragraph,
|
|
finding all the strings in the list containing a given substring. You
|
|
could write the following to do it:
|
|
|
|
\begin{verbatim}
# Given the list L, make a list of all strings
# containing the substring S.
sublist = filter( lambda s, substring=S:
                     string.find(s, substring) != -1,
                  L)
\end{verbatim}
|
|
|
|
Because of Python's scoping rules, a default argument is used so that
|
|
the anonymous function created by the \keyword{lambda} expression knows
|
|
what substring is being searched for. List comprehensions make this
|
|
cleaner:
|
|
|
|
\begin{verbatim}
|
|
sublist = [ s for s in L if string.find(s, S) != -1 ]
|
|
\end{verbatim}
|
|
|
|
List comprehensions have the form:
|
|
|
|
\begin{verbatim}
[ expression for expr1 in sequence1
             for expr2 in sequence2 ...
             for exprN in sequenceN
             if condition ]
\end{verbatim}
|
|
|
|
The \keyword{for}...\keyword{in} clauses contain the sequences to be
|
|
iterated over. The sequences do not have to be the same length,
|
|
because they are \emph{not} iterated over in parallel, but
|
|
from left to right; this is explained more clearly in the following
|
|
paragraphs. The elements of the generated list will be the successive
|
|
values of \var{expression}. The final \keyword{if} clause is
|
|
optional; if present, \var{expression} is only evaluated and added to
|
|
the result if \var{condition} is true.
|
|
|
|
To make the semantics very clear, a list comprehension is equivalent
|
|
to the following Python code:
|
|
|
|
\begin{verbatim}
for expr1 in sequence1:
    for expr2 in sequence2:
    ...
        for exprN in sequenceN:
            if (condition):
                # Append the value of
                # the expression to the
                # resulting list.
\end{verbatim}
|
|
|
|
This means that when there are multiple \keyword{for}...\keyword{in} clauses,
the length of the resulting list (assuming no \keyword{if} clause
filters out elements) will be equal to the product of the lengths of all
the sequences. If you have two lists of length 3, the output list is
9 elements long:
|
|
|
|
\begin{verbatim}
>>> seq1 = 'abc'
>>> seq2 = (1,2,3)
>>> [ (x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
('c', 2), ('c', 3)]
\end{verbatim}
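
To see the equivalence with nested loops in a concrete case, the
following sketch builds the same list twice, once with a list
comprehension that includes an \keyword{if} clause and once with
explicit loops. The condition \code{y != 2} and the variable names are
arbitrary choices made for this illustration.

\begin{verbatim}
seq1 = 'abc'
seq2 = (1, 2, 3)

# Comprehension with two for clauses and a condition.
pairs = [ (x, y) for x in seq1 for y in seq2 if y != 2 ]

# The same result built with explicit nested loops.
expected = []
for x in seq1:
    for y in seq2:
        if y != 2:
            expected.append( (x, y) )

same = (pairs == expected)
\end{verbatim}

Both lists contain 6 elements rather than 9, because the condition
rejects one value of \code{y} for each of the three values of \code{x}.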
|
|
|
|
To avoid introducing an ambiguity into Python's grammar, if
|
|
\var{expression} is creating a tuple, it must be surrounded with
|
|
parentheses. The first list comprehension below is a syntax error,
|
|
while the second one is correct:
|
|
|
|
\begin{verbatim}
# Syntax error
[ x,y for x in seq1 for y in seq2]
# Correct
[ (x,y) for x in seq1 for y in seq2]
\end{verbatim}
|
|
|
|
The idea of list comprehensions originally comes from the functional
|
|
programming language Haskell (\url{http://www.haskell.org}). Greg
|
|
Ewing argued most effectively for adding them to Python and wrote the
|
|
initial list comprehension patch, which was then discussed for a
|
|
seemingly endless time on the python-dev mailing list and kept
|
|
up-to-date by Skip Montanaro.
|
|
|
|
% ======================================================================
|
|
\section{Augmented Assignment}
|
|
|
|
Augmented assignment operators, another long-requested feature, have
|
|
been added to Python 2.0. Augmented assignment operators include
|
|
\code{+=}, \code{-=}, \code{*=}, and so forth. For example, the
|
|
statement \code{a += 2} increments the value of the variable
|
|
\code{a} by 2, equivalent to the slightly lengthier \code{a = a + 2}.
|
|
|
|
The full list of supported assignment operators is \code{+=},
|
|
\code{-=}, \code{*=}, \code{/=}, \code{\%=}, \code{**=}, \code{\&=},
|
|
\code{|=}, \verb|^=|, \code{>>=}, and \code{<<=}. Python classes can
|
|
override the augmented assignment operators by defining methods named
|
|
\method{__iadd__}, \method{__isub__}, etc. For example, the following
|
|
\class{Number} class stores a number and supports using += to create a
|
|
new instance with an incremented value.
|
|
|
|
\begin{verbatim}
class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number( self.value + increment)

n = Number(5)
n += 3
print n.value
\end{verbatim}
|
|
|
|
The \method{__iadd__} special method is called with the value of the
|
|
increment, and should return a new instance with an appropriately
|
|
modified value; this return value is bound as the new value of the
|
|
variable on the left-hand side.
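
The rebinding step can be made visible by keeping a second name for
the original instance. The sketch below repeats the \class{Number}
class from above; the names \code{original} and \code{rebound} are
assumptions added for the illustration.

\begin{verbatim}
class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number( self.value + increment)

n = Number(5)
original = n        # a second name for the same instance
n += 3              # calls n.__iadd__(3) and rebinds n to the result

rebound = (n is not original)
\end{verbatim}

Afterwards \code{n.value} is 8 while \code{original.value} is still 5,
because this particular \method{__iadd__} returns a new instance
instead of modifying the existing one.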
|
|
|
|
Augmented assignment operators were first introduced in the C
|
|
programming language, and most C-derived languages, such as
|
|
\program{awk}, C++, Java, Perl, and PHP also support them. The augmented
|
|
assignment patch was implemented by Thomas Wouters.
|
|
|
|
% ======================================================================
|
|
\section{String Methods}
|
|
|
|
Until now string-manipulation functionality was in the \module{string}
|
|
module, which was usually a front-end for the \module{strop}
|
|
module written in C. The addition of Unicode posed a difficulty for
|
|
the \module{strop} module, because the functions would all need to be
|
|
rewritten in order to accept either 8-bit or Unicode strings. For
|
|
functions such as \function{string.replace()}, which takes 3 string
|
|
arguments, that means eight possible permutations, and correspondingly
|
|
complicated code.
|
|
|
|
Instead, Python 2.0 pushes the problem onto the string type, making
|
|
string manipulation functionality available through methods on both
|
|
8-bit strings and Unicode strings.
|
|
|
|
\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}
|
|
|
|
One thing that hasn't changed, a noteworthy April Fools' joke
|
|
notwithstanding, is that Python strings are immutable. Thus, the
|
|
string methods return new strings, and do not modify the string on
|
|
which they operate.
|
|
|
|
The old \module{string} module is still around for backwards
|
|
compatibility, but it mostly acts as a front-end to the new string
|
|
methods.
|
|
|
|
Two methods which have no parallel in pre-2.0 versions, although they
|
|
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to
\code{s[:len(t)] == t}, while \code{s.endswith(t)} is equivalent to
\code{s[-len(t):] == t} for a non-empty \var{t}.
|
|
|
|
One other method which deserves special mention is \method{join}. The
|
|
\method{join} method of a string receives one parameter, a sequence of
|
|
strings, and is equivalent to the \function{string.join} function from
|
|
the old \module{string} module, with the arguments reversed. In other
|
|
words, \code{s.join(seq)} is equivalent to the old
|
|
\code{string.join(seq, s)}.
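
A short sketch of these three methods follows. The sample strings and
variable names are assumptions chosen for the example.

\begin{verbatim}
words = ['spam', 'eggs', 'ham']

# The separator is the string whose method is called.
joined = ', '.join(words)        # same as string.join(words, ', ')

is_python_file = 'setup.py'.endswith('.py')
is_comment = '# a comment'.startswith('#')
\end{verbatim}

Here \code{joined} is the string \code{'spam, eggs, ham'}, and both
tests are true.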
|
|
|
|
% ======================================================================
|
|
\section{Optional Collection of Cycles}
|
|
|
|
The C implementation of Python uses reference counting to implement
|
|
garbage collection. Every Python object maintains a count of the
|
|
number of references pointing to itself, and adjusts the count as
|
|
references are created or destroyed. Once the reference count reaches
|
|
zero, the object is no longer accessible, since you need to have a
|
|
reference to an object to access it, and if the count is zero, no
|
|
references exist any longer.
|
|
|
|
Reference counting has some pleasant properties: it's easy to
|
|
understand and implement, and the resulting implementation is
|
|
portable, fairly fast, and interacts well with other libraries that
|
|
implement their own memory handling schemes. The major problem with
|
|
reference counting is that it sometimes doesn't realise that objects
|
|
are no longer accessible, resulting in a memory leak. This happens
|
|
when there are cycles of references.
|
|
|
|
Consider the simplest possible cycle,
|
|
a class instance which has a reference to itself:
|
|
|
|
\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}
|
|
|
|
After the above two lines of code have been executed, the reference
|
|
count of \code{instance} is 2; one reference is from the variable
|
|
named \samp{'instance'}, and the other is from the \samp{myself}
|
|
attribute of the instance.
|
|
|
|
If the next line of code is \code{del instance}, what happens? The
|
|
reference count of \code{instance} is decreased by 1, so it has a
|
|
reference count of 1; the reference in the \samp{myself} attribute
|
|
still exists. Yet the instance is no longer accessible through Python
|
|
code, and it could be deleted. Several objects can participate in a
|
|
cycle if they have references to each other, causing all of the
|
|
objects to be leaked.
|
|
|
|
An experimental step has been made toward fixing this problem. When
|
|
compiling Python, the \verb|--with-cycle-gc| option can be specified.
|
|
This causes a cycle detection algorithm to be periodically executed,
|
|
which looks for inaccessible cycles and deletes the objects involved.
|
|
A new \module{gc} module provides functions to perform a garbage
collection, obtain debugging statistics, and tune the collector's parameters.
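
As a rough sketch of how the \module{gc} module might be used,
assuming an interpreter in which cycle collection is available, the
following code creates the self-referencing instance described above
and then requests a collection explicitly. The class and variable
names are placeholders.

\begin{verbatim}
import gc

class SomeClass:
    pass

gc.collect()                  # start from a clean slate

instance = SomeClass()
instance.myself = instance    # create a reference cycle
del instance                  # the cycle is now unreachable

# collect() runs a full collection and returns the number of
# unreachable objects it found.
found = gc.collect()
\end{verbatim}

The exact value of \code{found} depends on the interpreter, but it
should be at least 1, since the instance could not have been freed by
reference counting alone.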
|
|
|
|
Why isn't cycle detection enabled by default? Running the cycle detection
|
|
algorithm takes some time, and some tuning will be required to
|
|
minimize the overhead cost. It's not yet obvious how much performance
|
|
is lost, because benchmarking this is tricky and depends crucially
|
|
on how often the program creates and destroys objects.
|
|
|
|
Several people tackled this problem and contributed to a solution. An
|
|
early implementation of the cycle detection approach was written by
|
|
Toby Kelsey. The current algorithm was suggested by Eric Tiedemann
|
|
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
|
|
wrote two different implementations, which were later integrated by
|
|
Neil. Lots of other people offered suggestions along the way; the
|
|
March 2000 archives of the python-dev mailing list contain most of the
|
|
relevant discussion, especially in the threads titled ``Reference
|
|
cycle collection for Python'' and ``Finalization again''.
|
|
|
|
% ======================================================================
|
|
\section{Other Core Changes}
|
|
|
|
Various minor changes have been made to Python's syntax and built-in
|
|
functions. None of the changes are very far-reaching, but they're
|
|
handy conveniences.
|
|
|
|
\subsection{Minor Language Changes}
|
|
|
|
A new syntax makes it more convenient to call a given function
|
|
with a tuple of arguments and/or a dictionary of keyword arguments.
|
|
In Python 1.5 and earlier, you'd use the \function{apply()}
|
|
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
|
|
function \function{f()} with the argument tuple \var{args} and the
|
|
keyword arguments in the dictionary \var{kw}. \function{apply()}
is the same in 2.0, but thanks to a patch from
Greg Ewing, \code{f(*\var{args}, **\var{kw})} is available as a shorter
and clearer way to achieve the same effect. This syntax is
|
|
symmetrical with the syntax for defining functions:
|
|
|
|
\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}
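
On the calling side, the new syntax might be used as in the following
sketch. The function \function{report()} and its arguments are
hypothetical, invented only to show the call.

\begin{verbatim}
def report(name, count, sep=': '):
    # Hypothetical function used only to demonstrate the call syntax.
    return name + sep + str(count)

args = ('spam', 3)
kw = {'sep': ' = '}

# Equivalent to apply(report, args, kw),
# and to report('spam', 3, sep=' = ')
result = report(*args, **kw)
\end{verbatim}

After this code runs, \code{result} is the string \code{'spam = 3'}.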
|
|
|
|
The \keyword{print} statement can now have its output directed to a
|
|
file-like object by following the \keyword{print} with
|
|
\verb|>> file|, similar to the redirection operator in Unix shells.
|
|
Previously you'd either have to use the \method{write()} method of the
|
|
file-like object, which lacks the convenience and simplicity of
|
|
\keyword{print}, or you could assign a new value to
|
|
\code{sys.stdout} and then restore the old value. For sending output to standard error,
|
|
it's much easier to write this:
|
|
|
|
\begin{verbatim}
print >> sys.stderr, "Warning: action field not supplied"
\end{verbatim}
|
|
|
|
Modules can now be renamed on importing them, using the syntax
|
|
\code{import \var{module} as \var{name}} or \code{from \var{module}
|
|
import \var{name} as \var{othername}}. The patch was submitted by
|
|
Thomas Wouters.
|
|
|
|
A new format style is available when using the \code{\%} operator;
|
|
'\%r' will insert the \function{repr()} of its argument. This was
|
|
also added from symmetry considerations, this time for symmetry with
|
|
the existing '\%s' format style, which inserts the \function{str()} of
|
|
its argument. For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
|
|
string containing \verb|'abc' abc|.
|
|
|
|
Previously there was no way to implement a class that overrode
|
|
Python's built-in \keyword{in} operator and implemented a custom
|
|
version. \code{\var{obj} in \var{seq}} returns true if \var{obj} is
|
|
present in the sequence \var{seq}; Python computes this by simply
|
|
trying every index of the sequence until either \var{obj} is found or
|
|
an \exception{IndexError} is encountered. Moshe Zadka contributed a
|
|
patch which adds a \method{__contains__} magic method for providing a
|
|
custom implementation for \keyword{in}. Additionally, new built-in
|
|
objects written in C can define what \keyword{in} means for them via a
|
|
new slot in the sequence protocol.
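
A minimal sketch of a class using \method{__contains__} is shown
below. The \class{EvenNumbers} class is hypothetical; it answers
membership tests by computation instead of by searching stored
elements.

\begin{verbatim}
class EvenNumbers:
    # Hypothetical container: "holds" every even integer
    # without storing any of them.
    def __contains__(self, number):
        return number % 2 == 0

evens = EvenNumbers()
has_four = 4 in evens        # calls evens.__contains__(4)
has_seven = 7 in evens       # calls evens.__contains__(7)
\end{verbatim}

Here \code{has_four} is true and \code{has_seven} is false; no
indexing of the object is attempted.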
|
|
|
|
Earlier versions of Python used a recursive algorithm for deleting
|
|
objects. Deeply nested data structures could cause the interpreter to
|
|
fill up the C stack and crash; Christian Tismer rewrote the deletion
|
|
logic to fix this problem. On a related note, comparing recursive
|
|
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
|
|
code to no longer crash, producing a useful result instead. For
|
|
example, after this code:
|
|
|
|
\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}
|
|
|
|
The comparison \code{a==b} returns true, because the two recursive
|
|
data structures are isomorphic. \footnote{See the thread ``trashcan
|
|
and PR\#7'' in the April 2000 archives of the python-dev mailing list
|
|
for the discussion leading up to this implementation, and some useful
|
|
relevant links.
|
|
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
|
|
}
|
|
|
|
Work has been done on porting Python to 64-bit Windows on the Itanium
|
|
processor, mostly by Trent Mick of ActiveState. (Confusingly,
|
|
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
|
|
that for ease of porting, MS Visual C++ treats code as 32 bit on Itanium.)
|
|
PythonWin also supports Windows CE; see the Python CE page at
|
|
\url{http://starship.python.net/crew/mhammond/ce/} for more
|
|
information.
|
|
|
|
An attempt has been made to alleviate one of Python's warts, the
|
|
often-confusing \exception{NameError} exception when code refers to a
|
|
local variable before the variable has been assigned a value. For
|
|
example, the following code raises an exception on the \keyword{print}
|
|
statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
|
|
exception is raised, while 2.0 raises a new
|
|
\exception{UnboundLocalError} exception.
|
|
\exception{UnboundLocalError} is a subclass of \exception{NameError},
|
|
so any existing code that expects \exception{NameError} to be raised
|
|
should still work.
|
|
|
|
\begin{verbatim}
def f():
    print "i=",i
    i = i + 1
f()
\end{verbatim}
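
The compatibility claim can be checked with a sketch such as the
following, in which an existing \code{except NameError} clause catches
the error raised for an unassigned local variable. The variable
\code{caught} is an assumption added to record the outcome.

\begin{verbatim}
def f():
    i = i + 1      # i is local, but is read before it is assigned

try:
    f()
    caught = 0
except NameError:
    # UnboundLocalError is a subclass of NameError,
    # so this clause catches it too.
    caught = 1
\end{verbatim}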
|
|
|
|
\subsection{Changes to Built-in Functions}
|
|
|
|
A new built-in, \function{zip(\var{seq1}, \var{seq2}, ...)}, has been
|
|
added. \function{zip()} returns a list of tuples where each tuple
|
|
contains the i-th element from each of the argument sequences. The
|
|
difference between \function{zip()} and \code{map(None, \var{seq1},
|
|
\var{seq2})} is that \function{map()} pads the sequences with
|
|
\code{None} if the sequences aren't all of the same length, while
|
|
\function{zip()} truncates the returned list to the length of the
|
|
shortest argument sequence.
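
The truncating behaviour can be seen in a small sketch like this one;
the sample sequences are arbitrary.

\begin{verbatim}
letters = 'abc'
numbers = (1, 2, 3, 4)

# list() is harmless when zip() already returns a list, as described
# above; it is included on the assumption that the code may also be
# run on an interpreter where zip() produces its results lazily.
pairs = list(zip(letters, numbers))
\end{verbatim}

The result is \code{[('a', 1), ('b', 2), ('c', 3)]}; the fourth number
is dropped because \code{letters} has only three elements.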
|
|
|
|
The \function{int()} and \function{long()} functions now accept an
|
|
optional ``base'' parameter when the first argument is a string.
|
|
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
|
|
291. \code{int(123, 16)} raises a \exception{TypeError} exception
|
|
with the message ``can't convert non-string with explicit base''.
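
The following sketch collects these cases, adding an octal conversion
as a further illustration. The variable names are assumptions.

\begin{verbatim}
decimal_value = int('123', 10)     # 123
hex_value = int('123', 16)         # 1*256 + 2*16 + 3 = 291
octal_value = int('777', 8)        # 7*64 + 7*8 + 7 = 511

# A base is only accepted together with a string argument.
try:
    int(123, 16)
    rejected = 0
except TypeError:
    rejected = 1
\end{verbatim}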
|
|
|
|
A new variable holding more detailed version information has been
|
|
added to the \module{sys} module. \code{sys.version_info} is a tuple
|
|
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}. For example, in a hypothetical 2.0.1beta1,
|
|
\code{sys.version_info} would be \code{(2, 0, 1, 'beta', 1)}.
|
|
\var{level} is a string such as \code{"alpha"}, \code{"beta"}, or
|
|
\code{"final"} for a final release.
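
Because \code{sys.version_info} is a tuple, version checks can be
written as ordinary comparisons. The following is one possible sketch;
the variable names are assumptions.

\begin{verbatim}
import sys

# Tuples compare element by element, so this is true
# for version 2.0 and for every later version.
has_2_0_features = sys.version_info[:2] >= (2, 0)

# The fourth element is the release level string.
release_level = sys.version_info[3]
\end{verbatim}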
|
|
|
|
Dictionaries have an odd new method, \method{setdefault(\var{key},
|
|
\var{default})}, which behaves similarly to the existing
|
|
\method{get()} method. However, if the key is missing,
|
|
\method{setdefault()} both returns the value of \var{default} as
|
|
\method{get()} would do, and also inserts it into the dictionary as
|
|
the value for \var{key}. Thus, the following lines of code:
|
|
|
|
\begin{verbatim}
if dict.has_key( key ): return dict[key]
else:
    dict[key] = []
    return dict[key]
\end{verbatim}
|
|
|
|
can be reduced to a single \code{return dict.setdefault(key, [])} statement.
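
A common use of this idiom is grouping items into lists. The sketch
below, with made-up sample data, indexes words by their first letter.

\begin{verbatim}
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

index = {}
for word in words:
    # Fetch the list for this letter, creating it on first use,
    # then append to it.
    index.setdefault(word[0], []).append(word)
\end{verbatim}

Afterwards \code{index['a']} is \code{['apple', 'avocado']} and
\code{index['c']} is \code{['cherry']}.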
|
|
|
|
|
|
% ======================================================================
|
|
\section{Porting to 2.0}
|
|
|
|
New Python releases try hard to be compatible with previous releases,
|
|
and the record has been pretty good. However, some changes are
|
|
considered useful enough, often fixing initial design decisions that
turned out to be actively mistaken, that breaking backward compatibility
|
|
can't always be avoided. This section lists the changes in Python 2.0
|
|
that may cause old Python code to break.
|
|
|
|
The change which will probably break the most code is tightening up
|
|
the arguments accepted by some methods. Some methods would take
|
|
multiple arguments and treat them as a tuple, particularly various
|
|
list methods such as \method{.append()} and \method{.insert()}.
|
|
In earlier versions of Python, if \code{L} is a list, \code{L.append(
|
|
1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this
|
|
causes a \exception{TypeError} exception to be raised, with the
|
|
message: 'append requires exactly 1 argument; 2 given'. The fix is to
|
|
simply add an extra set of parentheses to pass both values as a tuple:
|
|
\code{L.append( (1,2) )}.
|
|
|
|
The earlier versions of these methods were more forgiving because they
|
|
used an old function in Python's C interface to parse their arguments;
|
|
2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
|
|
argument parsing function, which provides more helpful error messages
|
|
and treats multi-argument calls as errors. If you absolutely must use
|
|
2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
|
|
and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
|
|
preserve the old behaviour; this isn't recommended.
|
|
|
|
Some of the functions in the \module{socket} module are still
|
|
forgiving in this way. For example, \function{socket.connect(
|
|
('hostname', 25) )} is the correct form, passing a tuple representing
|
|
an IP address, but \function{socket.connect( 'hostname', 25 )} also
|
|
works. \function{socket.connect_ex()} and \function{socket.bind()} are
|
|
similarly easy-going. 2.0alpha1 tightened these functions up, but
|
|
because the documentation actually used the erroneous multiple
|
|
argument form, many people wrote code which would break with the
|
|
stricter checking. GvR backed out the changes in the face of public
|
|
reaction, so for the \module{socket} module, the documentation was
|
|
fixed and the multiple argument form is simply marked as deprecated;
|
|
it \emph{will} be tightened up again in a future Python version.
|
|
|
|
Some work has been done to make integers and long integers a bit more
|
|
interchangeable. In 1.5.2, large-file support was added for Solaris,
|
|
to allow reading files larger than 2 GB; this made the \method{tell()}
|
|
method of file objects return a long integer instead of a regular
|
|
integer. Some code would subtract two file offsets and attempt to use
|
|
the result to multiply a sequence or slice a string, but this raised a
|
|
\exception{TypeError}. In 2.0, long integers can be used to multiply
|
|
or slice a sequence, and it'll behave as you'd intuitively expect it
|
|
to; \code{3L * 'abc'} produces 'abcabcabc', and \code{
|
|
(0,1,2,3)[2L:4L]} produces (2,3). Long integers can also be used in
|
|
various new places where previously only integers were accepted, such
|
|
as in the \method{seek()} method of file objects.
|
|
|
|
The subtlest long integer change of all is that the \function{str()}
|
|
of a long integer no longer has a trailing 'L' character, though
|
|
\function{repr()} still includes it. The 'L' annoyed many people who
|
|
wanted to print long integers that looked just like regular integers,
|
|
since they had to go out of their way to chop off the character. This
|
|
is no longer a problem in 2.0, but code which does \code{str(longval)[:-1]} and assumes the 'L' is there will now lose
the final digit.
|
|
|
|
Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses the
\code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is
\code{'8.1'}.
|
|
|
|
The \code{-X} command-line option, which turned all standard
|
|
exceptions into strings instead of classes, has been removed; the
|
|
standard exceptions will now always be classes. The
|
|
\module{exceptions} module containing the standard exceptions was
|
|
translated from Python to a built-in C module, written by Barry Warsaw
|
|
and Fredrik Lundh.
|
|
|
|
% Commented out for now -- I don't think anyone will care.
|
|
%The pattern and match objects provided by SRE are C types, not Python
|
|
%class instances as in 1.5. This means you can no longer inherit from
|
|
%\class{RegexObject} or \class{MatchObject}, but that shouldn't be much
|
|
%of a problem since no one should have been doing that in the first
|
|
%place.
|
|
|
|
% ======================================================================
|
|
\section{Extending/Embedding Changes}
|
|
|
|
Some of the changes are under the covers, and will only be apparent to
|
|
people writing C extension modules or embedding a Python interpreter
|
|
in a larger application. If you aren't dealing with Python's C API,
|
|
you can safely skip this section.
|
|
|
|
The version number of the Python C API was incremented, so C
|
|
extensions compiled for 1.5.2 must be recompiled in order to work with
|
|
2.0. On Windows, attempting to import a third party extension built
|
|
for Python 1.5.x usually results in an immediate crash; there's not
|
|
much we can do about this. (Here's Mark Hammond's explanation of the
|
|
reasons for the crash. The 1.5 module is linked against
|
|
\file{Python15.dll}. When \file{Python.exe}, linked against
|
|
\file{Python16.dll}, starts up, it initializes the Python data
|
|
structures in \file{Python16.dll}. When Python then imports the
|
|
module \file{foo.pyd} linked against \file{Python15.dll}, it
|
|
immediately tries to call the functions in that DLL. As Python has
|
|
not been initialized in that DLL, the program immediately crashes.)
|
|
|
|
Users of Jim Fulton's ExtensionClass module will be pleased to find
|
|
out that hooks have been added so that ExtensionClasses are now
|
|
supported by \function{isinstance()} and \function{issubclass()}.
|
|
This means you no longer have to remember to write code such as
|
|
\code{if type(obj) == myExtensionClass}, but can use the more natural
|
|
\code{if isinstance(obj, myExtensionClass)}.
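
The difference between the two tests shows up as soon as subclasses
are involved. The sketch below uses ordinary classes in a current
Python as a stand-in for ExtensionClasses; the names \code{Base} and
\code{Derived} are hypothetical:

```python
class Base:
    pass


class Derived(Base):
    pass


obj = Derived()

# An exact type comparison rejects instances of subclasses.
exact_match = type(obj) == Base

# isinstance() and issubclass() follow the inheritance chain.
is_instance = isinstance(obj, Base)
is_subclass = issubclass(Derived, Base)
```

Here \code{exact_match} is false while \code{is_instance} and
\code{is_subclass} are both true.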
|
|
|
|
The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
|
|
support dynamic loading on many different platforms, was cleaned up
|
|
and reorganised by Greg Stein. \file{importdl.c} is now quite small,
|
|
and platform-specific code has been moved into a bunch of
|
|
\file{Python/dynload_*.c} files. Another cleanup: there were also a
|
|
number of \file{my*.h} files in the Include/ directory that held
|
|
various portability hacks; they've been merged into a single file,
|
|
\file{Include/pyport.h}.
|
|
|
|
Vladimir Marangozov's long-awaited malloc restructuring was completed,
|
|
to make it easy to have the Python interpreter use a custom allocator
|
|
instead of C's standard \function{malloc()}. For documentation, read
|
|
the comments in \file{Include/pymem.h} and
|
|
\file{Include/objimpl.h}. For the lengthy discussions during which
|
|
the interface was hammered out, see the Web archives of the 'patches'
|
|
and 'python-dev' lists at python.org.
|
|
|
|
Recent versions of the GUSI development environment for MacOS support
|
|
POSIX threads. Therefore, Python's POSIX threading support now works
|
|
on the Macintosh. Threading support using the user-space GNU \texttt{pth}
|
|
library was also contributed.
|
|
|
|
Threading support on Windows was enhanced, too. Windows supports
|
|
thread locks that use kernel objects only in case of contention; in
|
|
the common case when there's no contention, they use simpler functions
|
|
which are an order of magnitude faster. A threaded version of Python
|
|
1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
|
|
changes, the difference is only 10\%. These improvements were
|
|
contributed by Yakov Markovitch.
|
|
|
|
Python 2.0's source now uses only ANSI C prototypes, so compiling Python now
|
|
requires an ANSI C compiler, and can no longer be done using a compiler that
|
|
only supports K\&R C.
|
|
|
|
% ======================================================================
|
|
\section{Distutils: Making Modules Easy to Install}
|
|
|
|
Before Python 2.0, installing modules was a tedious affair -- there
|
|
was no way to figure out automatically where Python is installed, or
|
|
what compiler options to use for extension modules. Software authors
|
|
had to go through an arduous ritual of editing Makefiles and
|
|
configuration files, which only really work on Unix and leave Windows
|
|
and MacOS unsupported. Software users faced wildly differing
|
|
installation instructions.
|
|
|
|
The SIG for distribution utilities, shepherded by Greg Ward, has
|
|
created the Distutils, a system to make package installation much
|
|
easier. They form the \module{distutils} package, a new part of
|
|
Python's standard library. In the best case, installing a Python
|
|
module from source will require the same steps: first you simply
|
|
unpack the tarball or zip archive, and then run ``\code{python setup.py
|
|
install}''. The platform will be automatically detected, the compiler
|
|
will be recognized, C extension modules will be compiled, and the
|
|
distribution installed into the proper directory. Optional
|
|
command-line arguments provide more control over the installation
|
|
process; the distutils package offers many places to override defaults
|
|
-- separating the build from the install, building or installing in
|
|
non-default directories, and more.
|
|
|
|
In order to use the Distutils, you need to write a \file{setup.py}
|
|
script. For the simple case, when the software contains only .py
|
|
files, a minimal \file{setup.py} can be just a few lines long:
|
|
|
|
\begin{verbatim}
|
|
from distutils.core import setup
|
|
setup (name = "foo", version = "1.0",
|
|
py_modules = ["module1", "module2"])
|
|
\end{verbatim}
|
|
|
|
The \file{setup.py} file isn't much more complicated if the software
|
|
consists of a few packages:
|
|
|
|
\begin{verbatim}
|
|
from distutils.core import setup
|
|
setup (name = "foo", version = "1.0",
|
|
packages = ["package", "package.subpackage"])
|
|
\end{verbatim}
|
|
|
|
A C extension can be the most complicated case; here's an example taken from
|
|
the PyXML package:
|
|
|
|
|
|
\begin{verbatim}
|
|
from distutils.core import setup, Extension
|
|
|
|
expat_extension = Extension('xml.parsers.pyexpat',
|
|
define_macros = [('XML_NS', None)],
|
|
include_dirs = [ 'extensions/expat/xmltok',
|
|
'extensions/expat/xmlparse' ],
|
|
sources = [ 'extensions/pyexpat.c',
|
|
'extensions/expat/xmltok/xmltok.c',
|
|
'extensions/expat/xmltok/xmlrole.c',
|
|
]
|
|
)
|
|
setup (name = "PyXML", version = "0.5.4",
|
|
ext_modules =[ expat_extension ] )
|
|
|
|
\end{verbatim}
|
|
|
|
The Distutils can also take care of creating source and binary
|
|
distributions. The ``sdist'' command, run by ``\code{python setup.py
|
|
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
|
|
Adding new commands isn't difficult; ``bdist_rpm'' and
|
|
``bdist_wininst'' commands have already been contributed to create an
|
|
RPM distribution and a Windows installer for the software,
|
|
respectively. Commands to create other distribution formats such as
|
|
Debian packages and Solaris \file{.pkg} files are in various stages of
|
|
development.
|
|
|
|
All this is documented in a new manual, \textit{Distributing Python
|
|
Modules}, that joins the basic set of Python documentation.
|
|
|
|
% ======================================================================
|
|
%\section{New XML Code}
|
|
|
|
%XXX write this section...
|
|
|
|
% ======================================================================
|
|
\section{Module changes}
|
|
|
|
Lots of improvements and bugfixes were made to Python's extensive
|
|
standard library; some of the affected modules include
|
|
\module{readline}, \module{ConfigParser}, \module{cgi},
|
|
\module{calendar}, \module{posix}, \module{xmllib},
|
|
\module{aifc}, \module{chunk}, \module{wave}, \module{random}, \module{shelve},
|
|
and \module{nntplib}. Consult the CVS logs for the exact
|
|
patch-by-patch details.
|
|
|
|
Brian Gallew contributed OpenSSL support for the \module{socket}
|
|
module. OpenSSL is an implementation of the Secure Sockets Layer,
|
|
which encrypts the data being sent over a socket. When compiling
|
|
Python, you can edit \file{Modules/Setup} to include SSL support,
|
|
which adds an additional function to the \module{socket} module:
|
|
\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
|
|
which takes a socket object and returns an SSL socket. The
|
|
\module{httplib} and \module{urllib} modules were also changed to
|
|
support ``https://'' URLs, though no one has implemented FTP or SMTP
|
|
over SSL.
|
|
|
|
The \module{httplib} module has been rewritten by Greg Stein to
|
|
support HTTP/1.1. Backward compatibility with the 1.5 version of
|
|
\module{httplib} is provided, though using HTTP/1.1 features such as
|
|
pipelining will require rewriting code to use a different set of
|
|
interfaces.
|
|
|
|
The \module{Tkinter} module now supports Tcl/Tk version 8.1, 8.2, or
|
|
8.3, and support for the older 7.x versions has been dropped. The
|
|
\module{Tkinter} module now supports displaying Unicode strings in Tk widgets.
|
|
Also, Fredrik Lundh contributed an optimization which makes operations
|
|
like \code{create_line} and \code{create_polygon} much faster,
|
|
especially when using lots of coordinates.
|
|
|
|
The \module{curses} module has been greatly extended, starting from
|
|
Oliver Andrich's enhanced version, to provide many additional
|
|
functions from ncurses and SYSV curses, such as colour, alternative
|
|
character set support, pads, and mouse support. This means the module
|
|
is no longer compatible with operating systems that only have BSD
|
|
curses, but there don't seem to be any currently maintained OSes that
|
|
fall into this category.
|
|
|
|
As mentioned in the earlier discussion of 2.0's Unicode support, the
|
|
underlying implementation of the regular expressions provided by the
|
|
\module{re} module has been changed. SRE, a new regular expression
|
|
engine written by Fredrik Lundh and partially funded by Hewlett
|
|
Packard, supports matching against both 8-bit strings and Unicode
|
|
strings.
|
|
|
|
% ======================================================================
|
|
\section{New modules}
|
|
|
|
A number of new modules were added. We'll simply list them with brief
|
|
descriptions; consult the 2.0 documentation for the details of a
|
|
particular module.
|
|
|
|
\begin{itemize}
|
|
|
|
\item{\module{atexit}:}
|
|
For registering functions to be called before the Python interpreter exits.
|
|
Code that currently sets
|
|
\code{sys.exitfunc} directly should be changed to
|
|
use the \module{atexit} module instead, importing \module{atexit}
|
|
and calling \function{atexit.register()} with
|
|
the function to be called on exit.
|
|
(Contributed by Skip Montanaro.)
|
|
|
|
\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.
|
|
|
|
\item{\module{filecmp}:} Supersedes the old \module{cmp}, \module{cmpcache} and
|
|
\module{dircmp} modules, which have now become deprecated.
|
|
(Contributed by Gordon MacMillan and Moshe Zadka.)
|
|
|
|
\item{\module{linuxaudiodev}:} Support for the \file{/dev/audio}
|
|
device on Linux, a twin to the existing \module{sunaudiodev} module.
|
|
(Contributed by Peter Bosch.)
|
|
|
|
\item{\module{mmap}:} An interface to memory-mapped files on both
|
|
Windows and Unix. A file's contents can be mapped directly into
|
|
memory, at which point it behaves like a mutable string, so its
|
|
contents can be read and modified. They can even be passed to
|
|
functions that expect ordinary strings, such as the \module{re}
|
|
module. (Contributed by Sam Rushing, with some extensions by
|
|
A.M. Kuchling.)
|
|
|
|
\item{\module{pyexpat}:} An interface to the Expat XML parser.
|
|
(Contributed by Paul Prescod.)
|
|
|
|
\item{\module{robotparser}:} Parse a \file{robots.txt} file, which is
|
|
used for writing Web spiders that politely avoid certain areas of a
|
|
Web site. The parser accepts the contents of a \file{robots.txt} file,
|
|
builds a set of rules from it, and can then answer questions about
|
|
the fetchability of a given URL. (Contributed by Skip Montanaro.)
|
|
|
|
\item{\module{tabnanny}:} A module/script to
|
|
check Python source code for ambiguous indentation.
|
|
(Contributed by Tim Peters.)
|
|
|
|
\item{\module{UserString}:} A base class useful for deriving objects that behave like strings.
|
|
|
|
\item{\module{webbrowser}:} A module that provides a platform-independent
|
|
way to launch a web browser on a specific URL. For each platform, various
|
|
browsers are tried in a specific order. The user can alter which browser
|
|
is launched by setting the \var{BROWSER} environment variable.
|
|
(Originally inspired by Eric S. Raymond's patch to \module{urllib}
|
|
which added similar functionality, but
|
|
the final module comes from code originally
|
|
implemented by Fred Drake as \file{Tools/idle/BrowserControl.py},
|
|
and adapted for the standard library by Fred.)
|
|
|
|
\item{\module{winreg} and \module{_winreg}:} An interface to the
|
|
Windows registry. \module{_winreg} is an adaptation of functions that
|
|
have been part of PythonWin since 1995, but has now been added to the core
|
|
distribution, and enhanced to support Unicode. \module{winreg} is an
|
|
object-oriented API on top of the \module{_winreg} module.
|
|
\module{_winreg} was written by Bill Tutt and Mark Hammond, and \module{winreg}
|
|
was designed by Thomas Heller and implemented by Paul Prescod.
|
|
|
|
\item{\module{zipfile}:} A module for reading and writing ZIP-format
|
|
archives. These are archives produced by \program{PKZIP} on
|
|
DOS/Windows or \program{zip} on Unix, not to be confused with
|
|
\program{gzip}-format files (which are supported by the \module{gzip}
|
|
module).
|
|
(Contributed by James C. Ahlstrom.)
|
|
|
|
\item{\module{imputil}:} A module that provides a simpler way for
|
|
writing customised import hooks, in comparison to the existing
|
|
\module{ihooks} module. (Implemented by Greg Stein, with much
|
|
discussion on python-dev along the way.)
|
|
|
|
\end{itemize}
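
As an illustration of the \module{atexit} entry above, registering an
exit handler might look like the following minimal sketch. The
function \code{say_goodbye} and the \code{messages} list are made up
for the example; only \function{atexit.register()} comes from the
module itself:

```python
import atexit

messages = []


def say_goodbye(name):
    # Record a farewell; a real handler might flush logs or close files.
    messages.append("Goodbye, %s" % name)


# Extra arguments to register() are passed to the function at exit,
# so nothing is appended to messages until the interpreter shuts down.
atexit.register(say_goodbye, "world")
```

Code that previously assigned a function to \code{sys.exitfunc} can
usually be converted by replacing the assignment with a call like the
one on the last line.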
|
|
|
|
% ======================================================================
|
|
\section{IDLE Improvements}
|
|
|
|
IDLE is the official Python cross-platform IDE, written using Tkinter.
|
|
Python 2.0 includes IDLE 0.6, which adds a number of new features and
|
|
improvements. A partial list:
|
|
|
|
\begin{itemize}
|
|
\item UI improvements and optimizations,
|
|
especially in the area of syntax highlighting and auto-indentation.
|
|
|
|
\item The class browser now shows more information, such as the top
|
|
level functions in a module.
|
|
|
|
\item Tab width is now a user-settable option. When opening an existing Python
|
|
file, IDLE automatically detects the indentation conventions, and adapts.
|
|
|
|
\item There is now support for calling browsers on various platforms,
|
|
used to open the Python documentation in a browser.
|
|
|
|
\item IDLE now has a command line, which is largely similar to
|
|
the vanilla Python interpreter.
|
|
|
|
\item Call tips were added in many places.
|
|
|
|
\item IDLE can now be installed as a package.
|
|
|
|
\item In the editor window, there is now a line/column bar at the bottom.
|
|
|
|
\item Three new keystroke commands: Check module (Alt-F5), Import
|
|
module (F5) and Run script (Ctrl-F5).
|
|
|
|
\end{itemize}
|
|
|
|
% ======================================================================
|
|
\section{Deleted and Deprecated Modules}
|
|
|
|
A few modules have been dropped because they're obsolete, or because
|
|
there are now better ways to do the same thing. The \module{stdwin}
|
|
module is gone; it was for a platform-independent windowing toolkit
|
|
that's no longer developed.
|
|
|
|
A number of modules have been moved to the
|
|
\file{lib-old} subdirectory:
|
|
\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
|
|
\module{find}, \module{grep}, \module{packmail},
|
|
\module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
|
|
If you have code which relies on a module that's been moved to
|
|
\file{lib-old}, you can simply add that directory to \code{sys.path}
|
|
to get them back, but you're encouraged to update any code that uses
|
|
these modules.
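
Adding the directory to \code{sys.path} might be sketched as follows.
The location of \file{lib-old} depends on the installation; the path
built below is only an assumption for a typical Unix layout and should
be adjusted as needed:

```python
import os
import sys

# Assumed location of lib-old; adjust for the actual installation.
lib_old = os.path.join(sys.prefix, "lib", "python2.0", "lib-old")

# Append rather than prepend, so current modules take precedence.
if lib_old not in sys.path:
    sys.path.append(lib_old)
```

Appending keeps the supported standard library modules ahead of the
obsolete ones in the search order.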
|
|
|
|
\section{Acknowledgements}
|
|
|
|
The authors would like to thank the following people for offering
|
|
suggestions on drafts of this article: Mark Hammond, Fredrik Lundh,
|
|
Detlef Lannert, Skip Montanaro, Vladimir Marangozov, Guido van Rossum,
|
|
and Neil Schemenauer.
|
|
|
|
\end{document}
|