\documentclass{howto}
|
|
\usepackage{distutils}
|
|
% $Id$
|
|
|
|
% Don't write extensive text for new sections; I'll do that.
|
|
% Feel free to add commented-out reminders of things that need
|
|
% to be covered. --amk
|
|
|
|
% XXX pydoc can display links to module docs -- but when?
|
|
%
|
|
|
|
\title{What's New in Python 2.4}
|
|
\release{0.4}
|
|
\author{A.M.\ Kuchling}
|
|
\authoraddress{
|
|
\strong{Python Software Foundation}\\
|
|
Email: \email{amk@amk.ca}
|
|
}
|
|
|
|
\begin{document}
|
|
\maketitle
|
|
\tableofcontents
|
|
|
|
This article explains the new features in Python 2.4 beta1, scheduled
|
|
for release in mid-October. The final version of Python 2.4 is
|
|
expected to be released around December 2004.
|
|
|
|
Python 2.4 is a medium-sized release. It doesn't introduce as many
|
|
changes as the radical Python 2.2, but introduces more features than
|
|
the conservative 2.3 release did. The most significant new language
|
|
features (as of this writing) are function decorators and generator
|
|
expressions; most other changes are to the standard library.
|
|
% XXX update these figures as we go
|
|
According to the CVS change logs, there were 421 patches applied and
|
|
413 bugs fixed between Python 2.3 and 2.4. Both figures are likely to
|
|
be underestimates.
|
|
|
|
|
|
This article doesn't attempt to provide a complete specification of
|
|
every single new feature, but instead provides a convenient overview.
|
|
For full details, you should refer to the documentation for Python
|
|
2.4, such as the \citetitle[../lib/lib.html]{Python Library Reference}
|
|
and the \citetitle[../ref/ref.html]{Python Reference Manual}. If you
|
|
want to understand the complete implementation and design rationale,
|
|
refer to the PEP for a particular new feature or to the module
|
|
documentation.
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 218: Built-In Set Objects}
|
|
|
|
Python 2.3 introduced the \module{sets} module. C implementations of
set data types have now been added to the Python core as two new
built-in types, \function{set(\var{iterable})} and
\function{frozenset(\var{iterable})}. They provide high-speed
operations for membership testing, for eliminating duplicates from
sequences, and for mathematical operations like unions, intersections,
differences, and symmetric differences.
|
|
|
|
\begin{verbatim}
>>> a = set('abracadabra')              # form a set from a string
>>> 'z' in a                            # fast membership testing
False
>>> a                                   # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> ''.join(a)                          # convert back into a string
'arbcd'

>>> b = set('alacazam')                 # form a second set
>>> a - b                               # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                               # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                               # letters in both a and b
set(['a', 'c'])
>>> a ^ b                               # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])

>>> a.add('z')                          # add a new element
>>> a.update('wxy')                     # add multiple new elements
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z'])
>>> a.remove('x')                       # take one element out
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z'])
\end{verbatim}
|
|
|
|
The \function{frozenset} type is an immutable version of \function{set}.
|
|
Since it is immutable and hashable, it may be used as a dictionary key or
|
|
as a member of another set.
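
As a small illustration of that last point, the following sketch uses a
\class{frozenset} both as a dictionary key and as a member of another
set. The names \code{pair}, \code{prices}, and \code{groups} are
invented for this example and do not come from the PEP.

```python
# Hypothetical data, invented for illustration.
pair = frozenset(['spam', 'eggs'])

# A frozenset is hashable, so it can serve as a dictionary key...
prices = {pair: 4.50}

# ...and as a member of another set.
groups = set([pair, frozenset(['ham'])])

# Element order is irrelevant when comparing or looking up sets.
same_pair = frozenset(['eggs', 'spam'])
found_price = prices[same_pair]
is_member = same_pair in groups

# An ordinary (mutable) set is not hashable and is rejected as a key.
try:
    {set(['spam']): 1}
    mutable_key_ok = True
except TypeError:
    mutable_key_ok = False
```

Because a mutable \class{set} could change after being stored, only the
frozen variant is accepted where a hash value is required.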
|
|
|
|
The \module{sets} module remains in the standard library, and may be
|
|
useful if you wish to subclass the \class{Set} or \class{ImmutableSet}
|
|
classes. There are currently no plans to deprecate the module.
|
|
|
|
\begin{seealso}
|
|
\seepep{218}{Adding a Built-In Set Object Type}{Originally proposed by
|
|
Greg Wilson and ultimately implemented by Raymond Hettinger.}
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 237: Unifying Long Integers and Integers}
|
|
|
|
The lengthy transition process for this PEP, begun in Python 2.2,
|
|
takes another step forward in Python 2.4. In 2.3, certain integer
|
|
operations that would behave differently after int/long unification
|
|
triggered \exception{FutureWarning} warnings and returned values
|
|
limited to 32 or 64 bits (depending on your platform). In 2.4, these
|
|
expressions no longer produce a warning and instead produce a
|
|
different result that's usually a long integer.
|
|
|
|
The problematic expressions are primarily left shifts and lengthy
|
|
hexadecimal and octal constants. For example,
|
|
\code{2 \textless{}\textless{} 32} results
|
|
in a warning in 2.3, evaluating to 0 on 32-bit platforms. In Python
|
|
2.4, this expression now returns the correct answer, 8589934592.
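
The new behaviour can be checked directly. This is only a sketch; the
variable names are arbitrary, and the hexadecimal case assumes the PEP's
rule that such constants are always treated as positive values.

```python
# Left shifts no longer lose the bits that do not fit in a machine word.
shifted = 2 << 32

# Large hexadecimal constants are positive values rather than
# wrapping around to negative numbers on 32-bit platforms.
big_hex = 0xFFFFFFFF
```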
|
|
|
|
\begin{seealso}
|
|
\seepep{237}{Unifying Long Integers and Integers}{Original PEP
|
|
written by Moshe Zadka and GvR. The changes for 2.4 were implemented by
|
|
Kalle Svensson.}
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 289: Generator Expressions}
|
|
|
|
The iterator feature introduced in Python 2.2 and the
|
|
\module{itertools} module make it easier to write programs that loop
|
|
through large data sets without having the entire data set in memory
|
|
at one time. List comprehensions don't fit into this picture very
|
|
well because they produce a Python list object containing all of the
|
|
items, unavoidably pulling them all into memory. When trying to write
|
|
a functionally-styled program, it would be natural to write something
|
|
like:
|
|
|
|
\begin{verbatim}
links = [link for link in get_all_links() if not link.followed]
for link in links:
    ...
\end{verbatim}
|
|
|
|
instead of
|
|
|
|
\begin{verbatim}
for link in get_all_links():
    if link.followed:
        continue
    ...
\end{verbatim}
|
|
|
|
The first form is more concise and perhaps more readable, but if
|
|
you're dealing with a large number of link objects the second form
|
|
would have to be used to avoid having all link objects in memory at
|
|
the same time.
|
|
|
|
Generator expressions work similarly to list comprehensions but don't
|
|
materialize the entire list; instead they create a generator that will
|
|
return elements one by one. The above example could be written as:
|
|
|
|
\begin{verbatim}
links = (link for link in get_all_links() if not link.followed)
for link in links:
    ...
\end{verbatim}
|
|
|
|
Generator expressions always have to be written inside parentheses, as
in the above example. The parentheses signalling a function call also
count, so if you want to create an iterator that will be immediately
passed to a function you could write:
|
|
|
|
\begin{verbatim}
print sum(obj.count for obj in list_all_objects())
\end{verbatim}
|
|
|
|
Generator expressions differ from list comprehensions in various small
|
|
ways. Most notably, the loop variable (\var{obj} in the above
|
|
example) is not accessible outside of the generator expression. List
|
|
comprehensions leave the variable assigned to its last value; future
|
|
versions of Python will change this, making list comprehensions match
|
|
generator expressions in this respect.
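
The lazy evaluation described in this section can be observed with a
small experiment. In this sketch, \code{numbers()} and \code{log} are
hypothetical helpers invented to record when each item is actually
produced.

```python
def numbers(log):
    """Yield 0..4, recording each value in log as it is produced."""
    for n in range(5):
        log.append(n)
        yield n

log = []
squares = (n * n for n in numbers(log))

# Creating the generator expression has not consumed anything yet.
consumed_before = list(log)

# Values are produced one at a time as sum() asks for them.
total = sum(squares)
consumed_after = list(log)
```

Nothing is appended to \code{log} until \function{sum()} starts pulling
values, which is why no intermediate list is ever built.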
|
|
|
|
\begin{seealso}
|
|
\seepep{289}{Generator Expressions}{Proposed by Raymond Hettinger and
|
|
implemented by Jiwon Seo with early efforts steered by Hye-Shik Chang.}
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 292: Simpler String Substitutions}
|
|
|
|
Some new classes in the standard library provide an
alternative mechanism for substituting variables into strings that's
better suited for applications where untrained users need to edit templates.
|
|
|
|
The usual way of substituting variables by name is the \code{\%}
|
|
operator:
|
|
|
|
\begin{verbatim}
>>> '%(page)i: %(title)s' % {'page':2, 'title': 'The Best of Times'}
'2: The Best of Times'
\end{verbatim}
|
|
|
|
When writing the template string, it can be easy to forget the
|
|
\samp{i} or \samp{s} after the closing parenthesis. This isn't a big
|
|
problem if the template is in a Python module, because you run the
|
|
code, get an ``Unsupported format character'' \exception{ValueError},
|
|
and fix the problem. However, consider an application such as Mailman
|
|
where template strings or translations are being edited by users who
|
|
aren't aware of the Python language; the syntax is complicated to
|
|
explain to such users, and if they make a mistake, it's difficult to
|
|
provide helpful feedback to them.
|
|
|
|
PEP 292 adds a \class{Template} class to the \module{string} module
|
|
that uses \samp{\$} to indicate a substitution. \class{Template} is a
|
|
subclass of the built-in Unicode type, so the result is always a
|
|
Unicode string:
|
|
|
|
\begin{verbatim}
>>> import string
>>> t = string.Template('$page: $title')
>>> t.substitute({'page':2, 'title': 'The Best of Times'})
u'2: The Best of Times'
\end{verbatim}
|
|
|
|
% $ Terminate $-mode for Emacs
|
|
|
|
If a key is missing from the dictionary, the \method{substitute} method
|
|
will raise a \exception{KeyError}. There's also a \method{safe_substitute}
|
|
method that ignores missing keys:
|
|
|
|
\begin{verbatim}
>>> t = string.Template('$page: $title')
>>> t.safe_substitute({'page':3})
u'3: $title'
\end{verbatim}
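
The two methods can be compared side by side. The following is a
minimal sketch that relies only on the \method{substitute()} and
\method{safe_substitute()} methods described above; the variable names
\code{full}, \code{raised}, and \code{partial} are arbitrary.

```python
import string

t = string.Template('$page: $title')

# All placeholders supplied: substitution succeeds.
full = t.substitute({'page': 2, 'title': 'The Best of Times'})

# A missing key makes substitute() raise KeyError...
try:
    t.substitute({'page': 3})
    raised = False
except KeyError:
    raised = True

# ...while safe_substitute() leaves the placeholder untouched.
partial = t.safe_substitute({'page': 3})
```

An application that lets users edit templates might prefer
\method{safe_substitute()} so that a misspelled placeholder shows up in
the output instead of aborting the program.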
|
|
|
|
% $ Terminate math-mode for Emacs
|
|
|
|
|
|
\begin{seealso}
|
|
\seepep{292}{Simpler String Substitutions}{Written and implemented
|
|
by Barry Warsaw.}
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 318: Decorators for Functions, Methods and Classes}
|
|
|
|
Python 2.2 extended Python's object model by adding static methods and
|
|
class methods, but it didn't extend Python's syntax to provide any new
|
|
way of defining static or class methods. Instead, you had to write a
|
|
\keyword{def} statement in the usual way, and pass the resulting
|
|
method to a \function{staticmethod()} or \function{classmethod()}
|
|
function that would wrap up the function as a method of the new type.
|
|
Your code would look like this:
|
|
|
|
\begin{verbatim}
class C:
    def meth (cls):
        ...

    meth = classmethod(meth)   # Rebind name to wrapped-up class method
\end{verbatim}
|
|
|
|
If the method was very long, it would be easy to miss or forget the
|
|
\function{classmethod()} invocation after the function body.
|
|
|
|
The intention was always to add some syntax to make such definitions
|
|
more readable, but at the time of 2.2's release a good syntax was not
|
|
obvious. Years later, when Python 2.4 is coming out, a good syntax
|
|
\emph{still} isn't obvious but users are asking for easier access to
|
|
the feature, so a new syntactic feature has been added.
|
|
|
|
The feature is called ``function decorators''. The name comes from
|
|
the idea that \function{classmethod}, \function{staticmethod}, and
|
|
friends are storing additional information on a function object; they're
|
|
\emph{decorating} functions with more details.
|
|
|
|
The notation borrows from Java and uses the \character{@} character as an
|
|
indicator. Using the new syntax, the example above would be written:
|
|
|
|
\begin{verbatim}
class C:

    @classmethod
    def meth (cls):
        ...
\end{verbatim}
|
|
|
|
The \code{@classmethod} is shorthand for the
|
|
\code{meth=classmethod(meth)} assignment. More generally, if you have
|
|
the following:
|
|
|
|
\begin{verbatim}
@A
@B
@C
def f ():
    ...
\end{verbatim}
|
|
|
|
It's equivalent to:
|
|
|
|
\begin{verbatim}
def f(): ...
f = A(B(C(f)))
\end{verbatim}
|
|
|
|
Decorators must come on the line before a function definition, one
decorator per line, and can't be on the same line as the function
definition, meaning that \code{@A def f(): ...} is illegal. You can
only decorate function definitions, either at the module level or
inside a class; you can't decorate class definitions.
|
|
|
|
A decorator is just a function that takes the function to be decorated
|
|
as an argument and returns either the same function or some new
|
|
callable thing. It's easy to write your own decorators. The
|
|
following simple example just sets an attribute on the function
|
|
object:
|
|
|
|
\begin{verbatim}
>>> def deco(func):
...    func.attr = 'decorated'
...    return func
...
>>> @deco
... def f(): pass
...
>>> f
<function f at 0x402ef0d4>
>>> f.attr
'decorated'
>>>
\end{verbatim}
|
|
|
|
As a slightly more realistic example, the following decorator checks
|
|
that the supplied argument is an integer:
|
|
|
|
\begin{verbatim}
def require_int (func):
    def wrapper (arg):
        assert isinstance(arg, int)
        return func(arg)

    return wrapper

@require_int
def p1 (arg):
    print arg

@require_int
def p2(arg):
    print arg*2
\end{verbatim}
|
|
|
|
An example in \pep{318} contains a fancier version of this idea that
|
|
lets you specify the required type and check the returned type as
|
|
well.
|
|
|
|
Decorator functions can take arguments. If arguments are supplied,
|
|
the decorator function is called with only those arguments and must
|
|
return a new decorator function; this new function must take a single
|
|
function and return a function, as previously described. In other
|
|
words, \code{@A @B @C(args)} becomes:
|
|
|
|
\begin{verbatim}
def f(): ...
_deco = C(args)
f = A(B(_deco(f)))
\end{verbatim}
|
|
|
|
Getting this right can be slightly brain-bending, but it's not too
|
|
difficult.
|
|
|
|
A small related change makes the \member{func_name} attribute of
|
|
functions writable. This attribute is used to display function names
|
|
in tracebacks, so decorators should change the name of any new
|
|
function that's constructed and returned.
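
Both points, decorators that take arguments and renaming the returned
function, can be combined in one sketch. The decorator \code{require()}
and the function \code{double()} are hypothetical names invented here.
The example assigns to \member{__name__}, on the assumption that it
refers to the same attribute as \member{func_name}.

```python
def require(kind):
    """Return a decorator that checks the type of a single argument."""
    def decorator(func):
        def wrapper(arg):
            if not isinstance(arg, kind):
                raise TypeError('expected %s' % kind.__name__)
            return func(arg)
        # Copy the name so tracebacks show the decorated function's name.
        wrapper.__name__ = func.__name__
        return wrapper
    return decorator

@require(int)
def double(arg):
    return arg * 2
```

Here \code{require(int)} is called first and returns \code{decorator},
which is then applied to \code{double}; without the name assignment,
tracebacks would report every decorated function as \code{wrapper}.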
|
|
|
|
The new syntax was provisionally added in 2.4alpha2, and is subject to
|
|
change during the 2.4beta release cycle depending on the Python
|
|
community's reaction. Post-2.4 versions of Python will preserve
|
|
compatibility with whatever syntax is used in 2.4final.
|
|
|
|
\begin{seealso}
|
|
\seepep{318}{Decorators for Functions, Methods and Classes}{Written
|
|
by Kevin D. Smith, Jim Jewett, and Skip Montanaro. Several people
|
|
wrote patches implementing function decorators, but the one that was
|
|
actually checked in was patch \#979728, written by Mark Russell.}
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{PEP 322: Reverse Iteration}
|
|
|
|
A new built-in function, \function{reversed(\var{seq})}, takes a sequence
|
|
and returns an iterator that loops over the elements of the sequence
|
|
in reverse order.
|
|
|
|
\begin{verbatim}
>>> for i in reversed(xrange(1,4)):
...    print i
...
3
2
1
\end{verbatim}
|
|
|
|
Compared to extended slicing, such as \code{range(1,4)[::-1]},
|
|
\function{reversed()} is easier to read, runs faster, and uses
|
|
substantially less memory.
|
|
|
|
Note that \function{reversed()} only accepts sequences, not arbitrary
|
|
iterators. If you want to reverse an iterator, first convert it to
|
|
a list with \function{list()}.
|
|
|
|
\begin{verbatim}
>>> input = open('/etc/passwd', 'r')
>>> for line in reversed(list(input)):
...   print line
...
root:*:0:0:System Administrator:/var/root:/bin/tcsh
...
\end{verbatim}
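
The distinction between sequences and arbitrary iterators can be seen
without touching the filesystem. In this sketch, \code{words} and the
generator \code{letters()} are invented for illustration.

```python
words = ['alpha', 'beta', 'gamma']

# reversed() returns an iterator; list() collects its items.
backwards = list(reversed(words))

def letters():
    """A generator: an iterator with no length or indexing."""
    for ch in 'abc':
        yield ch

# Passing an arbitrary iterator directly is rejected...
try:
    reversed(letters())
    accepted = True
except TypeError:
    accepted = False

# ...so convert it to a list first.
from_iterator = list(reversed(list(letters())))
```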
|
|
|
|
\begin{seealso}
|
|
\seepep{322}{Reverse Iteration}{Written and implemented by Raymond Hettinger.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 324: New subprocess Module}
|
|
|
|
The standard library provides a number of ways to
|
|
execute a subprocess, each of which offers different features and
|
|
levels of difficulty. \function{os.system(\var{command})} is easy, but
|
|
slow -- it runs a shell process which executes the command --
|
|
and dangerous -- you have to be careful about escaping metacharacters.
|
|
The \module{popen2} module offers classes that can capture
|
|
standard output and standard error from the subprocess, but the naming
|
|
is confusing.
|
|
|
|
The \module{subprocess} module cleans all this up, providing a unified
|
|
interface that offers all the features you might need.
|
|
Instead of \module{popen2}'s collection of classes,
|
|
\module{subprocess} contains a single class called \class{Popen}
|
|
whose constructor supports a number of different keyword arguments.
|
|
|
|
\begin{verbatim}
class Popen(args, bufsize=0, executable=None,
            stdin=None, stdout=None, stderr=None,
            preexec_fn=None, close_fds=False, shell=False,
            cwd=None, env=None, universal_newlines=False,
            startupinfo=None, creationflags=0):
\end{verbatim}
|
|
|
|
\var{args} is commonly a sequence of strings that will be the arguments to
|
|
the program executed as the subprocess. (If the \var{shell} argument is true,
|
|
\var{args} can be a string which will then be passed on to the shell for interpretation.)
|
|
|
|
\var{stdin}, \var{stdout}, and \var{stderr} specify what the
|
|
subprocess's input, output, and error streams will be. You can
|
|
provide a file object or a file descriptor, or you can
|
|
use \code{subprocess.PIPE} to create a pipe between the subprocess
|
|
and the parent.
|
|
|
|
The constructor has a number of handy options:
|
|
|
|
\begin{itemize}
|
|
\item \var{close_fds} requests that all file descriptors be closed before running the subprocess.
|
|
\item \var{cwd} specifies the working directory in which the subprocess will be executed (defaulting to whatever the parent's working directory is).
|
|
\item \var{env} is a dictionary specifying environment variables.
|
|
\item \var{preexec_fn} is a function that gets called before the child is started.
|
|
\item \var{universal_newlines} opens the child's input and output using
|
|
Python's universal newline feature.
|
|
\end{itemize}
|
|
|
|
Once you've created the \class{Popen} instance,
|
|
you can call \method{wait()} to pause until the subprocess has exited,
|
|
\method{poll()} to check if it's exited without pausing,
|
|
or \method{communicate(\var{data})} to send the string \var{data} to
|
|
the subprocess's standard input. \method{communicate(\var{data})}
|
|
then reads any data that the subprocess has sent to its standard output or error, returning a tuple \code{(\var{stdout_data}, \var{stderr_data})}.
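
A minimal sketch of this workflow follows. The child process here is a
hypothetical one-line Python program that upper-cases its input, chosen
only so that the example does not depend on any external command;
\var{universal_newlines} is enabled so that the data is exchanged as
ordinary strings.

```python
import subprocess
import sys

# Hypothetical child: a one-line Python program that upper-cases its input.
child_code = 'import sys; sys.stdout.write(sys.stdin.read().upper())'

p = subprocess.Popen([sys.executable, '-c', child_code],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     universal_newlines=True)

# Send data to the child's stdin and collect what it writes to stdout.
stdout_data, stderr_data = p.communicate('hello\n')
status = p.returncode
```

Since \var{stderr} was not redirected to a pipe, the second element of
the returned tuple is \code{None}.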
|
|
|
|
\function{call()} is a shortcut that passes its arguments along to
|
|
the \class{Popen} constructor, waits for the command to complete, and
|
|
returns the status code of the subprocess. It can serve as an analog
|
|
to
|
|
\function{os.system()}:
|
|
|
|
\begin{verbatim}
sts = subprocess.call(['dpkg', '-i', '/tmp/new-package.deb'])
if sts == 0:
    # Success
    ...
else:
    # dpkg returned an error
    ...
\end{verbatim}
|
|
|
|
The command is invoked without use of the shell. If you really do want to
|
|
use the shell, you can add \code{shell=True} as a keyword argument and provide
|
|
a string instead of a sequence:
|
|
|
|
\begin{verbatim}
sts = subprocess.call('dpkg -i /tmp/new-package.deb', shell=True)
\end{verbatim}
|
|
|
|
The PEP takes various examples of shell and Python code and shows how
|
|
they'd be translated into Python code that uses \module{subprocess}.
|
|
Reading this section of the PEP is highly recommended.
|
|
|
|
\begin{seealso}
|
|
\seepep{324}{subprocess - New process module}{Written and implemented by Peter {\AA}strand, with assistance from Fredrik Lundh and others.}
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 327: Decimal Data Type}
|
|
|
|
Python has always supported floating-point (FP) numbers as a data
|
|
type, based on the underlying C \ctype{double} type. However, while
|
|
most programming languages provide a floating-point type, most people
|
|
(even programmers) are unaware that computing with floating-point
|
|
numbers entails certain unavoidable inaccuracies. The new decimal
|
|
type provides a way to avoid these inaccuracies.
|
|
|
|
\subsection{Why is Decimal needed?}
|
|
|
|
The limitations arise from the representation used for floating-point numbers.
|
|
FP numbers are made up of three components:
|
|
|
|
\begin{itemize}
|
|
\item The sign, which is positive or negative.
|
|
\item The mantissa, which is a single-digit binary number
|
|
followed by a fractional part. For example, \code{1.01} in base-2 notation
|
|
is \code{1 + 0/2 + 1/4}, or 1.25 in decimal notation.
|
|
\item The exponent, which tells where the decimal point is located in the number represented.
|
|
\end{itemize}
|
|
|
|
For example, the number 1.25 has positive sign, a mantissa value of
|
|
1.01 (in binary), and an exponent of 0 (the decimal point doesn't need
|
|
to be shifted). The number 5 has the same sign and mantissa, but the
|
|
exponent is 2 because the mantissa is multiplied by 4 (2 to the power
|
|
of the exponent 2).
|
|
|
|
Modern systems usually provide floating-point support that conforms to
|
|
a relevant standard called IEEE 754. C's \ctype{double} type is
|
|
usually implemented as a 64-bit IEEE 754 number, which uses 52 bits of
|
|
space for the mantissa. This means that numbers can only be specified
|
|
to 52 bits of precision. If you're trying to represent numbers whose
|
|
expansion repeats endlessly, the expansion is cut off after 52 bits.
|
|
Unfortunately, most software needs to produce output in base 10, and
|
|
base 10 often gives rise to such repeating decimals in the binary
|
|
expansion. For example, 1.1 decimal is binary \code{1.0001100110011
|
|
...}; .1 = 1/16 + 1/32 + 1/256 plus an infinite number of additional
|
|
terms. IEEE 754 has to chop off that infinitely repeated decimal
|
|
after 52 digits, so the representation is slightly inaccurate.
|
|
|
|
Sometimes you can see this inaccuracy when the number is printed:
|
|
\begin{verbatim}
>>> 1.1
1.1000000000000001
\end{verbatim}
|
|
|
|
The inaccuracy isn't always visible when you print the number because
|
|
the FP-to-decimal-string conversion is provided by the C library, and
|
|
most C libraries try to produce sensible output. Even if it's not
|
|
displayed, however, the inaccuracy is still there and subsequent
|
|
operations can magnify the error.
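
The way repeated operations accumulate the error can be shown with a
short loop. This sketch assumes the platform uses IEEE 754 doubles, as
described above; the variable names are arbitrary.

```python
# Add 0.1 ten times; mathematically the result is exactly 1.
total = 0.0
for i in range(10):
    total += 0.1

exactly_one = (total == 1.0)

# The discrepancy is tiny, which is why rounded output hides it.
error = abs(total - 1.0)
rounded = '%.2f' % total
```

The comparison fails even though the rounded string looks correct,
which is the kind of surprise that matters in financial code.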
|
|
|
|
For many applications this doesn't matter. If I'm plotting points and
|
|
displaying them on my monitor, the difference between 1.1 and
|
|
1.1000000000000001 is too small to be visible. Reports often limit
|
|
output to a certain number of decimal places, and if you round the
|
|
number to two or three or even eight decimal places, the error is
|
|
never apparent. However, for applications where it does matter,
|
|
it's a lot of work to implement your own custom arithmetic routines.
|
|
|
|
Hence, the \class{Decimal} type was created.
|
|
|
|
\subsection{The \class{Decimal} type}
|
|
|
|
A new module, \module{decimal}, was added to Python's standard library.
|
|
It contains two classes, \class{Decimal} and \class{Context}.
|
|
\class{Decimal} instances represent numbers, and
|
|
\class{Context} instances are used to wrap up various settings such as the precision and default rounding mode.
|
|
|
|
\class{Decimal} instances, like regular Python integers and FP
numbers, are immutable; once an instance has been created, you can't
change the value it represents. \class{Decimal} instances can be created
from integers or strings:
|
|
|
|
\begin{verbatim}
>>> import decimal
>>> decimal.Decimal(1972)
Decimal("1972")
>>> decimal.Decimal("1.1")
Decimal("1.1")
\end{verbatim}
|
|
|
|
You can also provide tuples containing the sign, the mantissa represented
|
|
as a tuple of decimal digits, and the exponent:
|
|
|
|
\begin{verbatim}
>>> decimal.Decimal((1, (1, 4, 7, 5), -2))
Decimal("-14.75")
\end{verbatim}
|
|
|
|
Cautionary note: the sign bit is a Boolean value, so 0 is positive and
|
|
1 is negative.
|
|
|
|
Converting from floating-point numbers poses a bit of a problem:
|
|
should the FP number representing 1.1 turn into the decimal number for
|
|
exactly 1.1, or for 1.1 plus whatever inaccuracies are introduced?
|
|
The decision was to leave such a conversion out of the API. Instead,
|
|
you should convert the floating-point number into a string using the
|
|
desired precision and pass the string to the \class{Decimal}
|
|
constructor:
|
|
|
|
\begin{verbatim}
>>> f = 1.1
>>> decimal.Decimal(str(f))
Decimal("1.1")
>>> decimal.Decimal('%.12f' % f)
Decimal("1.100000000000")
\end{verbatim}
|
|
|
|
Once you have \class{Decimal} instances, you can perform the usual
|
|
mathematical operations on them. One limitation: exponentiation
|
|
requires an integer exponent:
|
|
|
|
\begin{verbatim}
>>> a = decimal.Decimal('35.72')
>>> b = decimal.Decimal('1.73')
>>> a+b
Decimal("37.45")
>>> a-b
Decimal("33.99")
>>> a*b
Decimal("61.7956")
>>> a/b
Decimal("20.64739884393063583815028902")
>>> a ** 2
Decimal("1275.9184")
>>> a**b
Traceback (most recent call last):
  ...
decimal.InvalidOperation: x ** (non-integer)
\end{verbatim}
|
|
|
|
You can combine \class{Decimal} instances with integers, but not with
|
|
floating-point numbers:
|
|
|
|
\begin{verbatim}
>>> a + 4
Decimal("39.72")
>>> a + 4.5
Traceback (most recent call last):
  ...
TypeError: You can interact Decimal only with int, long or Decimal data types.
>>>
\end{verbatim}
|
|
|
|
\class{Decimal} numbers can be used with the \module{math} and
|
|
\module{cmath} modules, but note that they'll be immediately converted to
|
|
floating-point numbers before the operation is performed, resulting in
|
|
a possible loss of precision and accuracy. You'll also get back a
|
|
regular floating-point number and not a \class{Decimal}.
|
|
|
|
\begin{verbatim}
>>> import math, cmath
>>> d = decimal.Decimal('123456789012.345')
>>> math.sqrt(d)
351364.18288201344
>>> cmath.sqrt(-d)
351364.18288201344j
\end{verbatim}
|
|
|
|
Instances also have a \method{sqrt()} method that returns a
|
|
\class{Decimal}, but if you need other things such as trigonometric
|
|
functions you'll have to implement them.
|
|
|
|
\begin{verbatim}
>>> d.sqrt()
Decimal("351364.1828820134592177245001")
\end{verbatim}
|
|
|
|
|
|
\subsection{The \class{Context} type}
|
|
|
|
Instances of the \class{Context} class encapsulate several settings for
|
|
decimal operations:
|
|
|
|
\begin{itemize}
|
|
\item \member{prec} is the precision, the number of significant decimal digits kept in results.
|
|
\item \member{rounding} specifies the rounding mode. The \module{decimal}
|
|
module has constants for the various possibilities:
|
|
\constant{ROUND_DOWN}, \constant{ROUND_CEILING}, \constant{ROUND_HALF_EVEN}, and various others.
|
|
\item \member{traps} is a dictionary specifying what happens on
|
|
encountering certain error conditions: either an exception is raised or
|
|
a value is returned. Some examples of error conditions are
|
|
division by zero, loss of precision, and overflow.
|
|
\end{itemize}
|
|
|
|
There's a thread-local default context available by calling
|
|
\function{getcontext()}; you can change the properties of this context
|
|
to alter the default precision, rounding, or trap handling.
|
|
|
|
\begin{verbatim}
>>> decimal.getcontext().prec
28
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.1428571428571428571428571429")
>>> decimal.getcontext().prec = 9
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.142857143")
\end{verbatim}
|
|
|
|
The default action for error conditions is selectable; the module can
|
|
either return a special value such as infinity or not-a-number, or
|
|
exceptions can be raised:
|
|
|
|
\begin{verbatim}
>>> decimal.Decimal(1) / decimal.Decimal(0)
Traceback (most recent call last):
  ...
decimal.DivisionByZero: x / 0
>>> decimal.getcontext().traps[decimal.DivisionByZero] = False
>>> decimal.Decimal(1) / decimal.Decimal(0)
Decimal("Infinity")
>>>
\end{verbatim}
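
The settings can also be kept in a separate \class{Context} instance
instead of modifying the thread's default context. The following sketch
assumes the \class{Context} constructor accepts \var{prec},
\var{rounding}, and \var{traps} as keyword arguments and that
operations such as \method{divide()} are available as context methods;
the variable names are invented.

```python
import decimal

# A private context: 5 significant digits, truncation, no traps enabled.
ctx = decimal.Context(prec=5, rounding=decimal.ROUND_DOWN, traps=[])

two_thirds = ctx.divide(decimal.Decimal(2), decimal.Decimal(3))

# Same division with a different rounding mode.
ctx.rounding = decimal.ROUND_HALF_EVEN
two_thirds_rounded = ctx.divide(decimal.Decimal(2), decimal.Decimal(3))

# With the DivisionByZero trap disabled, a special value is returned
# and the condition is recorded as a flag on the context instead.
quotient = ctx.divide(decimal.Decimal(1), decimal.Decimal(0))
flagged = bool(ctx.flags[decimal.DivisionByZero])
```

Using a private context keeps the experiment from changing the
precision or trap settings seen by the rest of the program.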
|
|
|
|
The \class{Context} instance also has various methods for formatting
|
|
numbers such as \method{to_eng_string()} and \method{to_sci_string()}.
|
|
|
|
For more information, see the documentation for the \module{decimal}
|
|
module, which includes a quick-start tutorial and a reference.
|
|
|
|
\begin{seealso}
|
|
\seepep{327}{Decimal Data Type}{Written by Facundo Batista and implemented
|
|
by Facundo Batista, Eric Price, Raymond Hettinger, Aahz, and Tim Peters.}
|
|
|
|
\seeurl{http://research.microsoft.com/\textasciitilde hollasch/cgindex/coding/ieeefloat.html}
|
|
{A more detailed overview of the IEEE-754 representation.}
|
|
|
|
\seeurl{http://www.lahey.com/float.htm}
|
|
{The article uses Fortran code to illustrate many of the problems
|
|
that floating-point inaccuracy can cause.}
|
|
|
|
\seeurl{http://www2.hursley.ibm.com/decimal/}
|
|
{A description of a decimal-based representation. This representation
|
|
is being proposed as a standard, and underlies the new Python decimal
|
|
type. Much of this material was written by Mike Cowlishaw, designer of the
|
|
Rexx language.}
|
|
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 328: Multi-line Imports}
|
|
|
|
One language change is a small syntactic tweak aimed at making it
|
|
easier to import many names from a module. In a
|
|
\code{from \var{module} import \var{names}} statement,
|
|
\var{names} is a sequence of names separated by commas. If the sequence is
|
|
very long, you can either write multiple imports from the same module,
|
|
or you can use backslashes to escape the line endings:
|
|
|
|
\begin{verbatim}
from SimpleXMLRPCServer import SimpleXMLRPCServer,\
            SimpleXMLRPCRequestHandler,\
            CGIXMLRPCRequestHandler,\
            resolve_dotted_attribute
\end{verbatim}
|
|
|
|
The syntactic change simply allows putting the names within
|
|
parentheses. Python ignores newlines within a parenthesized
|
|
expression, so the backslashes are no longer needed:
|
|
|
|
\begin{verbatim}
from SimpleXMLRPCServer import (SimpleXMLRPCServer,
                                SimpleXMLRPCRequestHandler,
                                CGIXMLRPCRequestHandler,
                                resolve_dotted_attribute)
\end{verbatim}
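
The same form works with any module. As a small runnable sketch, the
following imports three functions from \module{os.path}; the file name
used is made up for the example.

```python
from os.path import (basename,
                     join,
                     splitext)

# Hypothetical path, used only to exercise the imported names.
path = join('docs', 'whatsnew24.tex')
stem, extension = splitext(basename(path))
```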
|
|
|
|
The PEP also proposes that all \keyword{import} statements be
|
|
absolute imports, with a leading \samp{.} character to indicate a
|
|
relative import. This part of the PEP is not yet implemented.
|
|
|
|
\begin{seealso}
|
|
\seepep{328}{Imports: Multi-Line and Absolute/Relative}
|
|
{Written by Aahz. Multi-line imports were implemented by
|
|
Dima Dorfman.}
|
|
\end{seealso}
|
|
|
|
|
|
%======================================================================
|
|
\section{PEP 331: Locale-Independent Float/String Conversions}
|
|
|
|
The \module{locale} module lets Python software select various
|
|
conversions and display conventions that are localized to a particular
|
|
country or language. However, the module was careful to not change
|
|
the numeric locale because various functions in Python's
|
|
implementation required that the numeric locale remain set to the
|
|
\code{'C'} locale. Often this was because the code was using the C library's
|
|
\cfunction{atof()} function.
|
|
|
|
Not setting the numeric locale caused trouble for extensions that used
|
|
third-party C libraries, however, because they wouldn't have the
|
|
correct locale set. The motivating example was GTK+, whose user
|
|
interface widgets weren't displaying numbers in the current locale.
|
|
|
|
The solution described in the PEP is to add three new functions to the
|
|
Python API that perform ASCII-only conversions, ignoring the locale
|
|
setting:
|
|
|
|
\begin{itemize}
|
|
\item \cfunction{PyOS_ascii_strtod(\var{str}, \var{ptr})}
|
|
and \cfunction{PyOS_ascii_atof(\var{str})}
|
|
both convert a string to a C \ctype{double}.
|
|
\item \cfunction{PyOS_ascii_formatd(\var{buffer}, \var{buf_len}, \var{format}, \var{d})} converts a \ctype{double} to an ASCII string.
|
|
\end{itemize}
|
|
|
|
The code for these functions came from the GLib library
|
|
(\url{http://developer.gnome.org/arch/gtk/glib.html}), whose
|
|
developers kindly relicensed the relevant functions and donated them
|
|
to the Python Software Foundation. The \module{locale} module
|
|
can now change the numeric locale, letting extensions such as GTK+
|
|
produce the correct results.
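As a rough Python-level illustration (it is not taken from the PEP), the
sketch below tries to switch the numeric locale and then checks that
\function{float()} and \function{repr()} still use the \samp{.} decimal
point. The locale names in \code{CANDIDATES} and the helper
\function{set_numeric_locale()} are assumptions made for this example;
if none of the locales is installed, the numeric locale is simply left
unchanged.

```python
import locale

# Hypothetical list of locale names that use ',' as the decimal point;
# which of them (if any) is installed depends on the platform.
CANDIDATES = ("de_DE.UTF-8", "de_DE", "German")

def set_numeric_locale(names):
    """Try each name in turn; return the one that worked, or None."""
    for name in names:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            return name
        except locale.Error:
            continue
    return None

chosen = set_numeric_locale(CANDIDATES)
try:
    # Core conversions ignore the numeric locale in both directions.
    value = float("3.5")
    text = repr(value)
finally:
    # Restore the default so later code is unaffected.
    locale.setlocale(locale.LC_NUMERIC, "C")
```

Locale-aware formatting remains available through the \module{locale}
module's own functions when it is wanted.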
|
|
|
|
\begin{seealso}
|
|
\seepep{331}{Locale-Independent Float/String Conversions}{Written by Christian R. Reis, and implemented by Gustavo Carneiro.}
|
|
\end{seealso}
|
|
|
|
%======================================================================
|
|
\section{Other Language Changes}
|
|
|
|
Here are all of the changes that Python 2.4 makes to the core Python
|
|
language.
|
|
|
|
\begin{itemize}
|
|
|
|
\item The \method{dict.update()} method now accepts the same
|
|
argument forms as the \class{dict} constructor. This includes any
|
|
mapping, any iterable of key/value pairs, and keyword arguments.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item The string methods \method{ljust()}, \method{rjust()}, and
|
|
\method{center()} now take an optional argument for specifying a
|
|
fill character other than a space.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item Strings also gained an \method{rsplit()} method that
|
|
works like the \method{split()} method but splits from the end of
|
|
the string.
|
|
|
|
\begin{verbatim}
|
|
>>> 'www.python.org'.split('.', 1)
|
|
['www', 'python.org']
|
|
>>> 'www.python.org'.rsplit('.', 1)
|
|
['www.python', 'org']
|
|
\end{verbatim}
|
|
|
|
\item The \method{sort()} method of lists gained three keyword
|
|
arguments: \var{cmp}, \var{key}, and \var{reverse}. These arguments
|
|
make some common usages of \method{sort()} simpler. All are optional.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\var{cmp} is the same as the previous single argument to
|
|
\method{sort()}; if provided, the value should be a comparison
|
|
function that takes two arguments and returns -1, 0, or +1 depending
|
|
on how the arguments compare.
|
|
|
|
\var{key} should be a single-argument function that takes a list
|
|
element and returns a comparison key for the element. The list is
|
|
then sorted using the comparison keys. The following example sorts a
|
|
list case-insensitively:
|
|
|
|
\begin{verbatim}
|
|
>>> L = ['A', 'b', 'c', 'D']
|
|
>>> L.sort() # Case-sensitive sort
|
|
>>> L
|
|
['A', 'D', 'b', 'c']
|
|
>>> L.sort(key=lambda x: x.lower())
|
|
>>> L
|
|
['A', 'b', 'c', 'D']
|
|
>>> L.sort(cmp=lambda x,y: cmp(x.lower(), y.lower()))
|
|
>>> L
|
|
['A', 'b', 'c', 'D']
|
|
\end{verbatim}
|
|
|
|
The last example, which uses the \var{cmp} parameter, is the old way
|
|
to perform a case-insensitive sort. It works but is slower than
|
|
using a \var{key} parameter. Using \var{key} results in calling the
|
|
\method{lower()} method once for each element in the list while using
|
|
\var{cmp} will call it twice for each comparison.
|
|
|
|
For simple key functions and comparison functions, it is often
|
|
possible to avoid a \keyword{lambda} expression by using an unbound
|
|
method instead. For example, the above case-insensitive sort is best
|
|
coded as:
|
|
|
|
\begin{verbatim}
|
|
>>> L.sort(key=str.lower)
|
|
>>> L
|
|
['A', 'b', 'c', 'D']
|
|
\end{verbatim}
|
|
|
|
The \var{reverse} parameter should have a Boolean value. If the value
|
|
is \constant{True}, the list will be sorted into reverse order.
|
|
Instead of \code{L.sort(lambda x,y: cmp(x.score, y.score)) ;
|
|
L.reverse()}, you can now write: \code{L.sort(key = lambda x: x.score,
|
|
reverse=True)}.
|
|
|
|
The results of sorting are now guaranteed to be stable. This means
|
|
that two entries with equal keys will be returned in the same order as
|
|
they were input. For example, you can sort a list of people by name,
|
|
and then sort the list by age, resulting in a list sorted by age where
|
|
people with the same age are in name-sorted order.
|
|
|
|
\item There is a new built-in function
|
|
\function{sorted(\var{iterable})} that works like the in-place
|
|
\method{list.sort()} method but can be used in
|
|
expressions. The differences are:
|
|
\begin{itemize}
|
|
\item the input may be any iterable;
|
|
\item a newly formed copy is sorted, leaving the original intact; and
|
|
\item the expression returns the new sorted copy
|
|
\end{itemize}
|
|
|
|
\begin{verbatim}
|
|
>>> L = [9,7,8,3,2,4,1,6,5]
|
|
>>> [10+i for i in sorted(L)] # usable in a list comprehension
|
|
[11, 12, 13, 14, 15, 16, 17, 18, 19]
|
|
>>> L # original is left unchanged
|
|
[9, 7, 8, 3, 2, 4, 1, 6, 5]
|
|
>>> sorted('Monty Python') # any iterable may be an input
|
|
[' ', 'M', 'P', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y', 'y']
|
|
|
|
>>> # List the contents of a dict sorted by key values
|
|
>>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5)
|
|
>>> for k, v in sorted(colormap.iteritems()):
|
|
...     print k, v
|
|
...
|
|
black 4
|
|
blue 2
|
|
green 3
|
|
red 1
|
|
yellow 5
|
|
\end{verbatim}
|
|
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item Integer operations will no longer trigger an \exception{OverflowWarning}.
|
|
The \exception{OverflowWarning} warning will disappear in Python 2.5.
|
|
|
|
\item The interpreter gained a new switch, \programopt{-m}, that
|
|
takes a name, searches for the corresponding module on \code{sys.path},
|
|
and runs the module as a script. For example,
|
|
you can now run the Python profiler with \code{python -m profile}.
|
|
(Contributed by Nick Coghlan.)
|
|
|
|
\item The \function{eval(\var{expr}, \var{globals}, \var{locals})}
|
|
and \function{execfile(\var{filename}, \var{globals}, \var{locals})}
|
|
functions and the \keyword{exec} statement now accept any mapping type
|
|
for the \var{locals} argument. Previously this had to be a regular
|
|
Python dictionary. (Contributed by Raymond Hettinger.)
|
|
|
|
\item The \function{zip()} built-in function and \function{itertools.izip()}
|
|
now return an empty list if called with no arguments.
|
|
Previously they raised a \exception{TypeError}
|
|
exception. This makes them more
|
|
suitable for use with variable length argument lists:
|
|
|
|
\begin{verbatim}
|
|
>>> def transpose(array):
|
|
...     return zip(*array)
|
|
...
|
|
>>> transpose([(1,2,3), (4,5,6)])
|
|
[(1, 4), (2, 5), (3, 6)]
|
|
>>> transpose([])
|
|
[]
|
|
\end{verbatim}
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item Encountering a failure while importing a module no longer leaves
|
|
a partially-initialized module object in \code{sys.modules}. The
|
|
incomplete module object left behind would fool further imports of the
|
|
same module into succeeding, leading to confusing errors.
|
|
|
|
\item \constant{None} is now a constant; code that binds a new value to
|
|
the name \samp{None} is now a syntax error.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\end{itemize}
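The stability guarantee and the \var{key} and \var{reverse} arguments
described in the list above can be combined as in the following sketch.
The \code{people} records are invented for illustration, and
\function{sorted()} is used instead of \method{list.sort()} only so
that each intermediate result stays available.

```python
from operator import itemgetter

# Hypothetical (name, age) records used only for illustration.
people = [("Carol", 30), ("alice", 25), ("Bob", 30), ("dave", 25)]

# First pass: case-insensitive sort by name.
by_name = sorted(people, key=lambda person: person[0].lower())

# Second pass: sort by age.  Because sorting is stable, people with
# the same age keep the name order established by the first pass.
by_age = sorted(by_name, key=itemgetter(1))

# reverse=True replaces the old sort-then-reverse idiom; entries with
# equal keys still keep their incoming order.
oldest_first = sorted(by_name, key=itemgetter(1), reverse=True)
```

Sorting by the secondary key first and the primary key last is the
usual way to exploit stability.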
|
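The smaller changes listed above (the extra argument forms for
\method{dict.update()}, the fill character for the justification
methods, and \method{rsplit()}) might be exercised as follows. The
\code{settings} dictionary and the sample strings are assumptions
chosen for this example.

```python
settings = {"host": "localhost"}

settings.update({"port": 8080})            # another mapping
settings.update([("debug", False)])        # an iterable of key/value pairs
settings.update(user="guest", timeout=30)  # keyword arguments

# Fill characters for justification, and splitting from the right.
padded = "42".rjust(6, "0")
banner = "hi".center(6, "*")
stem, extension = "archive.tar.gz".rsplit(".", 1)
```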
|
|
|
|
|
%======================================================================
|
|
\subsection{Optimizations}
|
|
|
|
\begin{itemize}
|
|
|
|
\item The inner loops for list and tuple slicing
|
|
were optimized and now run about one-third faster. The inner loops
|
|
were also optimized for dictionaries, resulting in performance boosts for
|
|
\method{keys()}, \method{values()}, \method{items()},
|
|
\method{iterkeys()}, \method{itervalues()}, and \method{iteritems()}.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item The machinery for growing and shrinking lists was optimized for
|
|
speed and for space efficiency. Appending and popping from lists now
|
|
runs faster due to more efficient code paths and less frequent use of
|
|
the underlying system \cfunction{realloc()}. List comprehensions
|
|
also benefit. \method{list.extend()} was also optimized and no
|
|
longer converts its argument into a temporary list before extending
|
|
the base list. (Contributed by Raymond Hettinger.)
|
|
|
|
\item \function{list()}, \function{tuple()}, \function{map()},
|
|
\function{filter()}, and \function{zip()} now run several times
|
|
faster with non-sequence arguments that supply a \method{__len__()}
|
|
method. (Contributed by Raymond Hettinger.)
|
|
|
|
\item The methods \method{list.__getitem__()},
|
|
\method{dict.__getitem__()}, and \method{dict.__contains__()} are
|
|
now implemented as \class{method_descriptor} objects rather
|
|
than \class{wrapper_descriptor} objects. This form of optimized
|
|
access doubles their performance and makes them more suitable for
|
|
use as arguments to functionals:
|
|
\samp{map(mydict.__getitem__, keylist)}.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item Added a new opcode, \code{LIST_APPEND}, that simplifies
|
|
the generated bytecode for list comprehensions and speeds them up
|
|
by about a third. (Contributed by Raymond Hettinger.)
|
|
|
|
\item The peephole bytecode optimizer has been improved to
|
|
produce shorter, faster bytecode; remarkably the resulting bytecode is
|
|
more readable. (Enhanced by Raymond Hettinger.)
|
|
|
|
\item String concatenations in statements of the form \code{s = s +
|
|
"abc"} and \code{s += "abc"} are now performed more efficiently in
|
|
certain circumstances. This optimization won't be present in other
|
|
Python implementations such as Jython, so you shouldn't rely on it;
|
|
using the \method{join()} method of strings is still recommended when
|
|
you want to efficiently glue a large number of strings together.
|
|
(Contributed by Armin Rigo.)
|
|
|
|
\end{itemize}
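To make the advice about string concatenation concrete, here is a
minimal sketch of the two approaches. The function names are invented
for this example, and nothing here measures speed; it only shows that
both forms build the same string, with \method{join()} being the one
that does not depend on an implementation-specific optimization.

```python
def concat_in_loop(pieces):
    """Build a string with repeated +=; speed depends on the implementation."""
    result = ""
    for piece in pieces:
        result += piece
    return result

def concat_with_join(pieces):
    """Build the same string in one call; efficient on any implementation."""
    return "".join(pieces)

words = ["spam"] * 1000
```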
|
|
|
|
The net result of the 2.4 optimizations is that Python 2.4 runs the
|
|
pystone benchmark around XX\% faster than Python 2.3 and YY\% faster
|
|
than Python 2.2.
|
|
|
|
|
|
%======================================================================
|
|
\section{New, Improved, and Deprecated Modules}
|
|
|
|
As usual, Python's standard library received a number of enhancements and
|
|
bug fixes. Here's a partial list of the most notable changes, sorted
|
|
alphabetically by module name. Consult the
|
|
\file{Misc/NEWS} file in the source tree for a more
|
|
complete list of changes, or look through the CVS logs for all the
|
|
details.
|
|
|
|
\begin{itemize}
|
|
|
|
\item The \module{asyncore} module's \function{loop()} now has a
|
|
\var{count} parameter that lets you perform a limited number
|
|
of passes through the polling loop. The default is still to loop
|
|
forever.
|
|
|
|
\item The \module{base64} module now has more complete RFC 3548 support
|
|
for Base64, Base32, and Base16 encoding and decoding, including
|
|
optional case folding and optional alternative alphabets.
|
|
(Contributed by Barry Warsaw.)
|
|
|
|
\item The \module{bisect} module now has an underlying C implementation
|
|
for improved performance.
|
|
(Contributed by Dmitry Vasiliev.)
|
|
|
|
\item The CJKCodecs collection of East Asian codecs, maintained
|
|
by Hye-Shik Chang, was integrated into 2.4.
|
|
The new encodings are:
|
|
|
|
\begin{itemize}
|
|
\item Chinese (PRC): gb2312, gbk, gb18030, big5hkscs, hz
|
|
\item Chinese (ROC): big5, cp950
|
|
\item Japanese: cp932, euc-jis-2004, euc-jp,
|
|
euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2,
|
|
iso-2022-jp-3, iso-2022-jp-ext, iso-2022-jp-2004,
|
|
shift-jis, shift-jisx0213, shift-jis-2004
|
|
\item Korean: cp949, euc-kr, johab, iso-2022-kr
|
|
\end{itemize}
|
|
|
|
\item The UTF-8 and UTF-16 codecs now cope better with receiving partial input.
|
|
Previously the \class{StreamReader} class would try to read more data,
|
|
which made it impossible to resume decoding from the stream. The
|
|
\method{read()} method will now return as much data as it can and future
|
|
calls will resume decoding where previous ones left off.
|
|
(Implemented by Walter D\"orwald.)
|
|
|
|
\item Some other new encodings were added: HP Roman8,
|
|
ISO_8859-11, ISO_8859-16, PTCP-154, and TIS-620.
|
|
|
|
\item There is a new \module{collections} module for
|
|
various specialized collection datatypes.
|
|
Currently it contains just one type, \class{deque},
|
|
a double-ended queue that supports efficiently adding and removing
|
|
elements from either end.
|
|
|
|
\begin{verbatim}
|
|
>>> from collections import deque
|
|
>>> d = deque('ghi') # make a new deque with three items
|
|
>>> d.append('j') # add a new entry to the right side
|
|
>>> d.appendleft('f') # add a new entry to the left side
|
|
>>> d # show the representation of the deque
|
|
deque(['f', 'g', 'h', 'i', 'j'])
|
|
>>> d.pop() # return and remove the rightmost item
|
|
'j'
|
|
>>> d.popleft() # return and remove the leftmost item
|
|
'f'
|
|
>>> list(d) # list the contents of the deque
|
|
['g', 'h', 'i']
|
|
>>> 'h' in d # search the deque
|
|
True
|
|
\end{verbatim}
|
|
|
|
Several modules now take advantage of \class{collections.deque} for
|
|
improved performance, such as the \module{Queue} and
|
|
\module{threading} modules. (Contributed by Raymond Hettinger.)
|
|
|
|
\item The \module{ConfigParser} classes have been enhanced slightly.
|
|
The \method{read()} method now returns a list of the files that
|
|
were successfully parsed, and the \method{set()} method raises
|
|
\exception{TypeError} if passed a \var{value} argument that isn't a
|
|
string.
|
|
|
|
\item The \module{curses} module now supports the ncurses extension
|
|
\function{use_default_colors()}. On platforms where the terminal
|
|
supports transparency, this makes it possible to use a transparent
|
|
background. (Contributed by J\"org Lehmann.)
|
|
|
|
\item The \module{difflib} module now includes an \class{HtmlDiff} class
|
|
that creates an HTML table showing a side-by-side comparison
|
|
of two versions of a text. (Contributed by Dan Gass.)
|
|
|
|
\item The \module{email} package was updated to version 3.0,
|
|
which dropped various deprecated APIs and removed support for Python
|
|
versions earlier than 2.3. The 3.0 version of the package uses a new
|
|
incremental parser for MIME messages, available in the
|
|
\module{email.FeedParser} module. The new parser doesn't require
|
|
reading the entire message into memory, and doesn't throw exceptions
|
|
if a message is malformed; instead it records any problems in the
\member{defects} attribute of the message.  (Developed by Anthony
|
|
Baxter, Barry Warsaw, Thomas Wouters, and others.)
|
|
|
|
\item The \module{heapq} module has been converted to C. The resulting
|
|
tenfold improvement in speed makes the module suitable for handling
|
|
high volumes of data. In addition, the module has two new functions
|
|
\function{nlargest()} and \function{nsmallest()} that use heaps to
|
|
find the N largest or smallest values in a dataset without the
|
|
expense of a full sort. (Contributed by Raymond Hettinger.)
|
|
|
|
\item The \module{httplib} module now contains constants for HTTP
|
|
status codes defined in various HTTP-related RFC documents. Constants
|
|
have names such as \constant{OK}, \constant{CREATED},
|
|
\constant{CONTINUE}, and \constant{MOVED_PERMANENTLY}; use pydoc to
|
|
get a full list. (Contributed by Andrew Eland.)
|
|
|
|
\item The \module{imaplib} module now supports IMAP's THREAD command
|
|
(contributed by Yves Dionne) and new \method{deleteacl()} and
|
|
\method{myrights()} methods (contributed by Arnaud Mazin).
|
|
|
|
\item The \module{itertools} module gained a
|
|
\function{groupby(\var{iterable}\optional{, \var{func}})} function.
|
|
\var{iterable} returns a succession of elements, and the optional
|
|
\var{func} is a function that takes an element and returns a key
|
|
value; if omitted, the key is simply the element itself.
|
|
\function{groupby()} then groups the elements into subsequences
|
|
which have matching values of the key, and returns a series of 2-tuples
|
|
containing the key value and an iterator over the subsequence.
|
|
|
|
Here's an example. The \var{key} function simply returns whether a
|
|
number is even or odd, so the result of \function{groupby()} is to
|
|
return consecutive runs of odd or even numbers.
|
|
|
|
\begin{verbatim}
|
|
>>> import itertools
|
|
>>> L = [2,4,6, 7,8,9,11, 12, 14]
|
|
>>> for key_val, it in itertools.groupby(L, lambda x: x % 2):
|
|
...     print key_val, list(it)
|
|
...
|
|
0 [2, 4, 6]
|
|
1 [7]
|
|
0 [8]
|
|
1 [9, 11]
|
|
0 [12, 14]
|
|
>>>
|
|
\end{verbatim}
|
|
|
|
\function{groupby()} is typically used with sorted input. The logic
|
|
for \function{groupby()} is similar to the \UNIX{} \code{uniq} filter,
|
|
which makes it handy for eliminating, counting, or identifying
|
|
duplicate elements:
|
|
|
|
\begin{verbatim}
|
|
>>> word = 'abracadabra'
|
|
>>> letters = sorted(word) # Turn string into a sorted list of letters
|
|
>>> letters
|
|
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
|
|
>>> for k, g in itertools.groupby(letters):
|
|
...     print k, list(g)
|
|
...
|
|
a ['a', 'a', 'a', 'a', 'a']
|
|
b ['b', 'b']
|
|
c ['c']
|
|
d ['d']
|
|
r ['r', 'r']
|
|
>>> # List unique letters
|
|
>>> [k for k, g in itertools.groupby(letters)]
|
|
['a', 'b', 'c', 'd', 'r']
|
|
>>> # Count letter occurrences
|
|
>>> [(k, len(list(g))) for k, g in itertools.groupby(letters)]
|
|
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
|
|
\end{verbatim}
|
|
|
|
(Contributed by Hye-Shik Chang.)
|
|
|
|
\item \module{itertools} also gained a function named
|
|
\function{tee(\var{iterator}, \var{N})} that returns \var{N} independent
|
|
iterators that replicate \var{iterator}. If \var{N} is omitted, the
|
|
default is 2.
|
|
|
|
\begin{verbatim}
|
|
>>> L = [1,2,3]
|
|
>>> i1, i2 = itertools.tee(L)
|
|
>>> i1,i2
|
|
(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>)
|
|
>>> list(i1) # Run the first iterator to exhaustion
|
|
[1, 2, 3]
|
|
>>> list(i2) # Run the second iterator to exhaustion
|
|
[1, 2, 3]
|
|
\end{verbatim}
|
|
|
|
Note that \function{tee()} has to keep copies of the values returned
|
|
by the iterator; in the worst case, it may need to keep all of them.
|
|
This should therefore be used carefully if the leading iterator
|
|
can run far ahead of the trailing iterator in a long stream of inputs.
|
|
If the separation is large, then you might as well use
|
|
\function{list()} instead. When the iterators track closely with one
|
|
another, \function{tee()} is ideal. Possible applications include
|
|
bookmarking, windowing, or lookahead iterators.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item A number of functions were added to the \module{locale}
|
|
module, such as \function{bind_textdomain_codeset()} to specify a
|
|
particular encoding, and a family of \function{l*gettext()} functions
|
|
that return messages in the chosen encoding.
|
|
(Contributed by Gustavo Niemeyer.)
|
|
|
|
\item The \module{logging} package's \function{basicConfig()} function
|
|
gained some keyword arguments to simplify log configuration. The
|
|
default behavior is to log messages to standard error, but
|
|
various keyword arguments can be specified to log to a particular file,
|
|
change the logging format, or set the logging level. For example:
|
|
|
|
\begin{verbatim}
|
|
import logging
|
|
logging.basicConfig(filename = '/var/log/application.log',
|
|
level=0, # Log all messages, including debugging,
|
|
format='%(levelname)s:%(process)d:%(thread)d:%(message)s')
|
|
\end{verbatim}
|
|
|
|
Other additions to \module{logging} include a \method{log(\var{level},
|
|
\var{msg})} convenience method, and a
|
|
\class{TimedRotatingFileHandler} class that rotates its log files at
|
|
a timed interval. The module already had \class{RotatingFileHandler},
|
|
which rotated logs once the file exceeded a certain size. Both
|
|
classes derive from a new \class{BaseRotatingHandler} class that can
|
|
be used to implement other rotating handlers.
|
|
|
|
(Changes implemented by Vinay Sajip.)
|
|
|
|
\item The \module{marshal} module now shares interned strings on unpacking a
|
|
data structure. This may shrink the size of certain pickle strings,
|
|
but the primary effect is to make \file{.pyc} files significantly smaller.
|
|
(Contributed by Martin von Loewis.)
|
|
|
|
\item The \module{nntplib} module's \class{NNTP} class gained
|
|
\method{description()} and \method{descriptions()} methods to retrieve
|
|
newsgroup descriptions for a single group or for a range of groups.
|
|
(Contributed by J\"urgen A. Erhard.)
|
|
|
|
\item The \module{operator} module gained two new functions,
|
|
\function{attrgetter(\var{attr})} and \function{itemgetter(\var{index})}.
|
|
Both functions return callables that take a single argument and return
|
|
the corresponding attribute or item; these callables make excellent
|
|
data extractors when used with \function{map()} or
|
|
\function{sorted()}. For example:
|
|
|
|
\begin{verbatim}
|
|
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
|
|
>>> map(operator.itemgetter(0), L)
|
|
['c', 'd', 'a', 'b']
|
|
>>> map(operator.itemgetter(1), L)
|
|
[2, 1, 4, 3]
|
|
>>> sorted(L, key=operator.itemgetter(1)) # Sort list by second tuple item
|
|
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
|
|
\end{verbatim}
|
|
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item The \module{optparse} module was updated. The module now passes
|
|
its messages through \function{gettext.gettext()}, making it possible
|
|
to internationalize Optik's help and error messages. Help messages
|
|
for options can now include the string \code{'\%default'}, which will
|
|
be replaced by the option's default value.
|
|
|
|
\item The long-term plan is to deprecate the \module{rfc822} module
|
|
in some future Python release in favor of the \module{email} package.
|
|
To this end, the \function{email.Utils.formatdate()} function has been
|
|
changed to make it usable as a replacement for
|
|
\function{rfc822.formatdate()}. You may want to write new e-mail
|
|
processing code with this in mind. (Change implemented by Anthony
|
|
Baxter.)
|
|
|
|
\item A new \function{urandom(\var{n})} function
|
|
was added to the \module{os} module, providing access to
|
|
platform-specific sources of randomness such as
|
|
\file{/dev/urandom} on Linux or the Windows CryptoAPI. The
|
|
function returns a string containing \var{n} bytes of random data.
|
|
(Contributed by Trevor Perrin.)
|
|
|
|
\item Another new function: \function{os.path.lexists(\var{path})}
|
|
returns true if the file specified by \var{path} exists, whether or
|
|
not it's a symbolic link. This differs from the existing
|
|
\function{os.path.exists(\var{path})} function, which returns false if
|
|
\var{path} is a symlink that points to a destination that doesn't exist.
|
|
(Contributed by Beni Cherniavsky.)
|
|
|
|
\item A new \function{getsid()} function was added to the
|
|
\module{posix} module that underlies the \module{os} module.
|
|
(Contributed by J. Raynor.)
|
|
|
|
\item The \module{poplib} module now supports POP over SSL.
|
|
|
|
\item The \module{profile} module can now profile C extension functions.
|
|
% XXX more to say about this?
|
|
(Contributed by Nick Bastin.)
|
|
|
|
\item The \module{random} module has a new method called \method{getrandbits(N)}
|
|
which returns an N-bit long integer. This method supports the existing
|
|
\method{randrange()} method, making it possible to efficiently generate
|
|
arbitrarily large random numbers.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item The regular expression language accepted by the \module{re} module
|
|
was extended with simple conditional expressions, written as
|
|
\regexp{(?(\var{group})\var{A}|\var{B})}. \var{group} is either a
|
|
numeric group ID or a group name defined with \regexp{(?P<group>...)}
|
|
earlier in the expression. If the specified group matched, the
|
|
regular expression pattern \var{A} will be tested against the string; if
|
|
the group didn't match, the pattern \var{B} will be used instead.
|
|
|
|
\item The \module{re} module is also no longer recursive, thanks
|
|
to a massive amount of work by Gustavo Niemeyer. In a recursive
|
|
regular expression engine, certain patterns result in a large amount
|
|
of C stack space being consumed, and it was possible to overflow the
|
|
stack. For example, if you matched a 30000-byte string of \samp{a}
|
|
characters against the expression \regexp{(a|b)+}, one stack frame was
|
|
consumed per character. Python 2.3 tried to check for stack overflow
|
|
and raise a \exception{RuntimeError} exception, but if you were
|
|
unlucky Python could dump core. Python 2.4's regular expression
|
|
engine can match this pattern without problems.
|
|
|
|
\item A new \function{socketpair()} function was added to the
|
|
\module{socket} module, returning a pair of connected sockets.
|
|
(Contributed by Dave Cole.)
|
|
|
|
\item The \function{sys.exitfunc()} function has been deprecated. Code
|
|
should be using the existing \module{atexit} module, which correctly
|
|
handles calling multiple exit functions. Eventually
|
|
\function{sys.exitfunc()} will become a purely internal interface,
|
|
accessed only by \module{atexit}.
|
|
|
|
\item The \module{tarfile} module now generates GNU-format tar files
|
|
by default.
|
|
|
|
\item The \module{threading} module now has an elegantly simple way to support
|
|
thread-local data. The module contains a \class{local} class whose
|
|
attribute values are local to different threads.
|
|
|
|
\begin{verbatim}
|
|
import threading
|
|
|
|
data = threading.local()
|
|
data.number = 42
|
|
data.url = ('www.python.org', 80)
|
|
\end{verbatim}
|
|
|
|
Other threads can assign and retrieve their own values for the
|
|
\member{number} and \member{url} attributes. You can subclass
|
|
\class{local} to initialize attributes or to add methods.
|
|
(Contributed by Jim Fulton.)
|
|
|
|
\item The \module{timeit} module now automatically disables periodic
|
|
garbage collection during the timing loop.  This change makes
|
|
consecutive timings more comparable. (Contributed by Raymond Hettinger.)
|
|
|
|
\item The \module{weakref} module now supports a wider variety of objects
|
|
including Python functions, class instances, sets, frozensets, deques,
|
|
arrays, files, sockets, and regular expression pattern objects.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item The \module{xmlrpclib} module now supports a multi-call extension for
|
|
transmitting multiple XML-RPC calls in a single HTTP operation.
|
|
|
|
\item The \module{mpz}, \module{rotor}, and \module{xreadlines} modules have
|
|
been removed.
|
|
|
|
\end{itemize}
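The \function{nlargest()} and \function{nsmallest()} functions added to
\module{heapq} (see the list above) can be tried out with a short
sketch like this one. The \code{scores} list is made-up sample data.

```python
import heapq

# Hypothetical sample data.
scores = [31, 4, 15, 92, 65, 35, 89, 79]

top_three = heapq.nlargest(3, scores)
bottom_three = heapq.nsmallest(3, scores)

# Same answers as sorting everything and slicing, without the full sort.
assert top_three == sorted(scores, reverse=True)[:3]
assert bottom_three == sorted(scores)[:3]
```

For small inputs the difference is negligible; the heap-based functions
are aimed at picking a few values out of a large dataset.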
|
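The windowing use of \function{itertools.tee()} mentioned in the list
above might look like the following sketch. The helper name
\function{pairwise()} is an assumption of this example, and the code is
written for current Python versions: the built-in \function{next()}
does not exist in Python 2.4, where you would call the iterator's
\method{next()} method instead.

```python
import itertools

def pairwise(iterable):
    """Yield overlapping pairs: (s0, s1), (s1, s2), ..."""
    trailing, leading = itertools.tee(iterable)
    # Advance the leading iterator by one so it stays exactly one
    # element ahead; tee() then only has to buffer a single value.
    next(leading, None)
    return zip(trailing, leading)

pairs = list(pairwise([1, 2, 3, 4]))
```

Because the two iterators never drift more than one element apart,
this is the kind of closely tracking use for which \function{tee()}
is well suited.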
|
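The conditional expressions added to the \module{re} module (see the
list above) can be illustrated with a pattern that accepts an e-mail
address either bare or wrapped in angle brackets, but not with just one
bracket. The pattern and the helper \function{extract()} are
assumptions of this sketch, not part of the module.

```python
import re

# Group 1 optionally matches '<'.  The conditional (?(1)>|$) then
# requires '>' if group 1 matched, and end-of-string otherwise.
address = re.compile(r'^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)$')

def extract(text):
    """Return the bare address, or None if the brackets are unbalanced."""
    match = address.match(text)
    if match is None:
        return None
    return match.group(2)
```

Here the group is referenced by its number; a group name defined with
\regexp{(?P<name>...)} could be used in the same position.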
|
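To show that attributes of a \class{threading.local} object really are
per-thread, the sketch below extends the short example given in the
list above with a few worker threads. The \function{worker()} function
and the \code{seen} dictionary are invented for this example.

```python
import threading

data = threading.local()
data.number = 42          # set in the main thread only

seen = {}

def worker(n):
    # A fresh thread starts with no attributes on `data`.
    had_value = hasattr(data, "number")
    data.number = n       # visible only inside this thread
    seen[n] = (had_value, data.number)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the workers finish, the main thread still sees its own value of
\member{number}.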
|
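The two new sources of random data described in the list above,
\function{os.urandom()} and \method{getrandbits()}, can be called as in
this small sketch. The sizes 16 and 128 are arbitrary choices for the
example.

```python
import os
import random

token = os.urandom(16)          # 16 bytes from the platform's randomness source
big = random.getrandbits(128)   # an integer in the range [0, 2**128)
```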
|
|
%======================================================================
|
|
% whole new modules get described in subsections here
|
|
|
|
\subsection{cookielib}
|
|
|
|
The \module{cookielib} library supports client-side handling for HTTP
|
|
cookies, just as the \module{Cookie} module provides server-side cookie
|
|
support in CGI scripts. Cookies are stored in cookie jars; the library
|
|
transparently stores cookies offered by the web server in the cookie
|
|
jar, and fetches the cookie from the jar when connecting to the
|
|
server. Similar to web browsers, policy objects control whether
|
|
cookies are accepted or not.
|
|
|
|
In order to store cookies across sessions, two implementations of
|
|
cookie jars are provided: one that stores cookies in the Netscape
|
|
format, so applications can use the Mozilla or Lynx cookie jars, and
|
|
one that stores cookies in the same format as the Perl libwww library.
|
|
|
|
\module{urllib2} has been changed to interact with \module{cookielib}:
|
|
\class{HTTPCookieProcessor} manages a cookie jar that is used when
|
|
accessing URLs.
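A minimal sketch of wiring a cookie jar into \module{urllib2} might
look like this. No network request is made, so the jar stays empty.
The fallback imports are there only so that the sketch also runs on
later Python versions, where the same modules are named
\module{http.cookiejar} and \module{urllib.request}.

```python
try:
    import cookielib                      # Python 2 name
    import urllib2
except ImportError:
    import http.cookiejar as cookielib    # the same module under its Python 3 name
    import urllib.request as urllib2

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# Calling opener.open(url) would now store cookies offered by the
# server in `jar` and send them back on later requests.  Nothing has
# been fetched here, so the jar is still empty.
cookie_count = len(jar)
```

A jar that persists cookies to disk can be substituted for the plain
\class{CookieJar} without changing the rest of the code.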
|
|
|
|
\subsection{doctest}
|
|
|
|
The \module{doctest} module underwent considerable refactoring thanks
|
|
to Edward Loper and Tim Peters.
|
|
|
|
% XXX describe this
|
|
|
|
% ======================================================================
|
|
\section{Build and C API Changes}
|
|
|
|
Changes to Python's build process and to the C API include:
|
|
|
|
\begin{itemize}
|
|
|
|
\item Three new convenience macros were added for common return
|
|
values from extension functions: \csimplemacro{Py_RETURN_NONE},
|
|
\csimplemacro{Py_RETURN_TRUE}, and \csimplemacro{Py_RETURN_FALSE}.
|
|
(Contributed by Brett Cannon.)
|
|
|
|
\item Another new macro, \csimplemacro{Py_CLEAR(\var{obj})},
|
|
decreases the reference count of \var{obj} and sets \var{obj} to the
|
|
null pointer. (Contributed by Jim Fulton.)
|
|
|
|
\item A new function, \cfunction{PyTuple_Pack(\var{N}, \var{obj1},
|
|
\var{obj2}, ..., \var{objN})}, constructs tuples from a variable
|
|
length argument list of Python objects. (Contributed by Raymond Hettinger.)
|
|
|
|
\item A new function, \cfunction{PyDict_Contains(\var{d}, \var{k})},
|
|
implements fast dictionary lookups without masking exceptions raised
|
|
during the look-up process. (Contributed by Raymond Hettinger.)
|
|
|
|
\item The \csimplemacro{Py_IS_NAN(\var{X})} macro returns 1 if
|
|
its float or double argument \var{X} is a NaN.
|
|
(Contributed by Tim Peters.)
|
|
|
|
\item C code can avoid unnecessary locking by using the new
|
|
\cfunction{PyEval_ThreadsInitialized()} function to tell
|
|
if any thread operations have been performed. If this function
|
|
returns false, no lock operations are needed.
|
|
(Contributed by Nick Coghlan.)
|
|
|
|
\item A new function, \cfunction{PyArg_VaParseTupleAndKeywords()},
|
|
is the same as \cfunction{PyArg_ParseTupleAndKeywords()} but takes a
|
|
\ctype{va_list} instead of a number of arguments.
|
|
(Contributed by Greg Chapman.)
|
|
|
|
\item A new method flag, \constant{METH_COEXISTS}, allows a function
|
|
defined in slots to co-exist with a \ctype{PyCFunction} having the
|
|
same name. This can halve the access time for a method such as
|
|
\method{set.__contains__()}. (Contributed by Raymond Hettinger.)
|
|
|
|
\item Python can now be built with additional profiling for the
|
|
interpreter itself. This is intended for people developing on the
|
|
Python core. Providing \longprogramopt{--enable-profiling} to the
|
|
\program{configure} script will let you profile the interpreter with
|
|
\program{gprof}, and providing the \longprogramopt{--with-tsc}
|
|
switch enables profiling using the Pentium's Time-Stamp-Counter
|
|
register. The switch is slightly misnamed, because the profiling
|
|
feature also works on the PowerPC platform, though that processor
|
|
architecture doesn't call that register a TSC.
|
|
|
|
\item The \ctype{tracebackobject} type has been renamed to \ctype{PyTracebackObject}.
|
|
|
|
\end{itemize}
|
|
|
|
|
|
%======================================================================
|
|
\subsection{Port-Specific Changes}
|
|
|
|
\begin{itemize}
|
|
|
|
\item The Windows port now builds under MSVC++ 7.1 as well as version 6.
|
|
(Contributed by Martin von Loewis.)
|
|
|
|
\end{itemize}
|
|
|
|
|
|
|
|
%======================================================================
|
|
\section{Porting to Python 2.4}
|
|
|
|
This section lists previously described changes that may require
|
|
changes to your code:
|
|
|
|
\begin{itemize}
|
|
|
|
\item The \function{zip()} built-in function and \function{itertools.izip()}
|
|
now return an empty list instead of raising a \exception{TypeError}
|
|
exception if called with no arguments.
|
|
(Contributed by Raymond Hettinger.)
|
|
|
|
\item \function{dircache.listdir()} now passes exceptions to the caller
|
|
instead of returning empty lists.
|
|
|
|
\item \function{LexicalHandler.startDTD()} used to receive the public and
|
|
system IDs in the wrong order. This has been corrected; applications
|
|
relying on the wrong order need to be fixed.
|
|
|
|
\item \function{fcntl.ioctl()} now warns if the \var{mutate}
|
|
argument is omitted and relevant.
|
|
|
|
\item The \module{tarfile} module now generates GNU-format tar files
|
|
by default.
|
|
|
|
\end{itemize}
|
|
|
|
|
|
%======================================================================
|
|
\section{Acknowledgements \label{acks}}
|
|
|
|
The author would like to thank the following people for offering
|
|
suggestions, corrections and assistance with various drafts of this
|
|
article: Hye-Shik Chang, Michael Dyck, Raymond Hettinger, Hamish Lawson,
|
|
and Fredrik Lundh.
|
|
|
|
\end{document}
|