Various edits

Andrew M. Kuchling 2004-07-05 01:37:07 +00:00
parent 3b7909160e
commit 3bf85f1ae8
1 changed file with 82 additions and 75 deletions


@ -21,14 +21,15 @@
\maketitle
\tableofcontents
This article explains the new features in Python 2.4 alpha1, to be released in early July 2004 The final version of Python 2.4
is expected to be around September 2004.
This article explains the new features in Python 2.4 alpha1, scheduled
for release in early July 2004. The final version of Python 2.4 is
expected to be released around September 2004.
Python 2.4 is a middle-sized release. It doesn't introduce as many
Python 2.4 is a medium-sized release. It doesn't introduce as many
changes as the radical Python 2.2, but introduces more features than
the conservative 2.3 release did. The most significant new language
feature (as of this writing) is the addition of generator expressions;
most of the changes are to the standard library.
most other changes are to the standard library.
This article doesn't attempt to provide a complete specification of
every single new feature, but instead provides a convenient overview.
@ -43,11 +44,13 @@ documentation.
%======================================================================
\section{PEP 218: Built-In Set Objects}
Two new built-in types, \function{set(\var{iterable})} and
\function{frozenset(\var{iterable})} provide high speed data types for
membership testing, for eliminating duplicates from sequences, and
for mathematical operations like unions, intersections, differences,
and symmetric differences.
Python 2.3 introduced the \module{sets} module. C implementations of
set data types have now been added to the Python core as two new
built-in types, \function{set(\var{iterable})} and
\function{frozenset(\var{iterable})}. They provide high speed
operations for membership testing, for eliminating duplicates from
sequences, and for mathematical operations like unions, intersections,
differences, and symmetric differences.
\begin{verbatim}
>>> a = set('abracadabra') # form a set from a string
@ -77,16 +80,13 @@ set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z'])
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z'])
\end{verbatim}
The type \function{frozenset()} is an immutable version of \function{set()}.
The \function{frozenset} type is an immutable version of \function{set}.
Since it is immutable and hashable, it may be used as a dictionary key or
as a member of another set. Accordingly, it does not have methods
like \method{add()} and \method{remove()} which could alter its contents.
as a member of another set.
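
As a minimal sketch of what hashability allows, the following uses
\function{frozenset} values as dictionary keys and as members of
another set. The city names and costs are hypothetical sample data,
and the code is written to run on current Python versions as well.

```python
# A frozenset is hashable, so it can serve as a dictionary key.
# (City pairs and costs are made-up sample data.)
route_cost = {
    frozenset(['London', 'Paris']): 340,
    frozenset(['Paris', 'Rome']): 1100,
}

# Element order is irrelevant when looking up the key.
cost = route_cost[frozenset(['Paris', 'London'])]

# A frozenset can be a member of another set; a mutable set cannot.
groups = set([frozenset('ab'), frozenset('cd')])
try:
    set([set('ab')])
    nested_mutable_ok = True
except TypeError:
    nested_mutable_ok = False

# Mutating methods such as add() are absent from frozenset.
has_add = hasattr(frozenset(), 'add')
```

Here the lookup succeeds even though the key is built in a different
element order, because two frozensets with the same members compare
and hash equal.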
% XXX what happens to the sets module?
% The current thinking is that the sets module will be left alone.
% That way, existing code will continue to run without alteration.
% Also, the module provides an autoconversion feature not supported by set()
% and frozenset().
The \module{sets} module remains in the standard library, and may be
useful if you wish to subclass the \class{Set} or \class{ImmutableSet}
classes. There are currently no plans to deprecate the module.
\begin{seealso}
\seepep{218}{Adding a Built-In Set Object Type}{Originally proposed by
@ -96,23 +96,22 @@ Greg Wilson and ultimately implemented by Raymond Hettinger.}
%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}
The lengthy transition process for the PEP, begun with Python 2.2,
The lengthy transition process for this PEP, begun in Python 2.2,
takes another step forward in Python 2.4. In 2.3, certain integer
operations that would behave differently after int/long unification
triggered \exception{FutureWarning} warnings and returned values
limited to 32 or 64 bits. In 2.4, these expressions no longer produce
a warning, but they now produce a different value that's a long
integer.
limited to 32 or 64 bits (depending on your platform). In 2.4, these
expressions no longer produce a warning and instead produce a
different result that's usually a long integer.
The problematic expressions are primarily left shifts and lengthy
hexadecimal and octal constants. For example, \code{2 << 32} is one
expression that results in a warning in 2.3, evaluating to 0 on 32-bit
platforms. In Python 2.4, this expression now returns 8589934592.
hexadecimal and octal constants. For example, \code{2 << 32} results
in a warning in 2.3, evaluating to 0 on 32-bit platforms. In Python
2.4, this expression now returns the correct answer, 8589934592.
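
A short check of the unified behaviour, assuming Python 2.4 or later;
the variable names are illustrative only. The hexadecimal constant is
an example of the second kind of problematic expression: on 32-bit
builds of Python 2.3 it evaluated to -1.

```python
# The shift is no longer truncated to the platform's word size.
shifted = 2 << 32
print(shifted)    # 8589934592

# A large hexadecimal constant is now simply a positive integer.
big_hex = 0xFFFFFFFF
print(big_hex)    # 4294967295
```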
\begin{seealso}
\seepep{237}{Unifying Long Integers and Integers}{Original PEP
written by Moshe Zadka and Gvr. The changes for 2.4 were implemented by
written by Moshe Zadka and GvR. The changes for 2.4 were implemented by
Kalle Svensson.}
\end{seealso}
@ -124,9 +123,12 @@ programs that loop through large data sets without having the entire
data set in memory at one time. Programmers can use iterators and the
\module{itertools} module to write code in a fairly functional style.
The fly in the ointment has been list comprehensions, because they
% XXX avoid metaphor
List comprehensions have been the fly in the ointment because they
produce a Python list object containing all of the items, unavoidably
pulling them all into memory. When trying to write a program using the functional approach, it would be natural to write something like:
pulling them all into memory. When trying to write a
functionally-styled program, it would be natural to write something
like:
\begin{verbatim}
links = [link for link in get_all_links() if not link.followed]
@ -166,12 +168,12 @@ passed to a function you could write:
print sum(obj.count for obj in list_all_objects())
\end{verbatim}
There are some small differences from list comprehensions. Most
notably, the loop variable (\var{obj} in the above example) is not
accessible outside of the generator expression. List comprehensions
leave the variable assigned to its last value; future versions of
Python will change this, making list comprehensions match generator
expressions in this respect.
Generator expressions differ from list comprehensions in various small
ways. Most notably, the loop variable (\var{obj} in the above
example) is not accessible outside of the generator expression. List
comprehensions leave the variable assigned to its last value; future
versions of Python will change this, making list comprehensions match
generator expressions in this respect.
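
The scoping rule can be checked with a small sketch. The list
\code{counts} and the name \code{item_count} are hypothetical, chosen
only for this illustration.

```python
counts = [3, 1, 4]   # hypothetical sample data

# The generator expression is consumed by sum() one item at a time;
# no intermediate list is built.
total = sum(item_count * 2 for item_count in counts)

# The loop variable does not exist outside the generator expression.
try:
    item_count
    leaked = True
except NameError:
    leaked = False
```

After this runs, \code{total} is 16 and \code{leaked} is false.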
\begin{seealso}
\seepep{289}{Generator Expressions}{Proposed by Raymond Hettinger and
@ -182,7 +184,7 @@ implemented by Jiwon Seo with early efforts steered by Hye-Shik Chang.}
\section{PEP 322: Reverse Iteration}
A new built-in function, \function{reversed(\var{seq})}, takes a sequence
and returns an iterator that returns the elements of the sequence
and returns an iterator that loops over the elements of the sequence
in reverse order.
\begin{verbatim}
@ -194,8 +196,9 @@ in reverse order.
1
\end{verbatim}
Compared to extended slicing, \code{range(1,4)[::-1]}, \function{reversed()}
is easier to read, runs faster, and uses substantially less memory.
Compared to extended slicing, such as \code{range(1,4)[::-1]},
\function{reversed()} is easier to read, runs faster, and uses
substantially less memory.
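
The following sketch contrasts the two approaches on a small list.
It only illustrates that slicing builds a second list while
\function{reversed()} returns an iterator; it does not measure speed
or memory use.

```python
numbers = list(range(1, 4))   # [1, 2, 3]

# Extended slicing builds a complete second list.
via_slice = numbers[::-1]

# reversed() returns an iterator that walks the original list backwards.
backwards = []
for n in reversed(numbers):
    backwards.append(n)

# The object returned by reversed() is not itself a list.
is_list = isinstance(reversed(numbers), list)
```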
Note that \function{reversed()} only accepts sequences, not arbitrary
iterators. If you want to reverse an iterator, first convert it to
@ -450,7 +453,7 @@ language.
argument forms as the \class{dict} constructor. This includes any
mapping, any iterable of key/value pairs, and keyword arguments.
\item The string methods, \method{ljust()}, \method{rjust()}, and
\item The string methods \method{ljust()}, \method{rjust()}, and
\method{center()} now take an optional argument for specifying a
fill character other than a space.
@ -466,7 +469,7 @@ the string.
\end{verbatim}
\item The \method{sort()} method of lists gained three keyword
arguments, \var{cmp}, \var{key}, and \var{reverse}. These arguments
arguments: \var{cmp}, \var{key}, and \var{reverse}. These arguments
make some common usages of \method{sort()} simpler. All are optional.
\var{cmp} is the same as the previous single argument to
@ -496,7 +499,7 @@ The last example, which uses the \var{cmp} parameter, is the old way
to perform a case-insensitive sort. It works but is slower than
using a \var{key} parameter. Using \var{key} results in calling the
\method{lower()} method once for each element in the list while using
\var{cmp} will call the method twice for each comparison.
\var{cmp} will call it twice for each comparison.
For simple key functions and comparison functions, it is often
possible to avoid a \keyword{lambda} expression by using an unbound
@ -509,10 +512,11 @@ coded as:
['A', 'b', 'c', 'D']
\end{verbatim}
The \var{reverse} parameter should have a Boolean value. If the value is
\constant{True}, the list will be sorted into reverse order. Instead
of \code{L.sort(lambda x,y: cmp(y.score, x.score))}, you can now write:
\code{L.sort(key = lambda x: x.score, reverse=True)}.
The \var{reverse} parameter should have a Boolean value. If the value
is \constant{True}, the list will be sorted into reverse order.
Instead of \code{L.sort(lambda x,y: cmp(x.score, y.score)) ;
L.reverse()}, you can now write: \code{L.sort(key = lambda x: x.score,
reverse=True)}.
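
A small sketch of the \var{key} and \var{reverse} arguments used
together. The \class{Player} class and its data are hypothetical and
exist only to give the list elements a \member{score} attribute.

```python
class Player(object):
    """Hypothetical record type used only for this illustration."""
    def __init__(self, name, score):
        self.name = name
        self.score = score

L = [Player('ann', 10), Player('bob', 30),
     Player('cy', 20), Player('dee', 30)]

# Highest score first; the key function is called once per element.
L.sort(key=lambda x: x.score, reverse=True)
names = [p.name for p in L]
print(names)   # ['bob', 'dee', 'cy', 'ann']
```

Note that the two players tied on 30 stay in their original relative
order, since \var{reverse} preserves the stability of the sort.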
The results of sorting are now guaranteed to be stable. This means
that two entries with equal keys will be returned in the same order as
@ -522,7 +526,7 @@ people with the same age are in name-sorted order.
\item There is a new built-in function
\function{sorted(\var{iterable})} that works like the in-place
\method{list.sort()} method but has been made suitable for use in
\method{list.sort()} method but can be used in
expressions. The differences are:
\begin{itemize}
\item the input may be any iterable;
@ -550,7 +554,6 @@ blue 2
green 3
red 1
yellow 5
\end{verbatim}
\item The \function{eval(\var{expr}, \var{globals}, \var{locals})}
@ -558,8 +561,9 @@ function now accepts any mapping type for the \var{locals} argument.
Previously this had to be a regular Python dictionary.
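
As an illustration of a non-dictionary \var{locals} argument, the
sketch below passes an object that only implements
\method{__getitem__()}. The \class{Defaults} class is hypothetical;
it resolves any unknown name to 0.

```python
class Defaults(object):
    """Hypothetical mapping: names not in 'known' evaluate to 0."""
    def __init__(self, known):
        self.known = known

    def __getitem__(self, name):
        return self.known.get(name, 0)

namespace = Defaults({'a': 5})

# 'a' is found in the mapping; 'missing' falls back to 0.
result = eval('a * 2 + missing', {}, namespace)
print(result)   # 10
```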
\item The \function{zip()} built-in function and \function{itertools.izip()}
now return an empty list instead of raising a \exception{TypeError}
exception if called with no arguments. This makes them more
now return an empty list if called with no arguments.
Previously they raised a \exception{TypeError}
exception. This makes them more
suitable for use with variable length argument lists:
\begin{verbatim}
@ -580,28 +584,24 @@ Previously this had to be a regular Python dictionary.
\begin{itemize}
\item The inner loops for \class{list} and \class{tuple} slicing
\item The inner loops for list and tuple slicing
were optimized and now run about one-third faster. The inner
loops were also optimized for \class{dict} with performance
loops were also optimized for dictionaries with performance
boosts to \method{keys()}, \method{values()}, \method{items()},
\method{iterkeys()}, \method{itervalues()}, and \method{iteritems()}.
\item The machinery for growing and shrinking lists was optimized
for speed and for space efficiency. Small lists (under eight elements)
never over-allocate by more than three elements. Large lists do not
over-allocate by more than 1/8th. Appending and popping from lists
now runs faster due to more efficient code paths and less frequent
use of the underlying system realloc(). List comprehensions also
benefit. The amount of improvement varies between systems and shows
the greatest improvement on systems with poor realloc() implementations.
\method{list.extend()} was also optimized and no longer converts its
argument into a temporary list prior to extending the base list.
\item The machinery for growing and shrinking lists was optimized for
speed and for space efficiency. Appending and popping from lists now
runs faster due to more efficient code paths and less frequent use of
the underlying system \cfunction{realloc()}. List comprehensions
also benefit. \method{list.extend()} was also optimized and no
longer converts its argument into a temporary list before extending
the base list.
\item \function{list()}, \function{tuple()}, \function{map()},
\function{filter()}, and \function{zip()} now run several times
faster with non-sequence arguments that supply a \method{__len__()}
method. Previously, the pre-sizing optimization only applied to
sequence arguments.
method.
\item The methods \method{list.__getitem__()},
\method{dict.__getitem__()}, and \method{dict.__contains__()} are
@ -685,8 +685,8 @@ True
\end{verbatim}
Several modules now take advantage of \class{collections.deque} for
improved performance: \module{Queue}, \module{mutex}, \module{shlex}
\module{threading}, and \module{pydoc}.
improved performance, such as the \module{Queue} and
\module{threading} modules.
\item The \module{ConfigParser} classes have been enhanced slightly.
The \method{read()} method now returns a list of the files that
@ -705,8 +705,7 @@ improved performance: \module{Queue}, \module{mutex}, \module{shlex}
(Contributed by Yves Dionne.)
\item The \module{itertools} module gained a
\function{groupby(\var{iterable}\optional{, \var{func}})} function,
inspired by the GROUP BY clause from SQL.
\function{groupby(\var{iterable}\optional{, \var{func}})} function.
\var{iterable} returns a succession of elements, and the optional
\var{func} is a function that takes an element and returns a key
value; if omitted, the key is simply the element itself.
@ -732,22 +731,30 @@ return consecutive runs of odd or even numbers.
>>>
\end{verbatim}
Like its SQL counterpart, \function{groupby()} is typically used with
sorted input. The logic for \function{groupby()} is similar to the
\UNIX{} \code{uniq} filter which makes it handy for eliminating,
counting, or identifying duplicate elements:
\function{groupby()} is typically used with sorted input. The logic
for \function{groupby()} is similar to the \UNIX{} \code{uniq} filter
which makes it handy for eliminating, counting, or identifying
duplicate elements:
\begin{verbatim}
>>> word = 'abracadabra'
>>> letters = sorted(word) # Turn string into a sorted list of letters
>>> letters
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
>>> [k for k, g in groupby(letters)] # List unique letters
>>> for k, g in itertools.groupby(letters):
... print k, list(g)
...
a ['a', 'a', 'a', 'a', 'a']
b ['b', 'b']
c ['c']
d ['d']
r ['r', 'r']
>>> # List unique letters
>>> [k for k, g in groupby(letters)]
['a', 'b', 'c', 'd', 'r']
>>> [(k, len(list(g))) for k, g in groupby(letters)] # Count letter occurrences
>>> # Count letter occurrences
>>> [(k, len(list(g))) for k, g in groupby(letters)]
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
>>> [k for k, g in groupby(letters) if len(list(g)) > 1] # List duplicated letters
['a', 'b', 'r']
\end{verbatim}
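
To see why sorted input matters, this sketch runs
\function{groupby()} over the same word with and without sorting.
Only consecutive equal elements are grouped, so the unsorted string
yields one group per letter.

```python
from itertools import groupby

word = 'abracadabra'

# Unsorted: no two adjacent letters are equal, so every letter
# starts a new group and keys repeat.
unsorted_keys = [k for k, g in groupby(word)]

# Sorted: equal letters are adjacent, so each key appears once.
sorted_keys = [k for k, g in groupby(sorted(word))]

print(len(unsorted_keys))   # 11
print(sorted_keys)          # ['a', 'b', 'c', 'd', 'r']
```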
\item \module{itertools} also gained a function named
@ -770,7 +777,7 @@ Note that \function{tee()} has to keep copies of the values returned
by the iterator; in the worst case, it may need to keep all of them.
This should therefore be used carefully if the leading iterator
can run far ahead of the trailing iterator in a long stream of inputs.
If the separation is large, then it becomes preferable to use
If the separation is large, then you might as well use
\function{list()} instead. When the iterators track closely with one
another, \function{tee()} is ideal. Possible applications include
bookmarking, windowing, or lookahead iterators.
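
As one possible windowing application, the sketch below pairs each
element with its successor. The helper name \function{pairwise()} is
an assumption made for this example, and the code targets current
Python versions, where the built-in \function{next()} is available.
The two iterators stay one step apart, so \function{tee()} only has
to buffer a single value.

```python
from itertools import tee

def pairwise(iterable):
    """Return a list of overlapping pairs (s0, s1), (s1, s2), ..."""
    trail, lead = tee(iterable)
    next(lead, None)              # move the leading iterator one step ahead
    return list(zip(trail, lead))

# Works on a one-shot generator, which could not be traversed twice.
pairs = pairwise(n for n in [1, 2, 3, 4])
print(pairs)   # [(1, 2), (2, 3), (3, 4)]
```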