\documentclass{howto}

% $Id$

\title{What's New in Python 2.1}
\release{0.04}
\author{A.M. Kuchling}
\authoraddress{\email{amk1@bigfoot.com}}
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This document is a draft, and is subject to change until
Python 2.1 is released.  Please send any comments, bug reports, or
questions, no matter how minor, to \email{amk1@bigfoot.com}.  }

It's that time again... time for a new Python release, version 2.1.
One recent goal of the Python development team has been to accelerate
the pace of new releases, with a new release coming every 6 to 9
months.  2.1 is the first release to come out at this faster pace, with
the first alpha appearing in January, 3 months after the final version
of 2.0 was released.

This article explains the new features in 2.1.  While there aren't as
many changes in 2.1 as there were in Python 2.0, there are still some
pleasant surprises in store.  2.1 is the first release to be steered
through the use of Python Enhancement Proposals, or PEPs, so most of
the sizable changes have accompanying PEPs that provide more complete
documentation and a design rationale for the change.  This article
doesn't attempt to document the new features completely, but simply
provides an overview of the new features for Python programmers.
Refer to the Python 2.1 documentation, or to the specific PEP, for
more details about any new feature that particularly interests you.

Currently 2.1 is available in an alpha release, but the release
schedule calls for a beta release by late February 2001, and a final
release in April 2001.

% ======================================================================
\section{PEP 232: Function Attributes}

In Python 2.1, functions can now have arbitrary information attached
to them.  People were often using docstrings to hold information about
functions and methods, because the \code{__doc__} attribute was the
only way of attaching any information to a function.
For example, in the Zope Web application server, functions are marked
as safe for public access by having a docstring, and in John Aycock's
SPARK parsing framework, docstrings hold parts of the BNF grammar to be
parsed.  This overloading is unfortunate, since docstrings are really
intended to hold a function's documentation, and it means you can't
properly document functions intended for private use in Zope.

Attributes can now be set and retrieved on functions, using the
regular Python syntax:

\begin{verbatim}
def f(): pass

f.publish = 1
f.secure = 1
f.grammar = "A ::= B (C D)*"
\end{verbatim}

The dictionary containing attributes can be accessed as
\member{__dict__}.  Unlike the \member{__dict__} attribute of class
instances, in functions you can actually assign a new dictionary to
\member{__dict__}, though the new value is restricted to a regular
Python dictionary; you can't be tricky and set it to a
\class{UserDict} instance, a DBM file, or any other random mapping
object.

\begin{seealso}

\seepep{232}{Function Attributes}{Written and implemented by Barry
Warsaw.}

\end{seealso}

% ======================================================================
\section{PEP 207: Rich Comparisons}

In earlier versions, Python's support for implementing comparisons on
user-defined classes and extension types was quite simple.  Classes
could implement a \method{__cmp__} method that was given two instances
of a class, and could only return 0 if they were equal or +1 or -1 if
they weren't; the method couldn't raise an exception or return anything
other than an integer value.  Users of Numeric Python often found this
model too weak and restrictive, because in the number-crunching
programs that Numeric Python is used for, it would be more useful to be
able to perform elementwise comparisons of two matrices, returning a
matrix containing the results of a given comparison for each element.
If the two matrices are of different sizes, then the comparison has to
be able to raise an exception to signal the error.

In Python 2.1, rich comparisons were added in order to support this
need.  Python classes can now individually overload each of the
\code{<}, \code{<=}, \code{>}, \code{>=}, \code{==}, and \code{!=}
operations.  The new magic method names are:

\begin{tableii}{c|l}{code}{Operation}{Method name}
  \lineii{<}{\method{__lt__}}
  \lineii{<=}{\method{__le__}}
  \lineii{>}{\method{__gt__}}
  \lineii{>=}{\method{__ge__}}
  \lineii{==}{\method{__eq__}}
  \lineii{!=}{\method{__ne__}}
\end{tableii}

(The magic methods are named after the corresponding Fortran operators
\code{.LT.}, \code{.LE.}, \&c.  Numeric programmers are almost
certainly quite familiar with these names and will find them easy to
remember.)

Each of these magic methods is of the form \code{\var{method}(self,
other)}, where \code{self} will be the object on the left-hand side of
the operator, while \code{other} will be the object on the right-hand
side.  For example, the expression \code{A < B} will cause
\code{A.__lt__(B)} to be called.

Each of these magic methods can return anything at all: a Boolean, a
matrix, a list, or any other Python object.  Alternatively they can
raise an exception if the comparison is impossible, inconsistent, or
otherwise meaningless.

The built-in \function{cmp(A,B)} function can use the rich comparison
machinery, and now accepts an optional argument specifying which
comparison operation to use; this is given as one of the strings
\code{"<"}, \code{"<="}, \code{">"}, \code{">="}, \code{"=="}, or
\code{"!="}.  If called without the optional third argument,
\function{cmp()} will only return -1, 0, or +1 as in previous versions
of Python; otherwise it will call the appropriate method and can return
any Python object.

There are also corresponding changes of interest to C programmers;
there's a new slot \code{tp_richcmp} in type objects and an API for
performing a given rich comparison.
I won't cover the C API here, but will refer you to PEP 207, or the
documentation for Python's C API, for the full list of related
functions.

\begin{seealso}

\seepep{207}{Rich Comparisons}{Written by Guido van Rossum, heavily
based on earlier work by David Ascher, and implemented by Guido van
Rossum.}

\end{seealso}

% ======================================================================
\section{PEP 230: Warning Framework}

Over its 10 years of existence, Python has accumulated a certain
number of obsolete modules and features.  It's difficult to know when a
feature is safe to remove, since there's no way of knowing how much
code uses it; perhaps no programs depend on the feature, or perhaps
many do.  To enable removing old features in a more structured way, a
warning framework was added.  When the Python developers want to get
rid of a feature, it will first trigger a warning in the next version
of Python.  The following Python version can then drop the feature,
and users will have had a full release cycle to remove uses of the old
feature.

Python 2.1 adds the warning framework to be used in this scheme.  It
adds a \module{warnings} module that provides functions to issue
warnings, and to filter out warnings that you don't want to be
displayed.  Third-party modules can also use this framework to
deprecate old features that they no longer wish to support.

For example, in Python 2.1 the \module{regex} module is deprecated, so
importing it causes a warning to be printed:

\begin{verbatim}
>>> import regex
__main__:1: DeprecationWarning: the regex module is deprecated; please use the re module
>>>
\end{verbatim}

Warnings can be issued by calling the \function{warnings.warn}
function:

\begin{verbatim}
import warnings
warnings.warn("feature X no longer supported")
\end{verbatim}

The first parameter is the warning message; an additional optional
parameter can be used to specify a particular warning category.
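
As a minimal sketch of the optional category parameter just described,
the following passes a category as the second argument to
\function{warnings.warn()}.  The \class{FeatureXWarning} class and the
\function{feature_x()} function are hypothetical names invented for
this illustration; they are not part of the \module{warnings} module.

```python
import warnings

class FeatureXWarning(DeprecationWarning):
    """Hypothetical warning category, invented for this example."""
    pass

def feature_x():
    # The second argument selects the warning category; when it is
    # omitted, warnings.warn() uses UserWarning.
    warnings.warn("feature X no longer supported", FeatureXWarning)
    return "result of feature X"
```

Deriving the category from \class{DeprecationWarning} means that a
filter written for \class{DeprecationWarning} should also match this
more specific category.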

Filters can be added to disable certain warnings; a regular expression
pattern can be applied to the message or to the module name in order to
suppress a warning.  For example, you may have a program that uses the
\module{regex} module and not want to spare the time to convert it to
use the \module{re} module right now.  The warning can be suppressed by
calling

\begin{verbatim}
import warnings
warnings.filterwarnings(action = 'ignore',
                        message='.*regex module is deprecated',
                        category=DeprecationWarning,
                        module = '__main__')
\end{verbatim}

This adds a filter that applies only to warnings of the class
\class{DeprecationWarning} triggered in the \module{__main__} module,
uses a regular expression to match only the message about the
\module{regex} module being deprecated, and causes such warnings to be
ignored.  Warnings can also be printed only once, printed every time
the offending code is executed, or turned into exceptions that will
cause the program to stop (unless the exceptions are caught in the
usual way, of course).

Functions were also added to Python's C API for issuing warnings;
refer to PEP 230 or to Python's API documentation for the details.

\begin{seealso}

\seepep{5}{Guidelines for Language Evolution}{Written by Paul Prescod,
to specify procedures to be followed when removing old features from
Python.  The policy described in this PEP hasn't been officially
adopted, but the eventual policy probably won't be too different from
Prescod's proposal.}

\seepep{230}{Warning Framework}{Written and implemented by Guido van
Rossum.}

\end{seealso}

% ======================================================================
\section{PEP 229: New Build System}

When compiling Python, the user had to go in and edit the
\file{Modules/Setup} file in order to enable various additional
modules; the default set is relatively small and limited to modules
that compile on most Unix platforms.
This means that on Unix platforms with many more features, most
notably Linux, Python installations often don't contain all the useful
modules they could.

Python 2.0 added the Distutils, a set of modules for distributing and
installing extensions.  In Python 2.1, the Distutils are used to
compile much of the standard library of extension modules,
autodetecting which ones are supported on the current machine.  It's
hoped that this will make Python installations easier and more
featureful.

Instead of having to edit the \file{Modules/Setup} file in order to
enable modules, a \file{setup.py} script in the top directory of the
Python source distribution is run at build time, and attempts to
discover which modules can be enabled by examining the modules and
header files on the system.  In 2.1alpha1, there's very little you can
do to change \file{setup.py}'s behaviour, or to discover why a given
module isn't compiled.  If you run into problems in 2.1alpha1, please
report them, and be prepared to dive into \file{setup.py} in order to
fix autodetection of a given library on your system.  In the alpha2
release I plan to add ways to have more control over what the script
does (probably command-line arguments to \file{configure} or to
\file{setup.py}).

If it turns out to be impossible to make autodetection work reliably,
it's possible that this change may become an optional build method
instead of the default, or it may even be backed out completely.

In another far-reaching change to the build mechanism, Neil
Schemenauer restructured things so Python now uses a single makefile
that isn't recursive, instead of makefiles in the top directory and in
each of the Python/, Parser/, Objects/, and Modules/ subdirectories.
This makes building Python faster, and also makes the build process
clearer and simpler.

\begin{seealso}

\seepep{229}{Using Distutils to Build Python}{Written and implemented
by A.M. Kuchling.}

\end{seealso}

% ======================================================================
\section{PEP 217: Interactive Display Hook}

When using the Python interpreter interactively, the output of
commands is displayed using the built-in \function{repr()} function.
In Python 2.1, the variable \module{sys.displayhook} can be set to a
callable object which will be called instead of \function{repr()}.
For example, you can set it to a special pretty-printing function:

\begin{verbatim}
>>> # Create a recursive data structure
... L = [1,2,3]
>>> L.append(L)
>>> L # Show Python's default output
[1, 2, 3, [...]]
>>> # Use pprint.pprint() as the display function
... import sys, pprint
>>> sys.displayhook = pprint.pprint
>>> L
[1, 2, 3,  <Recursion on list with id=...>]
>>>
\end{verbatim}

\begin{seealso}

\seepep{217}{Display Hook for Interactive Use}{Written and implemented
by Moshe Zadka.}

\end{seealso}

% ======================================================================
\section{PEP 208: New Coercion Model}

How numeric coercion is done at the C level was significantly
modified.  This will only affect the authors of C extensions to Python,
allowing them more flexibility in writing extension types that support
numeric operations.

Extension types can now set the type flag
\code{Py_TPFLAGS_CHECKTYPES} in their \code{PyTypeObject} structure to
indicate that they support the new coercion model.  In such extension
types, the numeric slot functions can no longer assume that they'll be
passed two arguments of the same type; instead they may be passed two
arguments of differing types, and can then perform their own internal
coercion.  If the slot function is passed a type it can't handle, it
can indicate the failure by returning a reference to the
\code{Py_NotImplemented} singleton value.  The numeric functions of the
other type will then be tried, and perhaps they can handle the
operation; if the other type also returns \code{Py_NotImplemented},
then a \exception{TypeError} will be raised.
Numeric methods written in Python can also return
\code{Py_NotImplemented}, causing the interpreter to act as if the
method did not exist (perhaps raising a \exception{TypeError}, perhaps
trying another object's numeric methods).

\begin{seealso}

\seepep{208}{Reworking the Coercion Model}{Written and implemented by
Neil Schemenauer, heavily based upon earlier work by Marc-Andr\'e
Lemburg.  Read this to understand the fine points of how numeric
operations will now be processed at the C level.}

\end{seealso}

% ======================================================================
\section{Minor Changes and Fixes}

There were relatively few smaller changes made in Python 2.1 due to
the shorter release cycle.  A search through the CVS change logs turns
up 57 patches applied, and 86 bugs fixed; both figures are likely to be
underestimates.  Some of the more notable changes are:

\begin{itemize}

\item The speed of line-oriented file I/O has been improved because
people often complain about its lack of speed, and because it's often
been used as a na\"ive benchmark.  The \method{readline()} method of
file objects has therefore been rewritten to be much faster.  The exact
amount of the speedup will vary from platform to platform depending on
how slow the C library's \function{getc()} was, but is around 66\%, and
potentially much faster on some particular operating systems.  Tim
Peters did much of the benchmarking and coding for this change,
motivated by a discussion in comp.lang.python.

A new module and a new method for file objects were also added,
contributed by Jeff Epler.  The new method, \method{xreadlines()}, is
similar to the existing \function{xrange()} built-in.
\function{xreadlines()} returns an opaque sequence object that only
supports being iterated over, reading a line on every iteration but not
reading the entire file into memory as the existing
\method{readlines()} method does.  You'd use it like this:

\begin{verbatim}
import sys

for line in sys.stdin.xreadlines():
    # ... do something for each line ...
    ...
\end{verbatim}

For a fuller discussion of the line I/O changes, see the python-dev
summary for January 1-15, 2001.

\item A new method, \method{popitem()}, was added to dictionaries to
enable destructively iterating through the contents of a dictionary;
this can be faster for large dictionaries because there's no need to
construct a list containing all the keys or values.
\code{D.popitem()} removes a random \code{(\var{key}, \var{value})}
pair from the dictionary and returns it as a 2-tuple.  This was
implemented mostly by Tim Peters and Guido van Rossum, after a
suggestion and preliminary patch by Moshe Zadka.

% Not checked into CVS yet -- only proposed
%\item The \operator{in} operator now works for dictionaries
%XXX 'if key in dict' now works.  (Thomas Wouters)

\item \module{curses.panel}, a wrapper for the panel library, part of
ncurses and of SYSV curses, was contributed by Thomas Gellekum.  The
panel library provides windows with the additional feature of depth.
Windows can be moved higher or lower in the depth ordering, and the
panel library figures out where panels overlap and which sections are
visible.

\item Modules can now control which names are imported when \code{from
\var{module} import *} is used, by defining an \code{__all__} attribute
containing a list of names that will be imported.  One common complaint
is that if the module imports other modules such as \module{sys} or
\module{string}, \code{from \var{module} import *} will add them to the
importing module's namespace.  To fix this, simply list the public
names in \code{__all__}:

\begin{verbatim}
# List public names
__all__ = ['Database', 'open']
\end{verbatim}

A stricter version of this patch was first suggested and implemented
by Ben Wolfson, but after some python-dev discussion, a weaker final
version was checked in.

\item The PyXML package has gone through a few releases since Python
2.0, and Python 2.1 includes an updated version of the \module{xml}
package.
Some of the noteworthy changes include support for Expat 1.2, the
ability for Expat parsers to handle files in any encoding supported by
Python, and various bugfixes for SAX, DOM, and the \module{minidom}
module.

\item Various functions in the \module{time} module, such as
\function{asctime()} and \function{localtime()}, require a floating
point argument containing the time in seconds since the epoch.  The
most common use of these functions is to work with the current time, so
the floating point argument has been made optional; when a value isn't
provided, the current time will be used.  For example, log file entries
usually need a string containing the current time; in Python 2.1,
\code{time.asctime()} can be used, instead of the lengthier
\code{time.asctime(time.localtime(time.time()))} that was previously
required.

This change was proposed and implemented by Thomas Wouters.

\item The \function{repr()} of strings now uses hex escapes for
unprintable characters, and uses the \code{\e n}, \code{\e t}, and
\code{\e r} escapes for those characters.  (Contributed by Ka-Ping
Yee.)

\item The \module{ftplib} module now defaults to retrieving files in
passive mode, because passive mode is more likely to work from behind a
firewall.  This request came from the Debian bug tracking system, since
other Debian packages use \module{ftplib} to retrieve files and then
don't work from behind a firewall.  It's deemed unlikely that this will
cause problems for anyone, because Netscape defaults to passive mode
and few people complain, but if passive mode is unsuitable for your
application or network setup, call \method{set_pasv(0)} on FTP objects
to disable passive mode.

\item The size of the Unicode character database was reduced by
another 340K thanks to Fredrik Lundh.

\end{itemize}

And there's the usual list of bugfixes, minor memory leaks, docstring
edits, and other tweaks, too lengthy to be worth itemizing; see the CVS
logs for the full details if you want them.
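
As a brief illustration of the \method{popitem()} item in the list
above, a dictionary can be emptied one pair at a time with a simple
loop.  This is only a sketch: the \function{drain()} helper and the
dictionary contents are invented for this example.

```python
def drain(d):
    """Hypothetical helper: empty the dictionary d, returning its items."""
    items = []
    while d:
        # popitem() removes one (key, value) pair and returns it
        # as a 2-tuple; the loop ends when the dictionary is empty.
        key, value = d.popitem()
        items.append((key, value))
    return items

jobs = {'parse': 1, 'compile': 2, 'link': 3}
done = drain(jobs)
```

After the loop \code{jobs} is empty, and \code{done} holds the three
pairs in whatever order \method{popitem()} happened to return them.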

% ======================================================================
\section{Nested Scopes}

% XXX
The PEP for this new feature hasn't been completed yet, and the
requisite changes haven't been checked into CVS yet.

\begin{seealso}

\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}

\end{seealso}

% ======================================================================
\section{Weak References}

% XXX
The PEP for this new feature hasn't been completed yet, and the
requisite changes haven't been checked into CVS yet.

\begin{seealso}

\seepep{205}{Weak References}{Written and implemented by Fred
L. Drake, Jr.}

\end{seealso}

% ======================================================================
\section{Acknowledgements}

The author would like to thank the following people for offering
suggestions on various drafts of this article: Graeme Cross, David
Goodger, Jay Graves, Michael Hudson, Marc-Andr\'e Lemburg, Fredrik
Lundh, Neil Schemenauer, Thomas Wouters.

\end{document}