Delete the LaTeX doc tree.

Georg Brandl 2007-08-15 14:26:55 +00:00
parent af62d9abfb
commit f56181ff53
513 changed files with 0 additions and 154687 deletions

Doc/ACKS

@@ -1,202 +0,0 @@
Contributors to the Python Documentation
----------------------------------------
This file lists people who have contributed in some way to the Python
documentation. It is probably not complete -- if you feel that you or
anyone else should be on this list, please let us know (send email to
docs@python.org), and we'll be glad to correct the problem.
It is only with the input and contributions of the Python community
that Python has such wonderful documentation -- Thank You!
In the official sources, this file is encoded in ISO-8859-1 (Latin-1).
-Fred
Aahz
Michael Abbott
Steve Alexander
Jim Ahlstrom
Fred Allen
A. Amoroso
Pehr Anderson
Oliver Andrich
Jesús Cea Avión
Daniel Barclay
Chris Barker
Don Bashford
Anthony Baxter
Bennett Benson
Jonathan Black
Robin Boerdijk
Michal Bozon
Aaron Brancotti
Keith Briggs
Lee Busby
Lorenzo M. Catucci
Mauro Cicognini
Gilles Civario
Mike Clarkson
Steve Clift
Dave Cole
Matthew Cowles
Jeremy Craven
Andrew Dalke
Ben Darnell
L. Peter Deutsch
Robert Donohue
Fred L. Drake, Jr.
Jeff Epler
Michael Ernst
Blame Andy Eskilsson
Carey Evans
Martijn Faassen
Carl Feynman
Hernán Martínez Foffani
Stefan Franke
Jim Fulton
Peter Funk
Lele Gaifax
Matthew Gallagher
Ben Gertzfield
Nadim Ghaznavi
Jonathan Giddy
Shelley Gooch
Nathaniel Gray
Grant Griffin
Thomas Guettler
Anders Hammarquist
Mark Hammond
Harald Hanche-Olsen
Manus Hand
Gerhard Häring
Travis B. Hartwell
Janko Hauser
Bernhard Herzog
Magnus L. Hetland
Konrad Hinsen
Stefan Hoffmeister
Albert Hofkamp
Gregor Hoffleit
Steve Holden
Thomas Holenstein
Gerrit Holl
Rob Hooft
Brian Hooper
Randall Hopper
Michael Hudson
Eric Huss
Jeremy Hylton
Roger Irwin
Jack Jansen
Philip H. Jensen
Pedro Diaz Jimenez
Kent Johnson
Lucas de Jonge
Andreas Jung
Robert Kern
Jim Kerr
Jan Kim
Greg Kochanski
Guido Kollerie
Peter A. Koren
Daniel Kozan
Andrew M. Kuchling
Dave Kuhlman
Erno Kuusela
Detlef Lannert
Piers Lauder
Glyph Lefkowitz
Marc-André Lemburg
Ulf A. Lindgren
Everett Lipman
Mirko Liss
Martin von Löwis
Fredrik Lundh
Jeff MacDonald
John Machin
Andrew MacIntyre
Vladimir Marangozov
Vincent Marchetti
Laura Matson
Daniel May
Doug Mennella
Paolo Milani
Skip Montanaro
Paul Moore
Ross Moore
Sjoerd Mullender
Dale Nagata
Ng Pheng Siong
Koray Oner
Tomas Oppelstrup
Denis S. Otkidach
Zooko O'Whielacronx
William Park
Joonas Paalasmaa
Harri Pasanen
Bo Peng
Tim Peters
Christopher Petrilli
Justin D. Pettit
Chris Phoenix
François Pinard
Paul Prescod
Eric S. Raymond
Edward K. Ream
Sean Reifschneider
Bernhard Reiter
Armin Rigo
Wes Rishel
Jim Roskind
Guido van Rossum
Donald Wallace Rouse II
Nick Russo
Chris Ryland
Constantina S.
Hugh Sasse
Bob Savage
Scott Schram
Neil Schemenauer
Barry Scott
Joakim Sernbrant
Justin Sheehy
Michael Simcich
Ionel Simionescu
Gregory P. Smith
Roy Smith
Clay Spence
Nicholas Spies
Tage Stabell-Kulo
Frank Stajano
Anthony Starks
Greg Stein
Peter Stoehr
Mark Summerfield
Reuben Sumner
Kalle Svensson
Jim Tittsler
Ville Vainio
Martijn Vries
Charles G. Waldman
Greg Ward
Barry Warsaw
Corran Webster
Glyn Webster
Bob Weiner
Eddy Welbourne
Mats Wichmann
Gerry Wiener
Timothy Wild
Collin Winter
Blake Winton
Dan Wolfe
Steven Work
Thomas Wouters
Ka-Ping Yee
Rory Yorke
Moshe Zadka
Milan Zamazal
Cheng Zhang


@@ -1,736 +0,0 @@
# Makefile for Python documentation
# ---------------------------------
#
# See also the README file.
#
# This is a bit of a mess. The documents are identified by short names:
# api -- Python/C API Reference Manual
# doc -- Documenting Python
# ext -- Extending and Embedding the Python Interpreter
# lib -- Library Reference Manual
# mac -- Macintosh Library Modules
# ref -- Python Reference Manual
# tut -- Python Tutorial
# inst -- Installing Python Modules
# dist -- Distributing Python Modules
#
# The LaTeX sources for each of these documents are in subdirectories
# with the short names above as the directory names.
#
# The main target creates HTML for each of the documents. You can
# also do "make lib" (etc.) to create the HTML versions of individual
# documents.
#
# The document classes and styles are in the texinputs/ directory.
# These define a number of macros that are similar in name and intent
# to macros in Texinfo (e.g. \code{...} and \emph{...}), as well as a
# number of environments for formatting function and data definitions.
# Documentation for the macros is included in "Documenting Python"; see
# http://www.python.org/doc/current/doc/doc.html, or the sources for
# this document in the doc/ directory.
#
# Everything is processed by LaTeX. See the file `README' for more
# information on the tools needed for processing.
#
# There's a problem with generating the index which has been solved by
# a sed command applied to the index file. The shell script fix_hack
# does this (the Makefile takes care of calling it).
#
# Additional targets attempt to convert selected LaTeX sources to
# various other formats. These are generally site specific because
# the tools used are far from universal. These targets are:
#
# ps -- convert all documents from LaTeX to PostScript
# pdf -- convert all documents from LaTeX to the
# Portable Document Format
#
# See the README file for more information on these targets.
#
# The formatted output is located in subdirectories. For PDF and
# PostScript, look in the paper-$(PAPER)/ directory. For HTML, look in
# the html/ directory. If you want to fix the GNU info process, look
# in the info/ directory; please send patches to docs@python.org.
# This Makefile only includes information on how to perform builds; for
# dependency information, see Makefile.deps.
# Customization -- you *may* have to edit this
# You could set this to a4:
PAPER=letter
# Ideally, you shouldn't need to edit beyond this point
INFODIR= info
TOOLSDIR= tools
# This is the *documentation* release, and is used to construct the
# file names of the downloadable tarballs. It is initialized by the
# getversioninfo script to ensure that the right version number is
# used; the script will also write commontex/patchlevel.tex if that
# doesn't exist or needs to be changed. Documents which depend on the
# version number should use \input{patchlevel} and include
# commontex/patchlevel.tex in their dependencies.
RELEASE=$(shell $(PYTHON) tools/getversioninfo)
PYTHON= python
DVIPS= dvips -N0 -t $(PAPER)
# This is ugly! The issue here is that there are two different levels
# in the directory tree at which we execute mkhowto, so we can't
# define it just once using a relative path (at least not with the
# current implementation and Makefile structure). We use the GNUish
# $(shell) function here to work around that restriction by
# identifying mkhowto and the commontex/ directory using absolute paths.
#
# If your doc build fails immediately, you may need to switch to GNU make.
# (e.g. OpenBSD needs package gmake installed; use gmake instead of make)
PWD=$(shell pwd)
# (The trailing colon in the value is needed; TeX places its default
# set of paths at the location of the empty string in the path list.)
TEXINPUTS=$(PWD)/commontex:
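# Illustration (an editorial sketch in Python, not code from this Makefile):
# the trailing colon can be modeled as an empty path element that the TeX
# path searcher replaces with its built-in defaults. The function name and
# the DEFAULT_PATHS value are hypothetical, and real searchers such as
# Kpathsea apply more rules than this simplified model covers.

```python
# Hypothetical stand-in for the TeX installation's default search path.
DEFAULT_PATHS = ("/usr/share/texmf/tex//",)


def expand_texinputs(value, defaults=DEFAULT_PATHS):
    """Return the effective search list for a colon-separated TEXINPUTS.

    Simplified model: the first empty element (leading, trailing, or
    doubled colon) is replaced by the default paths; any further empty
    elements are ignored.
    """
    result = []
    defaults_used = False
    for element in value.split(":"):
        if element:
            # A named directory is searched in the position it was given.
            result.append(element)
        elif not defaults_used:
            # The empty element marks where the defaults are spliced in.
            result.extend(defaults)
            defaults_used = True
    return result
```

# Under this model, expand_texinputs("/src/Doc/commontex:") keeps the
# defaults after the commontex directory, while the same value without the
# trailing colon would search commontex only.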
# The mkhowto script can be run from the checkout using the first
# version of this variable definition, or from a preferred version
# using the second version. The standard documentation is typically
# built using the second flavor, where the preferred version is from
# the Python CVS trunk.
MKHOWTO= TEXINPUTS=$(TEXINPUTS) $(PYTHON) $(PWD)/tools/mkhowto
MKDVI= $(MKHOWTO) --paper=$(PAPER) --dvi
MKHTML= $(MKHOWTO) --html --about html/stdabout.dat \
--iconserver ../icons --favicon ../icons/pyfav.png \
--address $(PYTHONDOCS) --up-link ../index.html \
--up-title "Python Documentation Index" \
--global-module-index "../modindex.html" --dvips-safe
MKISILOHTML=$(MKHOWTO) --html --about html/stdabout.dat \
--iconserver ../icons \
--l2h-init perl/isilo.perl --numeric --split 1 \
--dvips-safe
MKISILO= iSilo386 -U -y -rCR -d0
MKPDF= $(MKHOWTO) --paper=$(PAPER) --pdf
MKPS= $(MKHOWTO) --paper=$(PAPER) --ps
BUILDINDEX=$(TOOLSDIR)/buildindex.py
PYTHONDOCS="See <i><a href=\"about.html\">About this document...</a></i> for information on suggesting changes."
HTMLBASE= file:`pwd`
# The emacs binary used to build the info docs. GNU Emacs 21 is required.
EMACS= emacs
# The end of this should reflect the major/minor version numbers of
# the release:
WHATSNEW=whatsnew26
# what's what
MANDVIFILES= paper-$(PAPER)/api.dvi paper-$(PAPER)/ext.dvi \
paper-$(PAPER)/lib.dvi paper-$(PAPER)/mac.dvi \
paper-$(PAPER)/ref.dvi paper-$(PAPER)/tut.dvi
HOWTODVIFILES= paper-$(PAPER)/doc.dvi paper-$(PAPER)/inst.dvi \
paper-$(PAPER)/dist.dvi paper-$(PAPER)/$(WHATSNEW).dvi
MANPDFFILES= paper-$(PAPER)/api.pdf paper-$(PAPER)/ext.pdf \
paper-$(PAPER)/lib.pdf paper-$(PAPER)/mac.pdf \
paper-$(PAPER)/ref.pdf paper-$(PAPER)/tut.pdf
HOWTOPDFFILES= paper-$(PAPER)/doc.pdf paper-$(PAPER)/inst.pdf \
paper-$(PAPER)/dist.pdf paper-$(PAPER)/$(WHATSNEW).pdf
MANPSFILES= paper-$(PAPER)/api.ps paper-$(PAPER)/ext.ps \
paper-$(PAPER)/lib.ps paper-$(PAPER)/mac.ps \
paper-$(PAPER)/ref.ps paper-$(PAPER)/tut.ps
HOWTOPSFILES= paper-$(PAPER)/doc.ps paper-$(PAPER)/inst.ps \
paper-$(PAPER)/dist.ps paper-$(PAPER)/$(WHATSNEW).ps
DVIFILES= $(MANDVIFILES) $(HOWTODVIFILES)
PDFFILES= $(MANPDFFILES) $(HOWTOPDFFILES)
PSFILES= $(MANPSFILES) $(HOWTOPSFILES)
HTMLCSSFILES=html/api/api.css \
html/doc/doc.css \
html/ext/ext.css \
html/lib/lib.css \
html/mac/mac.css \
html/ref/ref.css \
html/tut/tut.css \
html/inst/inst.css \
html/dist/dist.css
ISILOCSSFILES=isilo/api/api.css \
isilo/doc/doc.css \
isilo/ext/ext.css \
isilo/lib/lib.css \
isilo/mac/mac.css \
isilo/ref/ref.css \
isilo/tut/tut.css \
isilo/inst/inst.css \
isilo/dist/dist.css
ALLCSSFILES=$(HTMLCSSFILES) $(ISILOCSSFILES)
INDEXFILES=html/api/api.html \
html/doc/doc.html \
html/ext/ext.html \
html/lib/lib.html \
html/mac/mac.html \
html/ref/ref.html \
html/tut/tut.html \
html/inst/inst.html \
html/dist/dist.html \
html/whatsnew/$(WHATSNEW).html
ALLHTMLFILES=$(INDEXFILES) html/index.html html/modindex.html html/acks.html
COMMONPERL= perl/manual.perl perl/python.perl perl/l2hinit.perl
ANNOAPI=api/refcounts.dat tools/anno-api.py
include Makefile.deps
# These must be declared phony since there
# are directories with matching names:
.PHONY: api doc ext lib mac ref tut inst dist
.PHONY: html info isilo
# Main target
default: html
all: html dvi ps pdf isilo
dvi: $(DVIFILES)
pdf: $(PDFFILES)
ps: $(PSFILES)
world: ps pdf html distfiles
# Rules to build PostScript and PDF formats
.SUFFIXES: .dvi .ps
.dvi.ps:
$(DVIPS) -o $@ $<
# Targets for each document:
# Python/C API Reference Manual
paper-$(PAPER)/api.dvi: $(ANNOAPIFILES)
cd paper-$(PAPER) && $(MKDVI) api.tex
paper-$(PAPER)/api.pdf: $(ANNOAPIFILES)
cd paper-$(PAPER) && $(MKPDF) api.tex
paper-$(PAPER)/api.tex: api/api.tex
cp api/api.tex $@
paper-$(PAPER)/abstract.tex: api/abstract.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/abstract.tex
paper-$(PAPER)/concrete.tex: api/concrete.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/concrete.tex
paper-$(PAPER)/exceptions.tex: api/exceptions.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/exceptions.tex
paper-$(PAPER)/init.tex: api/init.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/init.tex
paper-$(PAPER)/intro.tex: api/intro.tex
cp api/intro.tex $@
paper-$(PAPER)/memory.tex: api/memory.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/memory.tex
paper-$(PAPER)/newtypes.tex: api/newtypes.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/newtypes.tex
paper-$(PAPER)/refcounting.tex: api/refcounting.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/refcounting.tex
paper-$(PAPER)/utilities.tex: api/utilities.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/utilities.tex
paper-$(PAPER)/veryhigh.tex: api/veryhigh.tex $(ANNOAPI)
$(PYTHON) $(TOOLSDIR)/anno-api.py -o $@ api/veryhigh.tex
# Distributing Python Modules
paper-$(PAPER)/dist.dvi: $(DISTFILES)
cd paper-$(PAPER) && $(MKDVI) ../dist/dist.tex
paper-$(PAPER)/dist.pdf: $(DISTFILES)
cd paper-$(PAPER) && $(MKPDF) ../dist/dist.tex
# Documenting Python
paper-$(PAPER)/doc.dvi: $(DOCFILES)
cd paper-$(PAPER) && $(MKDVI) ../doc/doc.tex
paper-$(PAPER)/doc.pdf: $(DOCFILES)
cd paper-$(PAPER) && $(MKPDF) ../doc/doc.tex
# Extending and Embedding the Python Interpreter
paper-$(PAPER)/ext.dvi: $(EXTFILES)
cd paper-$(PAPER) && $(MKDVI) ../ext/ext.tex
paper-$(PAPER)/ext.pdf: $(EXTFILES)
cd paper-$(PAPER) && $(MKPDF) ../ext/ext.tex
# Installing Python Modules
paper-$(PAPER)/inst.dvi: $(INSTFILES)
cd paper-$(PAPER) && $(MKDVI) ../inst/inst.tex
paper-$(PAPER)/inst.pdf: $(INSTFILES)
cd paper-$(PAPER) && $(MKPDF) ../inst/inst.tex
# Python Library Reference
paper-$(PAPER)/lib.dvi: $(LIBFILES)
cd paper-$(PAPER) && $(MKDVI) ../lib/lib.tex
paper-$(PAPER)/lib.pdf: $(LIBFILES)
cd paper-$(PAPER) && $(MKPDF) ../lib/lib.tex
# Macintosh Library Modules
paper-$(PAPER)/mac.dvi: $(MACFILES)
cd paper-$(PAPER) && $(MKDVI) ../mac/mac.tex
paper-$(PAPER)/mac.pdf: $(MACFILES)
cd paper-$(PAPER) && $(MKPDF) ../mac/mac.tex
# Python Reference Manual
paper-$(PAPER)/ref.dvi: $(REFFILES)
cd paper-$(PAPER) && $(MKDVI) ../ref/ref.tex
paper-$(PAPER)/ref.pdf: $(REFFILES)
cd paper-$(PAPER) && $(MKPDF) ../ref/ref.tex
# Python Tutorial
paper-$(PAPER)/tut.dvi: $(TUTFILES)
cd paper-$(PAPER) && $(MKDVI) ../tut/tut.tex
paper-$(PAPER)/tut.pdf: $(TUTFILES)
cd paper-$(PAPER) && $(MKPDF) ../tut/tut.tex
# What's New in Python X.Y
paper-$(PAPER)/$(WHATSNEW).dvi: whatsnew/$(WHATSNEW).tex
cd paper-$(PAPER) && $(MKDVI) ../whatsnew/$(WHATSNEW).tex
paper-$(PAPER)/$(WHATSNEW).pdf: whatsnew/$(WHATSNEW).tex
cd paper-$(PAPER) && $(MKPDF) ../whatsnew/$(WHATSNEW).tex
# The remaining part of the Makefile is concerned with various
# conversions, as described above. See also the README file.
info:
cd $(INFODIR) && $(MAKE) EMACS=$(EMACS) WHATSNEW=$(WHATSNEW)
# Targets to convert the manuals to HTML using Nikos Drakos' LaTeX to
# HTML converter. For more info on this program, see
# <URL:http://cbl.leeds.ac.uk/nikos/tex2html/doc/latex2html/latex2html.html>.
# Note that LaTeX2HTML inserts references to an icons directory in
# each page that it generates. I have placed a copy of this directory
# in the distribution to simplify the process of creating a
# self-contained HTML distribution; for this purpose I have also added
# a (trivial) index.html. Change the definition of $ICONSERVER in
# perl/l2hinit.perl to use a different location for the icons directory.
# If you have the standard LaTeX2HTML icons installed, the versions shipped
# with this documentation should be stored in a separate directory and used
# instead. The standard set does *not* include all the icons used in the
# Python documentation.
$(ALLCSSFILES): html/style.css
cp $< $@
$(INDEXFILES): $(COMMONPERL) html/stdabout.dat tools/node2label.pl
html/acks.html: ACKS $(TOOLSDIR)/support.py $(TOOLSDIR)/mkackshtml
$(PYTHON) $(TOOLSDIR)/mkackshtml --address $(PYTHONDOCS) \
--favicon icons/pyfav.png \
--output html/acks.html <ACKS
# html/index.html is dependent on $(INDEXFILES) since we want the date
# on the front index to be updated whenever any of the child documents
# are updated and boilerplate.tex uses \today as the date. The index
# files are not used to actually generate content.
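# Illustration (an editorial sketch in Python, not code from this Makefile):
# the reasoning above relies on make's timestamp comparison, where a target
# is rebuilt when it is missing or older than any prerequisite. The function
# name is hypothetical, and the sketch assumes every prerequisite already
# exists, whereas make would first try to build a missing one.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than a prerequisite."""
    if not os.path.exists(target):
        # A missing target is always out of date.
        return True
    target_mtime = os.path.getmtime(target)
    # Any prerequisite modified after the target makes it stale.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

# Listing the child index files as prerequisites of html/index.html makes
# this check fire whenever a child document is regenerated, even though
# their contents are never read by the recipe.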
BOILERPLATE=commontex/boilerplate.tex
html/index.html: $(INDEXFILES)
html/index.html: html/index.html.in $(BOILERPLATE) tools/rewrite.py
$(PYTHON) tools/rewrite.py $(BOILERPLATE) \
RELEASE=$(RELEASE) WHATSNEW=$(WHATSNEW) \
<$< >$@
html/modindex.html: $(TOOLSDIR)/support.py $(TOOLSDIR)/mkmodindex
html/modindex.html: html/dist/dist.html
html/modindex.html: html/lib/lib.html html/mac/mac.html
cd html && \
$(PYTHON) ../$(TOOLSDIR)/mkmodindex --columns 3 \
--output modindex.html --address $(PYTHONDOCS) \
--favicon icons/pyfav.png \
dist/modindex.html \
lib/modindex.html mac/modindex.html
html: $(ALLHTMLFILES) $(HTMLCSSFILES)
api: html/api/api.html html/api/api.css
html/api/api.html: $(APIFILES) api/refcounts.dat
$(MKHTML) --dir html/api api/api.tex
doc: html/doc/doc.html html/doc/doc.css
html/doc/doc.html: $(DOCFILES)
$(MKHTML) --dir html/doc doc/doc.tex
ext: html/ext/ext.html html/ext/ext.css
html/ext/ext.html: $(EXTFILES)
$(MKHTML) --dir html/ext ext/ext.tex
lib: html/lib/lib.html html/lib/lib.css
html/lib/lib.html: $(LIBFILES)
$(MKHTML) --dir html/lib lib/lib.tex
mac: html/mac/mac.html html/mac/mac.css
html/mac/mac.html: $(MACFILES)
$(MKHTML) --dir html/mac mac/mac.tex
ref: html/ref/ref.html html/ref/ref.css
html/ref/ref.html: $(REFFILES)
$(MKHTML) --dir html/ref ref/ref.tex
tut: html/tut/tut.html html/tut/tut.css
html/tut/tut.html: $(TUTFILES)
$(MKHTML) --dir html/tut --numeric --split 3 tut/tut.tex
inst: html/inst/inst.html html/inst/inst.css
html/inst/inst.html: $(INSTFILES) perl/distutils.perl
$(MKHTML) --dir html/inst --split 4 inst/inst.tex
dist: html/dist/dist.html html/dist/dist.css
html/dist/dist.html: $(DISTFILES) perl/distutils.perl
$(MKHTML) --dir html/dist --split 4 dist/dist.tex
whatsnew: html/whatsnew/$(WHATSNEW).html
html/whatsnew/$(WHATSNEW).html: whatsnew/$(WHATSNEW).tex
$(MKHTML) --dir html/whatsnew --split 4 whatsnew/$(WHATSNEW).tex
# The iSilo format is used by the iSilo document reader for PalmOS devices.
ISILOINDEXFILES=isilo/api/api.html \
isilo/doc/doc.html \
isilo/ext/ext.html \
isilo/lib/lib.html \
isilo/mac/mac.html \
isilo/ref/ref.html \
isilo/tut/tut.html \
isilo/inst/inst.html \
isilo/dist/dist.html \
isilo/whatsnew/$(WHATSNEW).html
$(ISILOINDEXFILES): $(COMMONPERL) html/stdabout.dat perl/isilo.perl
isilo: isilo/python-api.pdb \
isilo/python-doc.pdb \
isilo/python-ext.pdb \
isilo/python-lib.pdb \
isilo/python-mac.pdb \
isilo/python-ref.pdb \
isilo/python-tut.pdb \
isilo/python-dist.pdb \
isilo/python-inst.pdb \
isilo/python-whatsnew.pdb
isilo/python-api.pdb: isilo/api/api.html isilo/api/api.css
$(MKISILO) "-iPython/C API Reference Manual" \
isilo/api/api.html $@
isilo/python-doc.pdb: isilo/doc/doc.html isilo/doc/doc.css
$(MKISILO) "-iDocumenting Python" \
isilo/doc/doc.html $@
isilo/python-ext.pdb: isilo/ext/ext.html isilo/ext/ext.css
$(MKISILO) "-iExtending & Embedding Python" \
isilo/ext/ext.html $@
isilo/python-lib.pdb: isilo/lib/lib.html isilo/lib/lib.css
$(MKISILO) "-iPython Library Reference" \
isilo/lib/lib.html $@
isilo/python-mac.pdb: isilo/mac/mac.html isilo/mac/mac.css
$(MKISILO) "-iMacintosh Library Modules" \
isilo/mac/mac.html $@
isilo/python-ref.pdb: isilo/ref/ref.html isilo/ref/ref.css
$(MKISILO) "-iPython Reference Manual" \
isilo/ref/ref.html $@
isilo/python-tut.pdb: isilo/tut/tut.html isilo/tut/tut.css
$(MKISILO) "-iPython Tutorial" \
isilo/tut/tut.html $@
isilo/python-dist.pdb: isilo/dist/dist.html isilo/dist/dist.css
$(MKISILO) "-iDistributing Python Modules" \
isilo/dist/dist.html $@
isilo/python-inst.pdb: isilo/inst/inst.html isilo/inst/inst.css
$(MKISILO) "-iInstalling Python Modules" \
isilo/inst/inst.html $@
isilo/python-whatsnew.pdb: isilo/whatsnew/$(WHATSNEW).html isilo/whatsnew/$(WHATSNEW).css
$(MKISILO) "-iWhat's New in Python X.Y" \
isilo/whatsnew/$(WHATSNEW).html $@
isilo/api/api.html: $(APIFILES) api/refcounts.dat
$(MKISILOHTML) --dir isilo/api api/api.tex
isilo/doc/doc.html: $(DOCFILES)
$(MKISILOHTML) --dir isilo/doc doc/doc.tex
isilo/ext/ext.html: $(EXTFILES)
$(MKISILOHTML) --dir isilo/ext ext/ext.tex
isilo/lib/lib.html: $(LIBFILES)
$(MKISILOHTML) --dir isilo/lib lib/lib.tex
isilo/mac/mac.html: $(MACFILES)
$(MKISILOHTML) --dir isilo/mac mac/mac.tex
isilo/ref/ref.html: $(REFFILES)
$(MKISILOHTML) --dir isilo/ref ref/ref.tex
isilo/tut/tut.html: $(TUTFILES)
$(MKISILOHTML) --dir isilo/tut tut/tut.tex
isilo/inst/inst.html: $(INSTFILES) perl/distutils.perl
$(MKISILOHTML) --dir isilo/inst inst/inst.tex
isilo/dist/dist.html: $(DISTFILES) perl/distutils.perl
$(MKISILOHTML) --dir isilo/dist dist/dist.tex
isilo/whatsnew/$(WHATSNEW).html: whatsnew/$(WHATSNEW).tex
$(MKISILOHTML) --dir isilo/whatsnew whatsnew/$(WHATSNEW).tex
# These are useful if you need to transport the iSilo-ready HTML to
# another machine to perform the conversion:
isilozip: isilo-html-$(RELEASE).zip
isilo-html-$(RELEASE).zip: $(ISILOINDEXFILES)
rm -f $@
cd isilo && \
zip -q -9 ../$@ */*.css */*.html */*.txt
# webchecker needs an extra flag to process the huge index from the libref
WEBCHECKER=$(PYTHON) ../Tools/webchecker/webchecker.py
HTMLBASE= file:`pwd`/html
webcheck: $(ALLHTMLFILES)
$(WEBCHECKER) $(HTMLBASE)/api/
$(WEBCHECKER) $(HTMLBASE)/doc/
$(WEBCHECKER) $(HTMLBASE)/ext/
$(WEBCHECKER) -m290000 $(HTMLBASE)/lib/
$(WEBCHECKER) $(HTMLBASE)/mac/
$(WEBCHECKER) $(HTMLBASE)/ref/
$(WEBCHECKER) $(HTMLBASE)/tut/
$(WEBCHECKER) $(HTMLBASE)/dist/
$(WEBCHECKER) $(HTMLBASE)/inst/
$(WEBCHECKER) $(HTMLBASE)/whatsnew/
fastwebcheck: $(ALLHTMLFILES)
$(WEBCHECKER) -x $(HTMLBASE)/api/
$(WEBCHECKER) -x $(HTMLBASE)/doc/
$(WEBCHECKER) -x $(HTMLBASE)/ext/
$(WEBCHECKER) -x -m290000 $(HTMLBASE)/lib/
$(WEBCHECKER) -x $(HTMLBASE)/mac/
$(WEBCHECKER) -x $(HTMLBASE)/ref/
$(WEBCHECKER) -x $(HTMLBASE)/tut/
$(WEBCHECKER) -x $(HTMLBASE)/dist/
$(WEBCHECKER) -x $(HTMLBASE)/inst/
$(WEBCHECKER) -x $(HTMLBASE)/whatsnew/
# Release packaging targets:
paper-$(PAPER)/README: $(PSFILES) $(TOOLSDIR)/getpagecounts
cd paper-$(PAPER) && ../$(TOOLSDIR)/getpagecounts -r $(RELEASE) >../$@
info-$(RELEASE).tgz: info
cd $(INFODIR) && tar cf - README python.dir python-*.info* \
| gzip -9 >../$@
info-$(RELEASE).tar.bz2: info
cd $(INFODIR) && tar cf - README python.dir python-*.info* \
| bzip2 -9 >../$@
latex-$(RELEASE).tgz:
$(PYTHON) $(TOOLSDIR)/mksourcepkg --gzip $(RELEASE)
latex-$(RELEASE).tar.bz2:
$(PYTHON) $(TOOLSDIR)/mksourcepkg --bzip2 $(RELEASE)
latex-$(RELEASE).zip:
rm -f $@
$(PYTHON) $(TOOLSDIR)/mksourcepkg --zip $(RELEASE)
pdf-$(PAPER)-$(RELEASE).tar: $(PDFFILES)
rm -f $@
mkdir Python-Docs-$(RELEASE)
cp paper-$(PAPER)/*.pdf Python-Docs-$(RELEASE)
tar cf $@ Python-Docs-$(RELEASE)
rm -r Python-Docs-$(RELEASE)
pdf-$(PAPER)-$(RELEASE).tgz: pdf-$(PAPER)-$(RELEASE).tar
gzip -9 <$? >$@
pdf-$(PAPER)-$(RELEASE).tar.bz2: pdf-$(PAPER)-$(RELEASE).tar
bzip2 -9 <$? >$@
pdf-$(PAPER)-$(RELEASE).zip: pdf
rm -f $@
mkdir Python-Docs-$(RELEASE)
cp paper-$(PAPER)/*.pdf Python-Docs-$(RELEASE)
zip -q -r -9 $@ Python-Docs-$(RELEASE)
rm -r Python-Docs-$(RELEASE)
postscript-$(PAPER)-$(RELEASE).tar: $(PSFILES) paper-$(PAPER)/README
rm -f $@
mkdir Python-Docs-$(RELEASE)
cp paper-$(PAPER)/*.ps Python-Docs-$(RELEASE)
cp paper-$(PAPER)/README Python-Docs-$(RELEASE)
tar cf $@ Python-Docs-$(RELEASE)
rm -r Python-Docs-$(RELEASE)
postscript-$(PAPER)-$(RELEASE).tar.bz2: postscript-$(PAPER)-$(RELEASE).tar
bzip2 -9 <$< >$@
postscript-$(PAPER)-$(RELEASE).tgz: postscript-$(PAPER)-$(RELEASE).tar
gzip -9 <$< >$@
postscript-$(PAPER)-$(RELEASE).zip: $(PSFILES) paper-$(PAPER)/README
rm -f $@
mkdir Python-Docs-$(RELEASE)
cp paper-$(PAPER)/*.ps Python-Docs-$(RELEASE)
cp paper-$(PAPER)/README Python-Docs-$(RELEASE)
zip -q -r -9 $@ Python-Docs-$(RELEASE)
rm -r Python-Docs-$(RELEASE)
HTMLPKGFILES=*.html */*.css */*.html */*.gif */*.png */*.txt
html-$(RELEASE).tar: $(ALLHTMLFILES) $(HTMLCSSFILES)
mkdir Python-Docs-$(RELEASE)
-find html -name '*.gif' -size 0 | xargs rm -f
cd html && tar cf ../temp.tar $(HTMLPKGFILES)
cd Python-Docs-$(RELEASE) && tar xf ../temp.tar
rm temp.tar
tar cf html-$(RELEASE).tar Python-Docs-$(RELEASE)
rm -r Python-Docs-$(RELEASE)
html-$(RELEASE).tgz: html-$(RELEASE).tar
gzip -9 <$? >$@
html-$(RELEASE).tar.bz2: html-$(RELEASE).tar
bzip2 -9 <$? >$@
html-$(RELEASE).zip: $(ALLHTMLFILES) $(HTMLCSSFILES)
rm -f $@
mkdir Python-Docs-$(RELEASE)
cd html && tar cf ../temp.tar $(HTMLPKGFILES)
cd Python-Docs-$(RELEASE) && tar xf ../temp.tar
rm temp.tar
zip -q -r -9 $@ Python-Docs-$(RELEASE)
rm -r Python-Docs-$(RELEASE)
isilo-$(RELEASE).zip: isilo
rm -f $@
mkdir Python-Docs-$(RELEASE)
cp isilo/python-*.pdb Python-Docs-$(RELEASE)
zip -q -r -9 $@ Python-Docs-$(RELEASE)
rm -r Python-Docs-$(RELEASE)
# convenience targets:
tarhtml: html-$(RELEASE).tgz
tarinfo: info-$(RELEASE).tgz
tarps: postscript-$(PAPER)-$(RELEASE).tgz
tarpdf: pdf-$(PAPER)-$(RELEASE).tgz
tarlatex: latex-$(RELEASE).tgz
tarballs: tarpdf tarps tarhtml
ziphtml: html-$(RELEASE).zip
zipps: postscript-$(PAPER)-$(RELEASE).zip
zippdf: pdf-$(PAPER)-$(RELEASE).zip
ziplatex: latex-$(RELEASE).zip
zipisilo: isilo-$(RELEASE).zip
zips: zippdf zipps ziphtml
bziphtml: html-$(RELEASE).tar.bz2
bzipinfo: info-$(RELEASE).tar.bz2
bzipps: postscript-$(PAPER)-$(RELEASE).tar.bz2
bzippdf: pdf-$(PAPER)-$(RELEASE).tar.bz2
bziplatex: latex-$(RELEASE).tar.bz2
bzips: bzippdf bzipps bziphtml
disthtml: bziphtml ziphtml
distinfo: bzipinfo
distps: bzipps zipps
distpdf: bzippdf zippdf
distlatex: bziplatex ziplatex
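# Illustration (an editorial sketch in Python, not code from this Makefile):
# the packaging targets above follow one naming pattern, in which only the
# paper-dependent formats carry the $(PAPER) value. The function name and the
# release string used in the example are hypothetical.

```python
# Formats whose output depends on the paper size, per the targets above.
PAPER_FORMATS = ("pdf", "postscript")


def archive_name(kind, release, paper=None, ext="tgz"):
    """Build a distribution file name following the Makefile's pattern."""
    if kind in PAPER_FORMATS:
        if paper is None:
            raise ValueError("paper size required for %s archives" % kind)
        # e.g. pdf-$(PAPER)-$(RELEASE).tgz
        return "%s-%s-%s.%s" % (kind, paper, release, ext)
    # e.g. html-$(RELEASE).tgz, info-$(RELEASE).tar.bz2
    return "%s-%s.%s" % (kind, release, ext)
```

# For example, archive_name("pdf", "2.6a0", paper="a4", ext="zip") mirrors
# the file produced by the zippdf target when PAPER=a4.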
# We use the "pkglist" target at the end of these to ensure the
# package list is updated after building either of these; this seems a
# reasonable compromise between only building it for distfiles or
# having to build it manually. Doing it here allows the packages for
# distribution to be built using either of
# make distfiles && make PAPER=a4 paperdist
# make paperdist && make PAPER=a4 distfiles
# The small amount of additional work is a small price to pay for not
# having to remember which order to do it in. ;)
paperdist: distpdf distps pkglist
edist: disthtml pkglist
# The pkglist.html file is used as part of the download.html page on
# python.org; it is not used as intermediate input here or as part of
# the packages created.
pkglist:
$(TOOLSDIR)/mkpkglist >pkglist.html
distfiles: paperdist edist
$(TOOLSDIR)/mksourcepkg --bzip2 --zip $(RELEASE)
$(TOOLSDIR)/mkpkglist >pkglist.html
# Housekeeping targets
# Remove temporary files; all except the following:
# - sources: .tex, .bib, .sty, *.cls
# - useful results: .dvi, .pdf, .ps, .texi, .info
clean:
rm -f html-$(RELEASE).tar
cd $(INFODIR) && $(MAKE) clean
# Remove temporaries as well as final products
clobber:
rm -f html-$(RELEASE).tar
rm -f html-$(RELEASE).tgz info-$(RELEASE).tgz
rm -f pdf-$(PAPER)-$(RELEASE).tgz postscript-$(PAPER)-$(RELEASE).tgz
rm -f latex-$(RELEASE).tgz html-$(RELEASE).zip
rm -f pdf-$(PAPER)-$(RELEASE).zip postscript-$(PAPER)-$(RELEASE).zip
rm -f $(DVIFILES) $(PSFILES) $(PDFFILES)
cd $(INFODIR) && $(MAKE) clobber
rm -f paper-$(PAPER)/*.tex paper-$(PAPER)/*.ind paper-$(PAPER)/*.idx
rm -f paper-$(PAPER)/*.l2h paper-$(PAPER)/*.how paper-$(PAPER)/README
rm -rf html/index.html html/modindex.html html/acks.html
rm -rf html/api/ html/doc/ html/ext/ html/lib/ html/mac/
rm -rf html/ref/ html/tut/ html/inst/ html/dist/
rm -rf html/whatsnew/
rm -rf isilo/api/ isilo/doc/ isilo/ext/ isilo/lib/ isilo/mac/
rm -rf isilo/ref/ isilo/tut/ isilo/inst/ isilo/dist/
rm -rf isilo/whatsnew/
rm -f isilo/python-*.pdb isilo-$(RELEASE).zip
realclean distclean: clobber


@@ -1,382 +0,0 @@
# LaTeX source dependencies.
COMMONSTYLES= texinputs/python.sty \
texinputs/pypaper.sty
INDEXSTYLES=texinputs/python.ist
COMMONTEX=commontex/copyright.tex \
commontex/license.tex \
commontex/patchlevel.tex \
commontex/boilerplate.tex
MANSTYLES= texinputs/fncychap.sty \
texinputs/manual.cls \
$(COMMONSTYLES)
HOWTOSTYLES= texinputs/howto.cls \
$(COMMONSTYLES)
APIFILES= $(MANSTYLES) $(INDEXSTYLES) $(COMMONTEX) \
api/api.tex \
api/abstract.tex \
api/concrete.tex \
api/exceptions.tex \
api/init.tex \
api/intro.tex \
api/memory.tex \
api/newtypes.tex \
api/refcounting.tex \
api/utilities.tex \
api/veryhigh.tex \
commontex/typestruct.h \
commontex/reportingbugs.tex
# These files are generated from those listed above, and are used to
# generate the typeset versions of the manuals. The list is defined
# here to make it easier to ensure parallelism.
ANNOAPIFILES= $(MANSTYLES) $(INDEXSTYLES) $(COMMONTEX) api/refcounts.dat \
paper-$(PAPER)/api.tex \
paper-$(PAPER)/abstract.tex \
paper-$(PAPER)/concrete.tex \
paper-$(PAPER)/exceptions.tex \
paper-$(PAPER)/init.tex \
paper-$(PAPER)/intro.tex \
paper-$(PAPER)/memory.tex \
paper-$(PAPER)/newtypes.tex \
paper-$(PAPER)/refcounting.tex \
paper-$(PAPER)/utilities.tex \
paper-$(PAPER)/veryhigh.tex \
commontex/reportingbugs.tex
DOCFILES= $(HOWTOSTYLES) \
commontex/boilerplate.tex \
texinputs/ltxmarkup.sty \
doc/doc.tex
EXTFILES= ext/ext.tex $(MANSTYLES) $(INDEXSTYLES) $(COMMONTEX) \
ext/extending.tex \
ext/newtypes.tex \
ext/building.tex \
ext/windows.tex \
ext/embedding.tex \
ext/noddy.c \
ext/noddy2.c \
ext/noddy3.c \
ext/noddy4.c \
ext/run-func.c \
commontex/typestruct.h \
commontex/reportingbugs.tex
TUTFILES= tut/tut.tex tut/glossary.tex $(MANSTYLES) $(COMMONTEX)
# LaTeX source files for the Python Reference Manual
REFFILES= $(MANSTYLES) $(INDEXSTYLES) $(COMMONTEX) \
ref/ref.tex \
ref/ref1.tex \
ref/ref2.tex \
ref/ref3.tex \
ref/ref4.tex \
ref/ref5.tex \
ref/ref6.tex \
ref/ref7.tex \
ref/ref8.tex
# LaTeX source files for the Python Library Reference
LIBFILES= $(MANSTYLES) $(INDEXSTYLES) $(COMMONTEX) \
commontex/reportingbugs.tex \
lib/lib.tex \
lib/asttable.tex \
lib/compiler.tex \
lib/distutils.tex \
lib/email.tex \
lib/emailencoders.tex \
lib/emailexc.tex \
lib/emailgenerator.tex \
lib/emailiter.tex \
lib/emailmessage.tex \
lib/emailparser.tex \
lib/emailutil.tex \
lib/libintro.tex \
lib/libobjs.tex \
lib/libstdtypes.tex \
lib/libexcs.tex \
lib/libconsts.tex \
lib/libfuncs.tex \
lib/libpython.tex \
lib/libsys.tex \
lib/libplatform.tex \
lib/libfpectl.tex \
lib/libgc.tex \
lib/libsets.tex \
lib/libweakref.tex \
lib/libinspect.tex \
lib/libpydoc.tex \
lib/libdifflib.tex \
lib/libdoctest.tex \
lib/libunittest.tex \
lib/libtest.tex \
lib/libtypes.tex \
lib/libtraceback.tex \
lib/libpickle.tex \
lib/libshelve.tex \
lib/libcopy.tex \
lib/libmarshal.tex \
lib/libwarnings.tex \
lib/libimp.tex \
lib/libzipimport.tex \
lib/librunpy.tex \
lib/libpkgutil.tex \
lib/libparser.tex \
lib/libbltin.tex \
lib/libmain.tex \
lib/libfuture.tex \
lib/libstrings.tex \
lib/libstring.tex \
lib/libtextwrap.tex \
lib/libcodecs.tex \
lib/libunicodedata.tex \
lib/libstringprep.tex \
lib/libstruct.tex \
lib/libmisc.tex \
lib/libmath.tex \
lib/libdecimal.tex \
lib/libarray.tex \
lib/liballos.tex \
lib/libos.tex \
lib/libdatetime.tex \
lib/tzinfo-examples.py \
lib/libtime.tex \
lib/libgetopt.tex \
lib/liboptparse.tex \
lib/caseless.py \
lib/required_1.py \
lib/required_2.py \
lib/libtempfile.tex \
lib/liberrno.tex \
lib/libctypes.tex \
lib/libsomeos.tex \
lib/libsignal.tex \
lib/libsocket.tex \
lib/libselect.tex \
lib/libthread.tex \
lib/libdummythread.tex \
lib/libunix.tex \
lib/libposix.tex \
lib/libposixpath.tex \
lib/libpwd.tex \
lib/libspwd.tex \
lib/libgrp.tex \
lib/libcrypt.tex \
lib/libdbm.tex \
lib/libgdbm.tex \
lib/libtermios.tex \
lib/libfcntl.tex \
lib/libposixfile.tex \
lib/libsyslog.tex \
lib/liblogging.tex \
lib/libpdb.tex \
lib/libprofile.tex \
lib/libhotshot.tex \
lib/libtimeit.tex \
lib/libtrace.tex \
lib/libcgi.tex \
lib/libcgitb.tex \
lib/liburllib.tex \
lib/liburllib2.tex \
lib/libhttplib.tex \
lib/libftplib.tex \
lib/libnntplib.tex \
lib/liburlparse.tex \
lib/libhtmlparser.tex \
lib/libhtmllib.tex \
lib/libsgmllib.tex \
lib/librfc822.tex \
lib/libmimetools.tex \
lib/libmimewriter.tex \
lib/libbinascii.tex \
lib/libmm.tex \
lib/libaudioop.tex \
lib/libimageop.tex \
lib/libaifc.tex \
lib/libjpeg.tex \
lib/libossaudiodev.tex \
lib/libcrypto.tex \
lib/libhashlib.tex \
lib/libmd5.tex \
lib/libsha.tex \
lib/libhmac.tex \
lib/libsgi.tex \
lib/libal.tex \
lib/libcd.tex \
lib/libfl.tex \
lib/libfm.tex \
lib/libgl.tex \
lib/libimgfile.tex \
lib/libsun.tex \
lib/libxdrlib.tex \
lib/libimghdr.tex \
lib/librestricted.tex \
lib/librexec.tex \
lib/libbastion.tex \
lib/libformatter.tex \
lib/liboperator.tex \
lib/libresource.tex \
lib/libstat.tex \
lib/libstringio.tex \
lib/libtoken.tex \
lib/libkeyword.tex \
lib/libundoc.tex \
lib/libmailcap.tex \
lib/libglob.tex \
lib/libuser.tex \
lib/libanydbm.tex \
lib/libbsddb.tex \
lib/libdumbdbm.tex \
lib/libdbhash.tex \
lib/librandom.tex \
lib/libsite.tex \
lib/libwhichdb.tex \
lib/libbase64.tex \
lib/libfnmatch.tex \
lib/libquopri.tex \
lib/libzlib.tex \
lib/libsocksvr.tex \
lib/libmailbox.tex \
lib/libcommands.tex \
lib/libcmath.tex \
lib/libgzip.tex \
lib/libbz2.tex \
lib/libzipfile.tex \
lib/libpprint.tex \
lib/libcode.tex \
lib/libmimify.tex \
lib/libre.tex \
lib/libuserdict.tex \
lib/libdis.tex \
lib/libxmlrpclib.tex \
lib/libsimplexmlrpc.tex \
lib/libdocxmlrpc.tex \
lib/libpyexpat.tex \
lib/libfunctools.tex \
lib/xmldom.tex \
lib/xmldomminidom.tex \
lib/xmldompulldom.tex \
lib/xmlsax.tex \
lib/xmlsaxhandler.tex \
lib/xmlsaxutils.tex \
lib/xmlsaxreader.tex \
lib/libetree.tex \
lib/libqueue.tex \
lib/liblocale.tex \
lib/libgettext.tex \
lib/libbasehttp.tex \
lib/libcookie.tex \
lib/libcookielib.tex \
lib/libcopyreg.tex \
lib/libsymbol.tex \
lib/libbinhex.tex \
lib/libuu.tex \
lib/libsunaudio.tex \
lib/libfileinput.tex \
lib/libimaplib.tex \
lib/libpoplib.tex \
lib/libcalendar.tex \
lib/libpopen2.tex \
lib/libbisect.tex \
lib/libcollections.tex \
lib/libheapq.tex \
lib/libmimetypes.tex \
lib/libsmtplib.tex \
lib/libsmtpd.tex \
lib/libcmd.tex \
lib/libmultifile.tex \
lib/libthreading.tex \
lib/libdummythreading.tex \
lib/libwebbrowser.tex \
lib/internet.tex \
lib/netdata.tex \
lib/markup.tex \
lib/language.tex \
lib/libpycompile.tex \
lib/libcompileall.tex \
lib/libshlex.tex \
lib/libnetrc.tex \
lib/librobotparser.tex \
lib/libgetpass.tex \
lib/libshutil.tex \
lib/librepr.tex \
lib/libmsilib.tex \
lib/libmsvcrt.tex \
lib/libwinreg.tex \
lib/libwinsound.tex \
lib/windows.tex \
lib/libpyclbr.tex \
lib/libtokenize.tex \
lib/libtabnanny.tex \
lib/libmhlib.tex \
lib/libtelnetlib.tex \
lib/libcolorsys.tex \
lib/libfpformat.tex \
lib/libcgihttp.tex \
lib/libsimplehttp.tex \
lib/liblinecache.tex \
lib/libnew.tex \
lib/libdircache.tex \
lib/libfilecmp.tex \
lib/libsunau.tex \
lib/libwave.tex \
lib/libchunk.tex \
lib/libcodeop.tex \
lib/libcurses.tex \
lib/libcursespanel.tex \
lib/libascii.tex \
lib/libdl.tex \
lib/libmutex.tex \
lib/libnis.tex \
lib/libpipes.tex \
lib/libpty.tex \
lib/libreadline.tex \
lib/librlcompleter.tex \
lib/libsched.tex \
lib/libstatvfs.tex \
lib/libtty.tex \
lib/libasyncore.tex \
lib/libasynchat.tex \
lib/libatexit.tex \
lib/libmmap.tex \
lib/tkinter.tex \
lib/libturtle.tex \
lib/libtarfile.tex \
lib/libcsv.tex \
lib/libcfgparser.tex \
lib/libsqlite3.tex
# LaTeX source files for Macintosh Library Modules.
MACFILES= $(HOWTOSTYLES) $(INDEXSTYLES) $(COMMONTEX) \
mac/mac.tex \
mac/using.tex \
mac/scripting.tex \
mac/toolbox.tex \
mac/undoc.tex \
mac/libcolorpicker.tex \
mac/libmac.tex \
mac/libgensuitemodule.tex \
mac/libaetools.tex \
mac/libaepack.tex \
mac/libaetypes.tex \
mac/libmacos.tex \
mac/libmacostools.tex \
mac/libmacui.tex \
mac/libmacic.tex \
mac/libframework.tex \
mac/libautogil.tex \
mac/libminiae.tex \
mac/libscrap.tex
INSTFILES = $(HOWTOSTYLES) inst/inst.tex
DISTFILES = $(HOWTOSTYLES) \
dist/dist.tex \
dist/sysconfig.tex


@ -1,246 +0,0 @@
Python standard documentation -- in LaTeX
-----------------------------------------
This directory contains the LaTeX sources to the Python documentation
and tools required to support the formatting process. The documents
now require LaTeX2e; LaTeX 2.09 compatibility has been dropped.
If you don't have LaTeX, or if you'd rather not format the
documentation yourself, you can ftp a tar file containing HTML, PDF,
or PostScript versions of all documents. Additional formats may be
available. These should be in the same place where you fetched the
main Python distribution (try <http://www.python.org/> or
<ftp://ftp.python.org/pub/python/>).
The following are the LaTeX source files:
api/*.tex Python/C API Reference Manual
doc/*.tex Documenting Python
ext/*.tex Extending and Embedding the Python Interpreter
lib/*.tex Python Library Reference
mac/*.tex Macintosh Library Modules
ref/*.tex Python Reference Manual
tut/*.tex Python Tutorial
inst/*.tex Installing Python Modules
dist/*.tex Distributing Python Modules
Most use the "manual" document class and "python" package, derived from
the old "myformat.sty" style file. The Macintosh Library Modules
document uses the "howto" document class instead. These contain many
macro definitions useful in documenting Python, and set some style
parameters.
There's a Makefile to call LaTeX and the other utilities in the right
order and the right number of times. By default, it will build the
HTML version of the documentation, but DVI, PDF, and PostScript can
also be made. To view the generated HTML, point your favorite browser
at the top-level index (html/index.html) after running "make".
The Makefile can also produce DVI files for each document made; to
preview them, use xdvi. PostScript is produced by the same Makefile
target that produces the DVI files. This uses the dvips tool.
Printing depends on local conventions; at our site, we use lpr. For
example:
make paper-letter/lib.ps # create lib.dvi and lib.ps
xdvi paper-letter/lib.dvi # preview lib.dvi
lpr paper-letter/lib.ps # print on default printer
What if I find a bug?
---------------------
First, check that the bug is present in the development version of the
documentation at <http://www.python.org/dev/doc/devel/>; we may
have already fixed it.
If we haven't, tell us about it. We'd like the documentation to be
complete and accurate, but have limited time. If you discover any
inconsistencies between the documentation and implementation, or just
have suggestions as to how to improve the documentation, let us know!
Specific bugs and patches should be reported using our bug & patch
databases at:
http://sourceforge.net/projects/python
Other suggestions or questions should be sent to the Python
Documentation Team:
docs@python.org
Thanks!
What tools do I need?
---------------------
You need to install Python; some of the scripts used to produce the
documentation are written in Python. You don't need this
documentation to install Python; instructions are included in the
README file in the Python distribution.
The simplest way to get the rest of the tools in the configuration we
used is to install the teTeX TeX distribution, versions 0.9 or newer.
More information is available on teTeX at <http://www.tug.org/tetex/>.
This is a Unix-only TeX distribution at this time. This documentation
release was tested with the 1.0.7 release, but there have been no
substantial changes since late in the 0.9 series, which we used
extensively for previous versions without any difficulty.
If you don't want to get teTeX, here is what you'll need:
To create DVI, PDF, or PostScript files:
- LaTeX2e, 1995/12/01 or newer. Older versions are likely to
choke.
- makeindex. This is used to produce the indexes for the
library reference and Python/C API reference.
To create PDF files:
- pdflatex. We used the one in the teTeX distribution (pdfTeX
version 3.14159-13d (Web2C 7.3.1) at the time of this
writing). Versions even a couple of patchlevels earlier are
highly likely to fail due to syntax changes for some of the
pdftex primitives.
To create PostScript files:
- dvips. Most TeX installations include this. If you don't
have one, check CTAN (<ftp://ctan.tug.org/tex-archive/>).
To create info files:
Note that info support is currently being revised using new
conversion tools by Michael Ernst <mernst@cs.washington.edu>.
- makeinfo. This is available from any GNU mirror.
- emacs or xemacs. Emacs is available from the same place as
makeinfo, and xemacs is available from ftp.xemacs.org.
- Perl. Find the software at
<http://language.perl.com/info/software.html>.
- HTML::Element. If you don't have this installed, you can get
this from CPAN. Use the command:
perl -e 'use CPAN; CPAN::install("HTML::Element");'
You may need to be root to do this.
To create HTML files:
- Perl 5.6.0 or newer. Find the software at
<http://language.perl.com/info/software.html>.
- LaTeX2HTML 99.2b8 or newer. Older versions are not
supported; each version changes enough that supporting
multiple versions is not likely to work. Many older
versions don't work with Perl 5.6 as well. This also screws
up code fragments. ;-( Releases are available at:
<http://www.latex2html.org/>.
I got a make error: "make: don't know how to make commontex/patchlevel.tex."
----------------------------------------------------------------------------
Your version of make doesn't support the 'shell' function. You will need to
use a version which does, e.g. GNU make.
LaTeX (or pdfLaTeX) ran out of memory; how can I fix it?
--------------------------------------------------------
This is known to be a problem at least on Mac OS X, but it has been
observed on other systems in the past.
On some systems, the default sizes of some of the memory pools
allocated by TeX need to be changed; this is a configuration setting
for installations based on web2c (most if not all installations).
This is usually set in a file named texmf/web2c/texmf.cnf (where the
top-level texmf/ directory is part of the TeX installation). If you
get a "buffer overflow" warning from LaTeX, open that configuration
file and look for the "main_memory.pdflatex" setting. If there is not
one, you can add a line with the setting. The value 1500000 seems to
be sufficient for formatting the Python documentation.
What if Times fonts are not available?
--------------------------------------
As distributed, the LaTeX documents use PostScript Times fonts. This
is done since they are much better looking and produce smaller
PostScript files. If, however, your TeX installation does not support
them, they may be easily disabled. Edit the file
texinputs/pypaper.sty and comment out the line that starts
"\RequirePackage{times}" by inserting a "%" character at the beginning
of the line. If you're formatting the docs for A4 paper instead of
US-Letter paper, change paper-a4/pypaper.sty instead. An alternative
is to install the right fonts and LaTeX style file.
What if I want to use A4 paper?
-------------------------------
Instead of building the PostScript by giving the command "make ps",
give the command "make PAPER=a4 ps"; the output will be produced in
the paper-a4/ subdirectory. (You can use "make PAPER=a4 pdf" if you'd
rather have PDF output.)
Making HTML files
-----------------
The LaTeX documents can be converted to HTML using Nikos Drakos'
LaTeX2HTML converter. See the Makefile; after some twiddling, "make"
should do the trick.
What else is in here?
---------------------
There is a new LaTeX document class called "howto". This is used for
the new series of Python HOWTO documents which is being coordinated by
Andrew Kuchling <akuchlin@mems-exchange.org>. The file
templates/howto.tex is a commented example which may be used as a
template. A Python script to "do the right thing" to format a howto
document is included as tools/mkhowto. These documents can be
formatted as HTML, PDF, PostScript, or ASCII files. Use "mkhowto
--help" for information on using the formatting tool.
For authors of module documentation, there is a file
templates/module.tex which may be used as a template for a module
section. This may be used in conjunction with either the howto or
manual document class. Create the documentation for a new module by
copying the template to lib<mymodule>.tex and editing according to the
instructions in the comments.
Documentation on authoring Python documentation, including
information about both style and markup, is available in the
"Documenting Python" manual.
Copyright notice
================
The Python source is copyrighted, but you can freely use and copy it
as long as you don't change or remove the copyright notice:
----------------------------------------------------------------------
Copyright (c) 2000-2007 Python Software Foundation.
All rights reserved.
Copyright (c) 2000 BeOpen.com.
All rights reserved.
Copyright (c) 1995-2000 Corporation for National Research Initiatives.
All rights reserved.
Copyright (c) 1991-1995 Stichting Mathematisch Centrum.
All rights reserved.
See the file "commontex/license.tex" for information on usage and
redistribution of this file, and for a DISCLAIMER OF ALL WARRANTIES.
----------------------------------------------------------------------


@ -1,74 +0,0 @@
PYTHON DOCUMENTATION TO-DO LIST -*- indented-text -*-
===============================
General
-------
* Figure out HTMLHelp generation for the Windows world.
Python/C API
------------
* The "Very High Level Interface" in the API document has been
requested; I guess it wouldn't hurt to fill in a bit there. Request
by Albert Hofkamp <a.hofkamp@wtb.tue.nl>. (Partly done.)
* Describe implementing types in C, including use of the 'self'
parameter to the method implementation function. (Missing material
mentioned in the Extending & Embedding manual, section 1.1; problem
reported by Clay Spence <cspence@sarnoff.com>.) Heavily impacts one
chapter of the Python/C API manual.
* Missing PyArg_ParseTuple(), PyArg_ParseTupleAndKeywords(),
Py_BuildValue(). Information requested by Greg Kochanski
<gpk@bell-labs.com>. PyEval_EvalCode() has also been requested.
Extending & Embedding
---------------------
* More information is needed about building dynamically linked
extensions in C++. Specifically, the extensions must be linked
against the C++ libraries (and possibly runtime). Also noted by
Albert Hofkamp <a.hofkamp@wtb.tue.nl>.
Reference Manual
----------------
* Document the Extended Call Syntax in the language reference.
[Jeremy Hylton]
* Document new comparison support for recursive objects (lang. ref.?
library ref.? (cmp() function)). [Jeremy Hylton]
Library Reference
-----------------
* Update the pickle documentation to describe all of the current
behavior; only a subset is described. __reduce__, etc. Partial
update submitted by Jim Kerr <jbkerr@sr.hp.com>.
* Update the httplib documentation to match Greg Stein's HTTP/1.1
support and new classes. (Greg, this is yours!)
Tutorial
--------
* Update tutorial to use string methods and talk about backward
compatibility of same.
NOT WORTH THE TROUBLE
---------------------
* In the indexes, some subitem entries are separated from the item
entries by column- or page-breaks. Reported by Lorenzo M. Catucci
<lorenzo@argon.roma2.infn.it>. This one will be hard; probably not
really worth the pain. (Only an issue at all when a header-letter
and the first index entry get separated -- can change as soon as we
change the index entries in the text.) Also only a problem in the
print version.
* Fix problem with howto documents getting the last module synopsis
twice (in \localmoduletable) so we can get rid of the ugly 'uniq'
hack in tools/mkhowto. (Probably not worth the trouble of fixing.)


@ -1,60 +0,0 @@
\documentclass{manual}
\title{Python/C API Reference Manual}
\input{boilerplate}
\makeindex % tell \index to actually write the .idx file
\begin{document}
\maketitle
\ifhtml
\chapter*{Front Matter\label{front}}
\fi
\input{copyright}
\begin{abstract}
\noindent
This manual documents the API used by C and \Cpp{} programmers who
want to write extension modules or embed Python. It is a companion to
\citetitle[../ext/ext.html]{Extending and Embedding the Python
Interpreter}, which describes the general principles of extension
writing but does not document the API functions in detail.
\warning{The current version of this document is incomplete. I hope
that it is nevertheless useful. I will continue to work on it, and
release new versions from time to time, independent from Python source
code releases.}
\end{abstract}
\tableofcontents
\input{intro}
\input{veryhigh}
\input{refcounting}
\input{exceptions}
\input{utilities}
\input{abstract}
\input{concrete}
\input{init}
\input{memory}
\input{newtypes}
\appendix
\chapter{Reporting Bugs}
\input{reportingbugs}
\chapter{History and License}
\input{license}
\input{api.ind} % Index -- must be last
\end{document}


@ -1,442 +0,0 @@
\chapter{Exception Handling \label{exceptionHandling}}
The functions described in this chapter will let you handle and raise Python
exceptions. It is important to understand some of the basics of
Python exception handling. It works somewhat like the
\UNIX{} \cdata{errno} variable: there is a global indicator (per
thread) of the last error that occurred. Most functions don't clear
this on success, but will set it to indicate the cause of the error on
failure. Most functions also return an error indicator, usually
\NULL{} if they are supposed to return a pointer, or \code{-1} if they
return an integer (exception: the \cfunction{PyArg_*()} functions
return \code{1} for success and \code{0} for failure).
When a function must fail because some function it called failed, it
generally doesn't set the error indicator; the function it called
already set it. It is responsible for either handling the error and
clearing the exception or returning after cleaning up any resources it
holds (such as object references or memory allocations); it should
\emph{not} continue normally if it is not prepared to handle the
error. If returning due to an error, it is important to indicate to
the caller that an error has been set. If the error is not handled or
carefully propagated, additional calls into the Python/C API may not
behave as intended and may fail in mysterious ways.
The error indicator consists of three Python objects corresponding to
\withsubitem{(in module sys)}{
\ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
the Python variables \code{sys.exc_type}, \code{sys.exc_value} and
\code{sys.exc_traceback}. API functions exist to interact with the
error indicator in various ways. There is a separate error indicator
for each thread.
% XXX Order of these should be more thoughtful.
% Either alphabetical or some kind of structure.
\begin{cfuncdesc}{void}{PyErr_Print}{}
Print a standard traceback to \code{sys.stderr} and clear the error
indicator. Call this function only when the error indicator is
set. (Otherwise it will cause a fatal error!)
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_Occurred}{}
Test whether the error indicator is set. If set, return the
exception \emph{type} (the first argument to the last call to one of
the \cfunction{PyErr_Set*()} functions or to
\cfunction{PyErr_Restore()}). If not set, return \NULL. You do
not own a reference to the return value, so you do not need to
\cfunction{Py_DECREF()} it. \note{Do not compare the return value
to a specific exception; use \cfunction{PyErr_ExceptionMatches()}
instead, shown below. (The comparison could easily fail since the
exception may be an instance instead of a class, in the case of a
class exception, or it may be a subclass of the expected
exception.)}
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyErr_ExceptionMatches}{PyObject *exc}
Equivalent to \samp{PyErr_GivenExceptionMatches(PyErr_Occurred(),
\var{exc})}. This should only be called when an exception is
actually set; a memory access violation will occur if no exception
has been raised.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyErr_GivenExceptionMatches}{PyObject *given, PyObject *exc}
Return true if the \var{given} exception matches the exception in
\var{exc}. If \var{exc} is a class object, this also returns true
when \var{given} is an instance of a subclass. If \var{exc} is a
tuple, all exceptions in the tuple (and recursively in subtuples)
are searched for a match. If \var{given} is \NULL, a memory access
violation will occur.
\end{cfuncdesc}
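The matching rules above can be tried out without compiling an
extension. The following is only a sketch under stated assumptions: it
assumes a CPython interpreter in which the \module{ctypes} module and
its \code{ctypes.pythonapi} handle are available, and the helper name
\code{matches()} is made up for illustration. C code would call
\cfunction{PyErr_GivenExceptionMatches()} directly.

```python
import ctypes

api = ctypes.pythonapi  # handle to the running interpreter's C API

# int PyErr_GivenExceptionMatches(PyObject *given, PyObject *exc)
api.PyErr_GivenExceptionMatches.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyErr_GivenExceptionMatches.restype = ctypes.c_int


def matches(given, exc):
    """Hypothetical helper: ask the C API whether `given` matches `exc`."""
    return bool(api.PyErr_GivenExceptionMatches(given, exc))


# A subclass matches its base class, and so does an instance of the subclass.
print(matches(KeyError, LookupError))
print(matches(KeyError("missing"), LookupError))
# Tuples are searched recursively, including nested tuples.
print(matches(KeyError, (ValueError, (TypeError, LookupError))))
# Unrelated classes do not match.
print(matches(KeyError, ValueError))
```

Both arguments are real objects here; the sketch deliberately never
passes \NULL{} for \var{given}.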
\begin{cfuncdesc}{void}{PyErr_NormalizeException}{PyObject**exc, PyObject**val, PyObject**tb}
Under certain circumstances, the values returned by
\cfunction{PyErr_Fetch()} below can be ``unnormalized'', meaning
that \code{*\var{exc}} is a class object but \code{*\var{val}} is
not an instance of the same class. This function can be used to
instantiate the class in that case. If the values are already
normalized, nothing happens. The delayed normalization is
implemented to improve performance.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_Clear}{}
Clear the error indicator. If the error indicator is not set, there
is no effect.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_Fetch}{PyObject **ptype, PyObject **pvalue,
PyObject **ptraceback}
Retrieve the error indicator into three variables whose addresses
are passed. If the error indicator is not set, set all three
variables to \NULL. If it is set, it will be cleared and you own a
reference to each object retrieved. The value and traceback object
may be \NULL{} even when the type object is not. \note{This
function is normally only used by code that needs to handle
exceptions or by code that needs to save and restore the error
indicator temporarily.}
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_Restore}{PyObject *type, PyObject *value,
PyObject *traceback}
Set the error indicator from the three objects. If the error
indicator is already set, it is cleared first. If the objects are
\NULL, the error indicator is cleared. Do not pass a \NULL{} type
and non-\NULL{} value or traceback. The exception type should be a
class. Do not pass an invalid exception type or value.
(Violating these rules will cause subtle problems later.) This call
takes away a reference to each object: you must own a reference to
each object before the call and after the call you no longer own
these references. (If you don't understand this, don't use this
function. I warned you.) \note{This function is normally only used
by code that needs to save and restore the error indicator
temporarily; use \cfunction{PyErr_Fetch()} to save the current
exception state.}
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_SetString}{PyObject *type, const char *message}
This is the most common way to set the error indicator. The first
argument specifies the exception type; it is normally one of the
standard exceptions, e.g. \cdata{PyExc_RuntimeError}. You need not
increment its reference count. The second argument is an error
message; it is converted to a string object.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_SetObject}{PyObject *type, PyObject *value}
This function is similar to \cfunction{PyErr_SetString()} but lets
you specify an arbitrary Python object for the ``value'' of the
exception.
\end{cfuncdesc}
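The effect of \cfunction{PyErr_SetString()} and
\cfunction{PyErr_SetObject()} can be observed from Python itself. This
is a minimal sketch, assuming a CPython interpreter where
\code{ctypes.pythonapi} is available; functions called through that
handle have the error indicator checked on return, so setting the
indicator surfaces as an ordinary raised exception. The helper names
\code{set_string()} and \code{set_object()} are hypothetical.

```python
import ctypes

api = ctypes.pythonapi  # calls made through this handle re-raise a pending exception

# void PyErr_SetString(PyObject *type, const char *message)
api.PyErr_SetString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyErr_SetString.restype = None

# void PyErr_SetObject(PyObject *type, PyObject *value)
api.PyErr_SetObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyErr_SetObject.restype = None


def set_string(exc_type, message):
    """Hypothetical helper: set the indicator from a C string, return what is raised."""
    try:
        api.PyErr_SetString(exc_type, message.encode("utf-8"))
    except exc_type as exc:
        return exc
    return None


def set_object(exc_type, value):
    """Hypothetical helper: set the indicator with an arbitrary object as the value."""
    try:
        api.PyErr_SetObject(exc_type, value)
    except exc_type as exc:
        return exc
    return None


err1 = set_string(RuntimeError, "something went wrong")
err2 = set_object(KeyError, {"code": 42})
print(repr(err1))
print(repr(err2))
```

In C code, either call would normally be followed by returning the
function's error value (\NULL{} or \code{-1}) to the caller.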
\begin{cfuncdesc}{PyObject*}{PyErr_Format}{PyObject *exception,
const char *format, \moreargs}
This function sets the error indicator and returns \NULL.
\var{exception} should be a Python exception (class, not
an instance). \var{format} should be a string, containing format
codes, similar to \cfunction{printf()}. The \code{width.precision}
before a format code is parsed, but the width part is ignored.
% This should be exactly the same as the table in PyString_FromFormat.
% One should just refer to the other.
% The descriptions for %zd and %zu are wrong, but the truth is complicated
% because not all compilers support the %z width modifier -- we fake it
% when necessary via interpolating PY_FORMAT_SIZE_T.
% %u, %lu, %zu should have "new in Python 2.5" blurbs.
\begin{tableiii}{l|l|l}{member}{Format Characters}{Type}{Comment}
\lineiii{\%\%}{\emph{n/a}}{The literal \% character.}
\lineiii{\%c}{int}{A single character, represented as a C int.}
\lineiii{\%d}{int}{Exactly equivalent to \code{printf("\%d")}.}
\lineiii{\%u}{unsigned int}{Exactly equivalent to \code{printf("\%u")}.}
\lineiii{\%ld}{long}{Exactly equivalent to \code{printf("\%ld")}.}
\lineiii{\%lu}{unsigned long}{Exactly equivalent to \code{printf("\%lu")}.}
\lineiii{\%zd}{Py_ssize_t}{Exactly equivalent to \code{printf("\%zd")}.}
\lineiii{\%zu}{size_t}{Exactly equivalent to \code{printf("\%zu")}.}
\lineiii{\%i}{int}{Exactly equivalent to \code{printf("\%i")}.}
\lineiii{\%x}{int}{Exactly equivalent to \code{printf("\%x")}.}
\lineiii{\%s}{char*}{A null-terminated C character array.}
\lineiii{\%p}{void*}{The hex representation of a C pointer.
Mostly equivalent to \code{printf("\%p")} except that it is
guaranteed to start with the literal \code{0x} regardless of
what the platform's \code{printf} yields.}
\end{tableiii}
An unrecognized format character causes all the rest of the format
string to be copied as-is to the result string, and any extra
arguments discarded.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_SetNone}{PyObject *type}
This is a shorthand for \samp{PyErr_SetObject(\var{type},
Py_None)}.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyErr_BadArgument}{}
This is a shorthand for \samp{PyErr_SetString(PyExc_TypeError,
\var{message})}, where \var{message} indicates that a built-in
operation was invoked with an illegal argument. It is mostly for
internal use.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_NoMemory}{}
This is a shorthand for \samp{PyErr_SetNone(PyExc_MemoryError)}; it
returns \NULL{} so an object allocation function can write
\samp{return PyErr_NoMemory();} when it runs out of memory.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_SetFromErrno}{PyObject *type}
This is a convenience function to raise an exception when a C
library function has returned an error and set the C variable
\cdata{errno}. It constructs a tuple object whose first item is the
integer \cdata{errno} value and whose second item is the
corresponding error message (gotten from
\cfunction{strerror()}\ttindex{strerror()}), and then calls
\samp{PyErr_SetObject(\var{type}, \var{object})}. On \UNIX, when
the \cdata{errno} value is \constant{EINTR}, indicating an
interrupted system call, this calls
\cfunction{PyErr_CheckSignals()}, and if that set the error
indicator, leaves it set to that. The function always returns
\NULL, so a wrapper function around a system call can write
\samp{return PyErr_SetFromErrno(\var{type});} when the system call
returns an error.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_SetFromErrnoWithFilename}{PyObject *type,
const char *filename}
Similar to \cfunction{PyErr_SetFromErrno()}, with the additional
behavior that if \var{filename} is not \NULL, it is passed to the
constructor of \var{type} as a third parameter. In the case of
exceptions such as \exception{IOError} and \exception{OSError}, this
is used to define the \member{filename} attribute of the exception
instance.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_SetFromWindowsErr}{int ierr}
This is a convenience function to raise \exception{WindowsError}.
If called with \var{ierr} of \cdata{0}, the error code returned by a
call to \cfunction{GetLastError()} is used instead. It calls the
Win32 function \cfunction{FormatMessage()} to retrieve the Windows
description of the error code given by \var{ierr} or
\cfunction{GetLastError()}, then it constructs a tuple object whose
first item is the \var{ierr} value and whose second item is the
corresponding error message (gotten from
\cfunction{FormatMessage()}), and then calls
\samp{PyErr_SetObject(\var{PyExc_WindowsError}, \var{object})}.
This function always returns \NULL.
Availability: Windows.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_SetExcFromWindowsErr}{PyObject *type,
int ierr}
Similar to \cfunction{PyErr_SetFromWindowsErr()}, with an additional
parameter specifying the exception type to be raised.
Availability: Windows.
\versionadded{2.3}
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_SetFromWindowsErrWithFilename}{int ierr,
const char *filename}
Similar to \cfunction{PyErr_SetFromWindowsErr()}, with the
additional behavior that if \var{filename} is not \NULL, it is
passed to the constructor of \exception{WindowsError} as a third
parameter.
Availability: Windows.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_SetExcFromWindowsErrWithFilename}
{PyObject *type, int ierr, char *filename}
Similar to \cfunction{PyErr_SetFromWindowsErrWithFilename()}, with
an additional parameter specifying the exception type to be raised.
Availability: Windows.
\versionadded{2.3}
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_BadInternalCall}{}
This is a shorthand for \samp{PyErr_SetString(PyExc_SystemError,
\var{message})}, where \var{message} indicates that an internal
operation (e.g. a Python/C API function) was invoked with an illegal
argument. It is mostly for internal use.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyErr_WarnEx}{PyObject *category, char *message, int stacklevel}
Issue a warning message. The \var{category} argument is a warning
category (see below) or \NULL; the \var{message} argument is a
message string. \var{stacklevel} is a positive number giving a
number of stack frames; the warning will be issued from the
currently executing line of code in that stack frame. A \var{stacklevel}
of 1 is the function calling \cfunction{PyErr_WarnEx()}, 2 is
the function above that, and so forth.
This function normally prints a warning message to \code{sys.stderr};
however, it is also possible that the user has specified that
warnings are to be turned into errors, and in that case this will
raise an exception. It is also possible that the function raises an
exception because of a problem with the warning machinery (the
implementation imports the \module{warnings} module to do the heavy
lifting). The return value is \code{0} if no exception is raised,
or \code{-1} if an exception is raised. (It is not possible to
determine whether a warning message is actually printed, nor what
the reason is for the exception; this is intentional.) If an
exception is raised, the caller should do its normal exception
handling (for example, \cfunction{Py_DECREF()} owned references and
return an error value).
Warning categories must be subclasses of \cdata{Warning}; the
default warning category is \cdata{RuntimeWarning}. The standard
Python warning categories are available as global variables whose
names are \samp{PyExc_} followed by the Python exception name.
These have the type \ctype{PyObject*}; they are all class objects.
Their names are \cdata{PyExc_Warning}, \cdata{PyExc_UserWarning},
\cdata{PyExc_UnicodeWarning}, \cdata{PyExc_DeprecationWarning},
\cdata{PyExc_SyntaxWarning}, \cdata{PyExc_RuntimeWarning}, and
\cdata{PyExc_FutureWarning}. \cdata{PyExc_Warning} is a subclass of
\cdata{PyExc_Exception}; the other warning categories are subclasses
of \cdata{PyExc_Warning}.
For information about warning control, see the documentation for the
\module{warnings} module and the \programopt{-W} option in the
command line documentation. There is no C API for warning control.
\end{cfuncdesc}
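The two outcomes of \cfunction{PyErr_WarnEx()} described above (a
reported warning with return value \code{0}, or an exception when
warnings are turned into errors) can be sketched from Python. This
assumes a CPython interpreter where \code{ctypes.pythonapi} is
available and where the third parameter is declared as
\ctype{Py_ssize_t}, as in current CPython releases, rather than the
\ctype{int} shown in the signature above. The helper
\code{warn_from_c()} is hypothetical.

```python
import ctypes
import warnings

api = ctypes.pythonapi

# int PyErr_WarnEx(PyObject *category, const char *message, Py_ssize_t stack_level)
# (assumption: Py_ssize_t as in current CPython; older releases used int)
api.PyErr_WarnEx.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.c_ssize_t]
api.PyErr_WarnEx.restype = ctypes.c_int


def warn_from_c(category, message, stacklevel=1):
    """Hypothetical helper: issue a warning through the C API."""
    return api.PyErr_WarnEx(category, message.encode("utf-8"), stacklevel)


# Case 1: the warning is merely reported, so the call returns 0.
with warnings.catch_warnings(record=True) as recorded:
    warnings.simplefilter("always")
    status = warn_from_c(UserWarning, "be careful")

# Case 2: warnings are configured as errors. The C function returns -1 with
# the error indicator set, which ctypes surfaces as a raised exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        warn_from_c(UserWarning, "be careful")
        escalated = False
    except UserWarning:
        escalated = True

print(status, len(recorded), escalated)
```

C callers see the second case only as a \code{-1} return value and
should then do their usual cleanup and error return.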
\begin{cfuncdesc}{int}{PyErr_Warn}{PyObject *category, char *message}
Issue a warning message. The \var{category} argument is a warning
category (see below) or \NULL; the \var{message} argument is a
message string. The warning will appear to be issued from the function
calling \cfunction{PyErr_Warn()}, equivalent to calling
\cfunction{PyErr_WarnEx()} with a \var{stacklevel} of 1.
Deprecated; use \cfunction{PyErr_WarnEx()} instead.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyErr_WarnExplicit}{PyObject *category,
const char *message, const char *filename, int lineno,
const char *module, PyObject *registry}
Issue a warning message with explicit control over all warning
attributes. This is a straightforward wrapper around the Python
function \function{warnings.warn_explicit()}; see there for more
information. The \var{module} and \var{registry} arguments may be
set to \NULL{} to get the default effect described there.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyErr_CheckSignals}{}
This function interacts with Python's signal handling. It checks
whether a signal has been sent to the process and if so, invokes
the corresponding signal handler. If the
\module{signal}\refbimodindex{signal} module is supported, this can
invoke a signal handler written in Python. In all cases, the
default effect for \constant{SIGINT}\ttindex{SIGINT} is to raise the
\withsubitem{(built-in exception)}{\ttindex{KeyboardInterrupt}}
\exception{KeyboardInterrupt} exception. If an exception is raised
the error indicator is set and the function returns \code{-1};
otherwise the function returns \code{0}. The error indicator may or
may not be cleared if it was previously set.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_SetInterrupt}{}
This function simulates the effect of a
\constant{SIGINT}\ttindex{SIGINT} signal arriving --- the next time
\cfunction{PyErr_CheckSignals()} is called,
\withsubitem{(built-in exception)}{\ttindex{KeyboardInterrupt}}
\exception{KeyboardInterrupt} will be raised. It may be called
without holding the interpreter lock.
% XXX This was described as obsolete, but is used in
% thread.interrupt_main() (used from IDLE), so it's still needed.
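The deferred delivery described for \cfunction{PyErr_CheckSignals()} and
\cfunction{PyErr_SetInterrupt()} can be pictured with a small
stand-alone model. The sketch below is plain C and does not use the
Python/C API. All names in it (\code{model_install_handler()},
\code{model_set_interrupt()}, \code{model_check_signals()},
\code{model_error_set}) are hypothetical, and the pending-flag design is
an assumption made for illustration, not a description of the
interpreter's internals.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical model, not the Python/C API. */
static volatile sig_atomic_t model_pending = 0; /* set when SIGINT "arrives" */
static int model_error_set = 0;     /* stands in for the error indicator */

/* C-level handler: it only records that the signal arrived. */
static void model_on_sigint(int signum)
{
    (void)signum;
    model_pending = 1;
}

/* Install the recording handler for SIGINT; returns 0 on success. */
int model_install_handler(void)
{
    return signal(SIGINT, model_on_sigint) == SIG_ERR ? -1 : 0;
}

/* Simulate SIGINT arriving, in the spirit of PyErr_SetInterrupt(). */
void model_set_interrupt(void)
{
    model_pending = 1;
}

/* In the spirit of PyErr_CheckSignals(): return -1 and set the model
   error indicator if a signal is pending, otherwise return 0. */
int model_check_signals(void)
{
    if (!model_pending)
        return 0;
    model_pending = 0;
    model_error_set = 1;   /* KeyboardInterrupt would be raised here */
    return -1;
}
```

In this model, nothing happens at the moment the signal arrives or
\code{model_set_interrupt()} is called; the effect becomes visible only
at the next call to \code{model_check_signals()}, which then returns
\code{-1} once and \code{0} afterwards.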
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyErr_NewException}{char *name,
PyObject *base,
PyObject *dict}
This utility function creates and returns a new exception class.
The \var{name} argument must be the name of the new exception, a C
string of the form \code{module.class}. The \var{base} and
\var{dict} arguments are normally \NULL. This creates a class
object derived from \exception{Exception} (accessible in C as
\cdata{PyExc_Exception}).
The \member{__module__} attribute of the new class is set to the
first part (up to the last dot) of the \var{name} argument, and the
class name is set to the last part (after the last dot). The
\var{base} argument can be used to specify alternate base classes;
it can be either a single class or a tuple of classes.
The \var{dict} argument can be used to specify a dictionary of class
variables and methods.
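The way \var{name} is divided at its last dot can be sketched in plain
C. The helper \code{split_exception_name()} below is hypothetical and is
not part of the Python/C API; it only illustrates which part of the
string becomes \member{__module__} and which part becomes the class
name. Returning \NULL{} for a name without a dot is a choice made for
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Python/C API: split
   "module.class" at the last dot.  Copies the module part into
   module_buf and returns a pointer to the class part inside name, or
   NULL if name has no dot or the module part does not fit. */
const char *split_exception_name(const char *name,
                                 char *module_buf, size_t module_size)
{
    const char *dot;
    size_t module_len;

    assert(name != NULL && module_buf != NULL);
    dot = strrchr(name, '.');          /* last dot, not the first */
    if (dot == NULL)
        return NULL;
    module_len = (size_t)(dot - name);
    if (module_len >= module_size)
        return NULL;
    memcpy(module_buf, name, module_len);
    module_buf[module_len] = '\0';
    return dot + 1;
}
```

For the name \code{pkg.mod.Error}, this yields the module part
\code{pkg.mod} and the class part \code{Error}.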
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyErr_WriteUnraisable}{PyObject *obj}
This utility function prints a warning message to \code{sys.stderr}
when an exception has been set but it is impossible for the
interpreter to actually raise the exception. It is used, for
example, when an exception occurs in an \method{__del__()} method.
The function is called with a single argument \var{obj} that
identifies the context in which the unraisable exception occurred.
The repr of \var{obj} will be printed in the warning message.
\end{cfuncdesc}
\section{Standard Exceptions \label{standardExceptions}}
All standard Python exceptions are available as global variables whose
names are \samp{PyExc_} followed by the Python exception name. These
have the type \ctype{PyObject*}; they are all class objects. For
completeness, here are all the variables:
\begin{tableiii}{l|l|c}{cdata}{C Name}{Python Name}{Notes}
\lineiii{PyExc_BaseException\ttindex{PyExc_BaseException}}{\exception{BaseException}}{(1), (4)}
\lineiii{PyExc_Exception\ttindex{PyExc_Exception}}{\exception{Exception}}{(1)}
\lineiii{PyExc_StandardError\ttindex{PyExc_StandardError}}{\exception{StandardError}}{(1)}
\lineiii{PyExc_ArithmeticError\ttindex{PyExc_ArithmeticError}}{\exception{ArithmeticError}}{(1)}
\lineiii{PyExc_LookupError\ttindex{PyExc_LookupError}}{\exception{LookupError}}{(1)}
\lineiii{PyExc_AssertionError\ttindex{PyExc_AssertionError}}{\exception{AssertionError}}{}
\lineiii{PyExc_AttributeError\ttindex{PyExc_AttributeError}}{\exception{AttributeError}}{}
\lineiii{PyExc_EOFError\ttindex{PyExc_EOFError}}{\exception{EOFError}}{}
\lineiii{PyExc_EnvironmentError\ttindex{PyExc_EnvironmentError}}{\exception{EnvironmentError}}{(1)}
\lineiii{PyExc_FloatingPointError\ttindex{PyExc_FloatingPointError}}{\exception{FloatingPointError}}{}
\lineiii{PyExc_IOError\ttindex{PyExc_IOError}}{\exception{IOError}}{}
\lineiii{PyExc_ImportError\ttindex{PyExc_ImportError}}{\exception{ImportError}}{}
\lineiii{PyExc_IndexError\ttindex{PyExc_IndexError}}{\exception{IndexError}}{}
\lineiii{PyExc_KeyError\ttindex{PyExc_KeyError}}{\exception{KeyError}}{}
\lineiii{PyExc_KeyboardInterrupt\ttindex{PyExc_KeyboardInterrupt}}{\exception{KeyboardInterrupt}}{}
\lineiii{PyExc_MemoryError\ttindex{PyExc_MemoryError}}{\exception{MemoryError}}{}
\lineiii{PyExc_NameError\ttindex{PyExc_NameError}}{\exception{NameError}}{}
\lineiii{PyExc_NotImplementedError\ttindex{PyExc_NotImplementedError}}{\exception{NotImplementedError}}{}
\lineiii{PyExc_OSError\ttindex{PyExc_OSError}}{\exception{OSError}}{}
\lineiii{PyExc_OverflowError\ttindex{PyExc_OverflowError}}{\exception{OverflowError}}{}
\lineiii{PyExc_ReferenceError\ttindex{PyExc_ReferenceError}}{\exception{ReferenceError}}{(2)}
\lineiii{PyExc_RuntimeError\ttindex{PyExc_RuntimeError}}{\exception{RuntimeError}}{}
\lineiii{PyExc_SyntaxError\ttindex{PyExc_SyntaxError}}{\exception{SyntaxError}}{}
\lineiii{PyExc_SystemError\ttindex{PyExc_SystemError}}{\exception{SystemError}}{}
\lineiii{PyExc_SystemExit\ttindex{PyExc_SystemExit}}{\exception{SystemExit}}{}
\lineiii{PyExc_TypeError\ttindex{PyExc_TypeError}}{\exception{TypeError}}{}
\lineiii{PyExc_ValueError\ttindex{PyExc_ValueError}}{\exception{ValueError}}{}
\lineiii{PyExc_WindowsError\ttindex{PyExc_WindowsError}}{\exception{WindowsError}}{(3)}
\lineiii{PyExc_ZeroDivisionError\ttindex{PyExc_ZeroDivisionError}}{\exception{ZeroDivisionError}}{}
\end{tableiii}
\noindent
Notes:
\begin{description}
\item[(1)]
This is a base class for other standard exceptions.
\item[(2)]
This is the same as \exception{weakref.ReferenceError}.
\item[(3)]
Only defined on Windows; protect code that uses this by testing that
the preprocessor macro \code{MS_WINDOWS} is defined.
\item[(4)]
\versionadded{2.5}
\end{description}
\section{Deprecation of String Exceptions}
All exceptions built into Python or provided in the standard library
are derived from \exception{BaseException}.
\withsubitem{(built-in exception)}{\ttindex{BaseException}}
String exceptions are still supported in the interpreter to allow
existing code to run unmodified, but this will also change in a future
release.

\chapter{Initialization, Finalization, and Threads
\label{initialization}}
\begin{cfuncdesc}{void}{Py_Initialize}{}
Initialize the Python interpreter. In an application embedding
Python, this should be called before using any other Python/C API
functions, with the exception of
\cfunction{Py_SetProgramName()}\ttindex{Py_SetProgramName()},
\cfunction{PyEval_InitThreads()}\ttindex{PyEval_InitThreads()},
\cfunction{PyEval_ReleaseLock()}\ttindex{PyEval_ReleaseLock()},
and \cfunction{PyEval_AcquireLock()}\ttindex{PyEval_AcquireLock()}.
This initializes the table of loaded modules (\code{sys.modules}),
and\withsubitem{(in module sys)}{\ttindex{modules}\ttindex{path}}
creates the fundamental modules
\module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__} and
\module{sys}\refbimodindex{sys}. It also initializes the module
search\indexiii{module}{search}{path} path (\code{sys.path}).
It does not set \code{sys.argv}; use
\cfunction{PySys_SetArgv()}\ttindex{PySys_SetArgv()} for that. This
is a no-op when called for a second time (without calling
\cfunction{Py_Finalize()}\ttindex{Py_Finalize()} first). There is
no return value; it is a fatal error if the initialization fails.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_InitializeEx}{int initsigs}
This function works like \cfunction{Py_Initialize()} if
\var{initsigs} is 1. If \var{initsigs} is 0, it skips
registration of signal handlers, which
might be useful when Python is embedded. \versionadded{2.4}
\end{cfuncdesc}
\begin{cfuncdesc}{int}{Py_IsInitialized}{}
Return true (nonzero) when the Python interpreter has been
initialized, false (zero) if not. After \cfunction{Py_Finalize()}
is called, this returns false until \cfunction{Py_Initialize()} is
called again.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_Finalize}{}
Undo all initializations made by \cfunction{Py_Initialize()} and
subsequent use of Python/C API functions, and destroy all
sub-interpreters (see \cfunction{Py_NewInterpreter()} below) that
were created and not yet destroyed since the last call to
\cfunction{Py_Initialize()}. Ideally, this frees all memory
allocated by the Python interpreter. This is a no-op when called
for a second time (without calling \cfunction{Py_Initialize()} again
first). There is no return value; errors during finalization are
ignored.
This function is provided for a number of reasons. An embedding
application might want to restart Python without having to restart
the application itself. An application that has loaded the Python
interpreter from a dynamically loadable library (or DLL) might want
to free all memory allocated by Python before unloading the
DLL. During a hunt for memory leaks in an application a developer
might want to free all memory allocated by Python before exiting
from the application.
\strong{Bugs and caveats:} The destruction of modules and objects in
modules is done in random order; this may cause destructors
(\method{__del__()} methods) to fail when they depend on other
objects (even functions) or modules. Dynamically loaded extension
modules loaded by Python are not unloaded. Small amounts of memory
allocated by the Python interpreter may not be freed (if you find a
leak, please report it). Memory tied up in circular references
between objects is not freed. Some memory allocated by extension
modules may not be freed. Some extensions may not work properly if
their initialization routine is called more than once; this can
happen if an application calls \cfunction{Py_Initialize()} and
\cfunction{Py_Finalize()} more than once.
\end{cfuncdesc}
\begin{cfuncdesc}{PyThreadState*}{Py_NewInterpreter}{}
Create a new sub-interpreter. This is an (almost) totally separate
environment for the execution of Python code. In particular, the
new interpreter has separate, independent versions of all imported
modules, including the fundamental modules
\module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__} and
\module{sys}\refbimodindex{sys}. The table of loaded modules
(\code{sys.modules}) and the module search path (\code{sys.path})
are also separate. The new environment has no \code{sys.argv}
variable. It has new standard I/O stream file objects
\code{sys.stdin}, \code{sys.stdout} and \code{sys.stderr} (however
these refer to the same underlying \ctype{FILE} structures in the C
library).
\withsubitem{(in module sys)}{
\ttindex{stdout}\ttindex{stderr}\ttindex{stdin}}
The return value points to the first thread state created in the new
sub-interpreter. This thread state is made the current thread
state. Note that no actual thread is created; see the discussion of
thread states below. If creation of the new interpreter is
unsuccessful, \NULL{} is returned; no exception is set since the
exception state is stored in the current thread state and there may
not be a current thread state. (Like all other Python/C API
functions, the global interpreter lock must be held before calling
this function and is still held when it returns; however, unlike
most other Python/C API functions, there needn't be a current thread
state on entry.)
Extension modules are shared between (sub-)interpreters as follows:
the first time a particular extension is imported, it is initialized
normally, and a (shallow) copy of its module's dictionary is
squirreled away. When the same extension is imported by another
(sub-)interpreter, a new module is initialized and filled with the
contents of this copy; the extension's \code{init} function is not
called. Note that this is different from what happens when an
extension is imported after the interpreter has been completely
re-initialized by calling
\cfunction{Py_Finalize()}\ttindex{Py_Finalize()} and
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}; in that case,
the extension's \code{init\var{module}} function \emph{is} called
again.
\strong{Bugs and caveats:} Because sub-interpreters (and the main
interpreter) are part of the same process, the insulation between
them isn't perfect --- for example, using low-level file operations
like \withsubitem{(in module os)}{\ttindex{close()}}
\function{os.close()} they can (accidentally or maliciously) affect
each other's open files. Because of the way extensions are shared
between (sub-)interpreters, some extensions may not work properly;
this is especially likely when the extension makes use of (static)
global variables, or when the extension manipulates its module's
dictionary after its initialization. It is possible to insert
objects created in one sub-interpreter into a namespace of another
sub-interpreter; this should be done with great care to avoid
sharing user-defined functions, methods, instances or classes
between sub-interpreters, since import operations executed by such
objects may affect the wrong (sub-)interpreter's dictionary of
loaded modules. (XXX This is a hard-to-fix bug that will be
addressed in a future release.)
Also note that the use of this functionality is incompatible with
extension modules such as PyObjC and ctypes that use the
\cfunction{PyGILState_*} APIs (and this is inherent in the way the
\cfunction{PyGILState_*} functions work). Simple things may work,
but confusing behavior will always be near.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_EndInterpreter}{PyThreadState *tstate}
Destroy the (sub-)interpreter represented by the given thread state.
The given thread state must be the current thread state. See the
discussion of thread states below. When the call returns, the
current thread state is \NULL. All thread states associated with
this interpreter are destroyed. (The global interpreter lock must
be held before calling this function and is still held when it
returns.) \cfunction{Py_Finalize()}\ttindex{Py_Finalize()} will
destroy all sub-interpreters that haven't been explicitly destroyed
at that point.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_SetProgramName}{char *name}
This function should be called before
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()} is called
for the first time, if it is called at all. It tells the
interpreter the value of the \code{argv[0]} argument to the
\cfunction{main()}\ttindex{main()} function of the program. This is
used by \cfunction{Py_GetPath()}\ttindex{Py_GetPath()} and some
other functions below to find the Python run-time libraries relative
to the interpreter executable. The default value is
\code{'python'}. The argument should point to a zero-terminated
character string in static storage whose contents will not change
for the duration of the program's execution. No code in the Python
interpreter will change the contents of this storage.
\end{cfuncdesc}
\begin{cfuncdesc}{char*}{Py_GetProgramName}{}
Return the program name set with
\cfunction{Py_SetProgramName()}\ttindex{Py_SetProgramName()}, or the
default. The returned string points into static storage; the caller
should not modify its value.
\end{cfuncdesc}
\begin{cfuncdesc}{char*}{Py_GetPrefix}{}
Return the \emph{prefix} for installed platform-independent files.
This is derived through a number of complicated rules from the
program name set with \cfunction{Py_SetProgramName()} and some
environment variables; for example, if the program name is
\code{'/usr/local/bin/python'}, the prefix is \code{'/usr/local'}.
The returned string points into static storage; the caller should
not modify its value. This corresponds to the \makevar{prefix}
variable in the top-level \file{Makefile} and the
\longprogramopt{prefix} argument to the \program{configure} script
at build time. The value is available to Python code as
\code{sys.prefix}. It is only useful on \UNIX{}. See also the next
function.
\end{cfuncdesc}
\begin{cfuncdesc}{char*}{Py_GetExecPrefix}{}
Return the \emph{exec-prefix} for installed
platform-\emph{de}pendent files. This is derived through a number
of complicated rules from the program name set with
\cfunction{Py_SetProgramName()} and some environment variables; for
example, if the program name is \code{'/usr/local/bin/python'}, the
exec-prefix is \code{'/usr/local'}. The returned string points into
static storage; the caller should not modify its value. This
corresponds to the \makevar{exec_prefix} variable in the top-level
\file{Makefile} and the \longprogramopt{exec-prefix} argument to the
\program{configure} script at build time. The value is available
to Python code as \code{sys.exec_prefix}. It is only useful on
\UNIX.
Background: The exec-prefix differs from the prefix when platform
dependent files (such as executables and shared libraries) are
installed in a different directory tree. In a typical installation,
platform dependent files may be installed in the
\file{/usr/local/plat} subtree while platform independent files may be
installed in \file{/usr/local}.
Generally speaking, a platform is a combination of hardware and
software families, e.g. Sparc machines running the Solaris 2.x
operating system are considered the same platform, but Intel
machines running Solaris 2.x are another platform, and Intel
machines running Linux are yet another platform. Different major
revisions of the same operating system generally also form different
platforms. Non-\UNIX{} operating systems are a different story; the
installation strategies on those systems are so different that the
prefix and exec-prefix are meaningless, and set to the empty string.
Note that compiled Python bytecode files are platform independent
(but not independent from the Python version by which they were
compiled!).
System administrators will know how to configure the \program{mount}
or \program{automount} programs to share \file{/usr/local} between
platforms while having \file{/usr/local/plat} be a different
filesystem for each platform.
\end{cfuncdesc}
\begin{cfuncdesc}{char*}{Py_GetProgramFullPath}{}
Return the full program name of the Python executable; this is
computed as a side-effect of deriving the default module search path
from the program name (set by
\cfunction{Py_SetProgramName()}\ttindex{Py_SetProgramName()} above).
The returned string points into static storage; the caller should
not modify its value. The value is available to Python code as
\code{sys.executable}.
\withsubitem{(in module sys)}{\ttindex{executable}}
\end{cfuncdesc}
\begin{cfuncdesc}{char*}{Py_GetPath}{}
\indexiii{module}{search}{path}
Return the default module search path; this is computed from the
program name (set by \cfunction{Py_SetProgramName()} above) and some
environment variables. The returned string consists of a series of
directory names separated by a platform dependent delimiter
character. The delimiter character is \character{:} on \UNIX{} and Mac OS X,
\character{;} on Windows. The returned string points into
static storage; the caller should not modify its value. The value
is available to Python code as the list
\code{sys.path}\withsubitem{(in module sys)}{\ttindex{path}}, which
may be modified to change the future search path for loaded
modules.
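Because the returned string lives in static storage, code that wants
the individual directories has to copy them out rather than split the
string in place. The following sketch shows one way to do that in plain
C. The helper \code{path_entry()} and the limit \code{MAX_ENTRY_LEN} are
hypothetical and not part of the Python/C API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRY_LEN 256   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: copy the index-th entry of a delimiter-separated
   path list into out (of size MAX_ENTRY_LEN).  The input is only read,
   never modified.  Returns 0 on success, -1 if there is no such entry
   or it does not fit. */
int path_entry(const char *path, char delim, size_t index, char *out)
{
    const char *start = path;
    const char *end;
    size_t len;

    assert(path != NULL && out != NULL);
    for (;;) {
        end = strchr(start, delim);
        if (index == 0)
            break;
        if (end == NULL)
            return -1;          /* ran out of entries */
        start = end + 1;
        index--;
    }
    len = (end != NULL) ? (size_t)(end - start) : strlen(start);
    if (len >= MAX_ENTRY_LEN)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

A caller would pass \character{:} or \character{;} as \var{delim}
depending on the platform and increase \var{index} until the helper
returns \code{-1}.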
% XXX should give the exact rules
\end{cfuncdesc}
\begin{cfuncdesc}{const char*}{Py_GetVersion}{}
Return the version of this Python interpreter. This is a string
that looks something like
\begin{verbatim}
"1.5 (#67, Dec 31 1997, 22:34:28) [GCC 2.7.2.2]"
\end{verbatim}
The first word (up to the first space character) is the current
Python version; the first three characters are the major and minor
version separated by a period. The returned string points into
static storage; the caller should not modify its value. The value
is available to Python code as \code{sys.version}.
\withsubitem{(in module sys)}{\ttindex{version}}
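The layout described above can be picked apart with standard C string
functions. The helpers below are hypothetical, not part of the Python/C
API, and work on any string with the same layout; a real caller would
pass the result of \cfunction{Py_GetVersion()}.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the major and minor version from a
   string laid out like the one returned by Py_GetVersion().  Returns 0
   on success, -1 if the string does not start with "<major>.<minor>". */
int parse_version(const char *version, int *major, int *minor)
{
    assert(version != NULL && major != NULL && minor != NULL);
    return sscanf(version, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Length of the first word, that is, everything before the first
   space character (or the whole string if there is no space). */
size_t version_word_length(const char *version)
{
    assert(version != NULL);
    return strcspn(version, " ");
}
```

Neither helper writes to the string, in keeping with the requirement
that the caller must not modify the returned value.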
\end{cfuncdesc}
\begin{cfuncdesc}{const char*}{Py_GetBuildNumber}{}
Return a string representing the Subversion revision that this Python
executable was built from. This number is a string because it may contain a
trailing 'M' if Python was built from a mixed revision source tree.
\versionadded{2.5}
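A caller that needs the revision as a number has to allow for the
optional trailing \code{'M'}. The sketch below shows one way to do so.
The helper \code{parse_build_number()} is hypothetical, and it assumes
that the string consists only of decimal digits followed by an optional
\code{'M'}; any other form is rejected.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse a revision string such as "54863" or
   "54863M".  Stores the number in *revision and sets *mixed to 1 when
   a trailing 'M' is present.  Returns 0 on success, -1 otherwise. */
int parse_build_number(const char *text, long *revision, int *mixed)
{
    char *end;

    assert(text != NULL && revision != NULL && mixed != NULL);
    *revision = strtol(text, &end, 10);
    if (end == text)
        return -1;              /* no digits at all */
    *mixed = (*end == 'M');
    if (*mixed)
        end++;
    return (*end == '\0') ? 0 : -1;
}
```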
\end{cfuncdesc}
\begin{cfuncdesc}{const char*}{Py_GetPlatform}{}
Return the platform identifier for the current platform. On \UNIX,
this is formed from the ``official'' name of the operating system,
converted to lower case, followed by the major revision number;
e.g., for Solaris 2.x, which is also known as SunOS 5.x, the value
is \code{'sunos5'}. On Mac OS X, it is \code{'darwin'}. On Windows,
it is \code{'win'}. The returned string points into static storage;
the caller should not modify its value. The value is available to
Python code as \code{sys.platform}.
\withsubitem{(in module sys)}{\ttindex{platform}}
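The \UNIX{} naming rule can be illustrated with a few lines of plain C.
The helper \code{make_platform_id()} is hypothetical and shows only the
rule as stated here (lower-case name followed by the major revision
number); it is not how the interpreter computes the value, and it does
not cover the Mac OS X and Windows cases.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper illustrating the naming rule only: lower-case
   the system name and append the leading digits of the release string.
   Returns 0 on success, -1 if the result does not fit in out. */
int make_platform_id(const char *sysname, const char *release,
                     char *out, size_t out_size)
{
    size_t n = 0;

    assert(sysname != NULL && release != NULL && out != NULL);
    for (; *sysname != '\0'; sysname++) {
        if (n + 1 >= out_size)
            return -1;
        out[n++] = (char)tolower((unsigned char)*sysname);
    }
    for (; isdigit((unsigned char)*release); release++) {
        if (n + 1 >= out_size)
            return -1;
        out[n++] = *release;
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return 0;
}
```

With the system name \code{SunOS} and a release string beginning with
\code{5.}, the helper produces \code{sunos5}.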
\end{cfuncdesc}
\begin{cfuncdesc}{const char*}{Py_GetCopyright}{}
Return the official copyright string for the current Python version,
for example
\code{'Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam'}.
The returned string points into static storage; the caller should
not modify its value. The value is available to Python code as
\code{sys.copyright}.
\withsubitem{(in module sys)}{\ttindex{copyright}}
\end{cfuncdesc}
\begin{cfuncdesc}{const char*}{Py_GetCompiler}{}
Return an indication of the compiler used to build the current
Python version, in square brackets, for example:
\begin{verbatim}
"[GCC 2.7.2.2]"
\end{verbatim}
The returned string points into static storage; the caller should
not modify its value. The value is available to Python code as part
of the variable \code{sys.version}.
\withsubitem{(in module sys)}{\ttindex{version}}
\end{cfuncdesc}
\begin{cfuncdesc}{const char*}{Py_GetBuildInfo}{}
Return information about the sequence number and build date and time
of the current Python interpreter instance, for example
\begin{verbatim}
"#67, Aug 1 1997, 22:34:28"
\end{verbatim}
The returned string points into static storage; the caller should
not modify its value. The value is available to Python code as part
of the variable \code{sys.version}.
\withsubitem{(in module sys)}{\ttindex{version}}
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PySys_SetArgv}{int argc, char **argv}
Set \code{sys.argv} based on \var{argc} and \var{argv}. These
parameters are similar to those passed to the program's
\cfunction{main()}\ttindex{main()} function with the difference that
the first entry should refer to the script file to be executed
rather than the executable hosting the Python interpreter. If there
isn't a script that will be run, the first entry in \var{argv} can
be an empty string. If this function fails to initialize
\code{sys.argv}, a fatal condition is signalled using
\cfunction{Py_FatalError()}\ttindex{Py_FatalError()}.
\withsubitem{(in module sys)}{\ttindex{argv}}
% XXX impl. doesn't seem consistent in allowing 0/NULL for the params;
% check w/ Guido.
\end{cfuncdesc}
% XXX Other PySys thingies (doesn't really belong in this chapter)
\section{Thread State and the Global Interpreter Lock
\label{threads}}
\index{global interpreter lock}
\index{interpreter lock}
\index{lock, interpreter}
The Python interpreter is not fully thread safe. In order to support
multi-threaded Python programs, there's a global lock that must be
held by the current thread before it can safely access Python objects.
Without the lock, even the simplest operations could cause problems in
a multi-threaded program: for example, when two threads simultaneously
increment the reference count of the same object, the reference count
could end up being incremented only once instead of twice.
Therefore, the rule exists that only the thread that has acquired the
global interpreter lock may operate on Python objects or call Python/C
API functions. In order to support multi-threaded Python programs,
the interpreter regularly releases and reacquires the lock --- by
default, every 100 bytecode instructions (this can be changed with
\withsubitem{(in module sys)}{\ttindex{setcheckinterval()}}
\function{sys.setcheckinterval()}). The lock is also released and
reacquired around potentially blocking I/O operations like reading or
writing a file, so that other threads can run while the thread that
requests the I/O is waiting for the I/O operation to complete.
The Python interpreter needs to keep some bookkeeping information
separate per thread --- for this it uses a data structure called
\ctype{PyThreadState}\ttindex{PyThreadState}. There's one global
variable, however: the pointer to the current
\ctype{PyThreadState}\ttindex{PyThreadState} structure. While most
thread packages have a way to store ``per-thread global data,''
Python's internal platform independent thread abstraction doesn't
support this yet. Therefore, the current thread state must be
manipulated explicitly.
This is easy enough in most cases. Most code manipulating the global
interpreter lock has the following simple structure:
\begin{verbatim}
Save the thread state in a local variable.
Release the interpreter lock.
...Do some blocking I/O operation...
Reacquire the interpreter lock.
Restore the thread state from the local variable.
\end{verbatim}
This is so common that a pair of macros exists to simplify it:
\begin{verbatim}
Py_BEGIN_ALLOW_THREADS
...Do some blocking I/O operation...
Py_END_ALLOW_THREADS
\end{verbatim}
The
\csimplemacro{Py_BEGIN_ALLOW_THREADS}\ttindex{Py_BEGIN_ALLOW_THREADS}
macro opens a new block and declares a hidden local variable; the
\csimplemacro{Py_END_ALLOW_THREADS}\ttindex{Py_END_ALLOW_THREADS}
macro closes the block. Another advantage of using these two macros
is that when Python is compiled without thread support, they are
defined empty, thus avoiding the thread state and lock manipulations.
When thread support is enabled, the block above expands to the
following code:
\begin{verbatim}
PyThreadState *_save;
_save = PyEval_SaveThread();
...Do some blocking I/O operation...
PyEval_RestoreThread(_save);
\end{verbatim}
Using even lower level primitives, we can get roughly the same effect
as follows:
\begin{verbatim}
PyThreadState *_save;
_save = PyThreadState_Swap(NULL);
PyEval_ReleaseLock();
...Do some blocking I/O operation...
PyEval_AcquireLock();
PyThreadState_Swap(_save);
\end{verbatim}
There are some subtle differences; in particular,
\cfunction{PyEval_RestoreThread()}\ttindex{PyEval_RestoreThread()} saves
and restores the value of the global variable
\cdata{errno}\ttindex{errno}, since the lock manipulation does not
guarantee that \cdata{errno} is left alone. Also, when thread support
is disabled,
\cfunction{PyEval_SaveThread()}\ttindex{PyEval_SaveThread()} and
\cfunction{PyEval_RestoreThread()} don't manipulate the lock; in this
case, \cfunction{PyEval_ReleaseLock()}\ttindex{PyEval_ReleaseLock()} and
\cfunction{PyEval_AcquireLock()}\ttindex{PyEval_AcquireLock()} are not
available. This is done so that dynamically loaded extensions
compiled with thread support enabled can be loaded by an interpreter
that was compiled with disabled thread support.
The global interpreter lock is used to protect the pointer to the
current thread state. When releasing the lock and saving the thread
state, the current thread state pointer must be retrieved before the
lock is released (since another thread could immediately acquire the
lock and store its own thread state in the global variable).
Conversely, when acquiring the lock and restoring the thread state,
the lock must be acquired before storing the thread state pointer.
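The required ordering can be made explicit in a small model. The
sketch below is single-threaded plain C and does not use the Python/C
API. The lock is reduced to a flag, and every name in it
(\ctype{model_tstate}, \code{model_lock_held}, \code{model_current},
\code{model_save_thread()}, \code{model_restore_thread()}) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model; none of these names belong to
   the Python/C API. */
typedef struct { int id; } model_tstate;

static int model_lock_held = 0;             /* 1 while "the lock" is held */
static model_tstate *model_current = NULL;  /* global current-state pointer */

/* Release side: read and clear the pointer first, then drop the lock. */
model_tstate *model_save_thread(void)
{
    model_tstate *saved;

    assert(model_lock_held);   /* the pointer is only touched under the lock */
    saved = model_current;
    model_current = NULL;
    model_lock_held = 0;       /* from here on another thread may take over */
    return saved;
}

/* Acquire side: take the lock first, then store the pointer. */
void model_restore_thread(model_tstate *tstate)
{
    assert(tstate != NULL);
    assert(!model_lock_held);  /* re-acquiring a held lock would deadlock */
    model_lock_held = 1;
    model_current = tstate;
}
```

In both functions the global pointer is read or written only while the
flag is set, which is the property that the ordering rule protects.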
Why am I going on with so much detail about this? Because when
threads are created from C, they don't have the global interpreter
lock, nor is there a thread state data structure for them. Such
threads must bootstrap themselves into existence, by first creating a
thread state data structure, then acquiring the lock, and finally
storing their thread state pointer, before they can start using the
Python/C API. When they are done, they should reset the thread state
pointer, release the lock, and finally free their thread state data
structure.
Beginning with version 2.3, threads can take advantage of the
\cfunction{PyGILState_*()} functions to do all of the above
automatically. The typical idiom for calling into Python from a C
thread is now:
\begin{verbatim}
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
\end{verbatim}
Note that the \cfunction{PyGILState_*()} functions assume there is
only one global interpreter (created automatically by
\cfunction{Py_Initialize()}). Python still supports the creation of
additional interpreters (using \cfunction{Py_NewInterpreter()}), but
mixing multiple interpreters and the \cfunction{PyGILState_*()} API is
unsupported.
\begin{ctypedesc}{PyInterpreterState}
This data structure represents the state shared by a number of
cooperating threads. Threads belonging to the same interpreter
share their module administration and a few other internal items.
There are no public members in this structure.
Threads belonging to different interpreters initially share nothing,
except process state like available memory, open file descriptors
and such. The global interpreter lock is also shared by all
threads, regardless of which interpreter they belong to.
\end{ctypedesc}
\begin{ctypedesc}{PyThreadState}
This data structure represents the state of a single thread. The
only public data member is \ctype{PyInterpreterState
*}\member{interp}, which points to this thread's interpreter state.
\end{ctypedesc}
\begin{cfuncdesc}{void}{PyEval_InitThreads}{}
Initialize and acquire the global interpreter lock. It should be
called in the main thread before creating a second thread or
engaging in any other thread operations such as
\cfunction{PyEval_ReleaseLock()}\ttindex{PyEval_ReleaseLock()} or
\code{PyEval_ReleaseThread(\var{tstate})}\ttindex{PyEval_ReleaseThread()}.
It is not needed before calling
\cfunction{PyEval_SaveThread()}\ttindex{PyEval_SaveThread()} or
\cfunction{PyEval_RestoreThread()}\ttindex{PyEval_RestoreThread()}.
This is a no-op when called for a second time. It is safe to call
this function before calling
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
When only the main thread exists, no lock operations are needed.
This is a common situation (most Python programs do not use
threads), and the lock operations slow the interpreter down a bit.
Therefore, the lock is not created initially. This situation is
equivalent to having acquired the lock: when there is only a single
thread, all object accesses are safe. Therefore, when this function
initializes the lock, it also acquires it. Before the Python
\module{thread}\refbimodindex{thread} module creates a new thread,
knowing that either it has the lock or the lock hasn't been created
yet, it calls \cfunction{PyEval_InitThreads()}. When this call
returns, it is guaranteed that the lock has been created and that the
calling thread has acquired it.
It is \strong{not} safe to call this function when it is unknown
which thread (if any) currently has the global interpreter lock.
This function is not available when thread support is disabled at
compile time.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyEval_ThreadsInitialized}{}
Returns a non-zero value if \cfunction{PyEval_InitThreads()} has been
called. This function can be called without holding the lock, and
therefore can be used to avoid calls to the locking API when running
single-threaded. This function is not available when thread support
is disabled at compile time. \versionadded{2.4}
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyEval_AcquireLock}{}
Acquire the global interpreter lock. The lock must have been
created earlier. If this thread already has the lock, a deadlock
ensues. This function is not available when thread support is
disabled at compile time.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyEval_ReleaseLock}{}
Release the global interpreter lock. The lock must have been
created earlier. This function is not available when thread support
is disabled at compile time.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyEval_AcquireThread}{PyThreadState *tstate}
Acquire the global interpreter lock and set the current thread
state to \var{tstate}, which should not be \NULL. The lock must
have been created earlier. If this thread already has the lock,
deadlock ensues. This function is not available when thread support
is disabled at compile time.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyEval_ReleaseThread}{PyThreadState *tstate}
Reset the current thread state to \NULL{} and release the global
interpreter lock. The lock must have been created earlier and must
be held by the current thread. The \var{tstate} argument, which
must not be \NULL, is only used to check that it represents the
current thread state --- if it isn't, a fatal error is reported.
This function is not available when thread support is disabled at
compile time.
\end{cfuncdesc}
\begin{cfuncdesc}{PyThreadState*}{PyEval_SaveThread}{}
Release the interpreter lock (if it has been created and thread
support is enabled) and reset the thread state to \NULL, returning
the previous thread state (which is not \NULL). If the lock has
been created, the current thread must have acquired it. (This
function is available even when thread support is disabled at
compile time.)
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyEval_RestoreThread}{PyThreadState *tstate}
Acquire the interpreter lock (if it has been created and thread
support is enabled) and set the thread state to \var{tstate}, which
must not be \NULL. If the lock has been created, the current thread
must not have acquired it, otherwise deadlock ensues. (This
function is available even when thread support is disabled at
compile time.)
\end{cfuncdesc}
The following macros are normally used without a trailing semicolon;
look for example usage in the Python source distribution.
\begin{csimplemacrodesc}{Py_BEGIN_ALLOW_THREADS}
This macro expands to
\samp{\{ PyThreadState *_save; _save = PyEval_SaveThread();}.
Note that it contains an opening brace; it must be matched with a
following \csimplemacro{Py_END_ALLOW_THREADS} macro. See above for
further discussion of this macro. It is a no-op when thread support
is disabled at compile time.
\end{csimplemacrodesc}
\begin{csimplemacrodesc}{Py_END_ALLOW_THREADS}
This macro expands to \samp{PyEval_RestoreThread(_save); \}}.
Note that it contains a closing brace; it must be matched with an
earlier \csimplemacro{Py_BEGIN_ALLOW_THREADS} macro. See above for
further discussion of this macro. It is a no-op when thread support
is disabled at compile time.
\end{csimplemacrodesc}
\begin{csimplemacrodesc}{Py_BLOCK_THREADS}
This macro expands to \samp{PyEval_RestoreThread(_save);}: it is
equivalent to \csimplemacro{Py_END_ALLOW_THREADS} without the
closing brace. It is a no-op when thread support is disabled at
compile time.
\end{csimplemacrodesc}
\begin{csimplemacrodesc}{Py_UNBLOCK_THREADS}
This macro expands to \samp{_save = PyEval_SaveThread();}: it is
equivalent to \csimplemacro{Py_BEGIN_ALLOW_THREADS} without the
opening brace and variable declaration. It is a no-op when thread
support is disabled at compile time.
\end{csimplemacrodesc}
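As a rough illustration of how the four macros fit together, the
following sketch spells out the documented expansions and uses them
around a blocking call. So that it compiles without \file{Python.h},
\ctype{PyThreadState}, \cfunction{PyEval_SaveThread()} and
\cfunction{PyEval_RestoreThread()} are replaced here by local stand-ins
that merely track a flag; \cfunction{slow_io()}, \cfunction{do_io()},
\code{lock_held} and \code{lock_seen_during_io} are hypothetical names
invented for the example.

```c
#include <stddef.h>

/* Stand-ins (NOT the real Python API): a dummy thread state and a flag
   that records whether "the lock" is currently held. */
typedef struct { int id; } PyThreadState;

static PyThreadState main_state = { 1 };
static PyThreadState *current = &main_state;   /* current thread state */
static int lock_held = 1;

static PyThreadState *PyEval_SaveThread(void)
{
    PyThreadState *saved = current;
    current = NULL;
    lock_held = 0;              /* release the lock */
    return saved;
}

static void PyEval_RestoreThread(PyThreadState *tstate)
{
    lock_held = 1;              /* reacquire the lock */
    current = tstate;
}

/* The four macros, written out as the expansions documented above. */
#define Py_BEGIN_ALLOW_THREADS { PyThreadState *_save; _save = PyEval_SaveThread();
#define Py_END_ALLOW_THREADS   PyEval_RestoreThread(_save); }
#define Py_BLOCK_THREADS       PyEval_RestoreThread(_save);
#define Py_UNBLOCK_THREADS     _save = PyEval_SaveThread();

static int lock_seen_during_io = -1;

/* Hypothetical blocking operation; returns -1 to signal failure. */
static int slow_io(int fail)
{
    lock_seen_during_io = lock_held;   /* record the state for inspection */
    return fail ? -1 : 0;
}

int do_io(int fail)
{
    int rc;

    Py_BEGIN_ALLOW_THREADS
    rc = slow_io(fail);
    if (rc < 0) {
        /* Leaving the block early: take the lock back first. */
        Py_BLOCK_THREADS
        return -1;
    }
    Py_END_ALLOW_THREADS

    return 0;
}
```

The early \keyword{return} shows what \csimplemacro{Py_BLOCK_THREADS}
is for: it restores the thread state without supplying the closing
brace, which the later \csimplemacro{Py_END_ALLOW_THREADS} still
provides.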
All of the following functions are only available when thread support
is enabled at compile time, and must be called only when the
interpreter lock has been created.
\begin{cfuncdesc}{PyInterpreterState*}{PyInterpreterState_New}{}
Create a new interpreter state object. The interpreter lock need
not be held, but may be held if it is necessary to serialize calls
to this function.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyInterpreterState_Clear}{PyInterpreterState *interp}
Reset all information in an interpreter state object. The
interpreter lock must be held.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyInterpreterState_Delete}{PyInterpreterState *interp}
Destroy an interpreter state object. The interpreter lock need not
be held. The interpreter state must have been reset with a previous
call to \cfunction{PyInterpreterState_Clear()}.
\end{cfuncdesc}
\begin{cfuncdesc}{PyThreadState*}{PyThreadState_New}{PyInterpreterState *interp}
Create a new thread state object belonging to the given interpreter
object. The interpreter lock need not be held, but may be held if
it is necessary to serialize calls to this function.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyThreadState_Clear}{PyThreadState *tstate}
Reset all information in a thread state object. The interpreter lock
must be held.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyThreadState_Delete}{PyThreadState *tstate}
Destroy a thread state object. The interpreter lock need not be
held. The thread state must have been reset with a previous call to
\cfunction{PyThreadState_Clear()}.
\end{cfuncdesc}
\begin{cfuncdesc}{PyThreadState*}{PyThreadState_Get}{}
Return the current thread state. The interpreter lock must be
held. When the current thread state is \NULL, this issues a fatal
error (so that the caller needn't check for \NULL).
\end{cfuncdesc}
\begin{cfuncdesc}{PyThreadState*}{PyThreadState_Swap}{PyThreadState *tstate}
Swap the current thread state with the thread state given by the
argument \var{tstate}, which may be \NULL. The interpreter lock
must be held.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyThreadState_GetDict}{}
Return a dictionary in which extensions can store thread-specific
state information. Each extension should use a unique key to store
state in the dictionary. It is okay to call this function
when no current thread state is available.
If this function returns \NULL, no exception has been raised and the
caller should assume no current thread state is available.
\versionchanged[Previously this could only be called when a current
thread is active, and \NULL{} meant that an exception was raised]{2.3}
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyThreadState_SetAsyncExc}{long id, PyObject *exc}
Asynchronously raise an exception in a thread.
The \var{id} argument is the thread id of the target thread;
\var{exc} is the exception object to be raised.
This function does not steal any references to \var{exc}.
To prevent naive misuse, you must write your own C extension
to call this. Must be called with the GIL held.
Returns the number of thread states modified; this is normally one, but
will be zero if the thread id isn't found. If \var{exc} is
\NULL, the pending exception (if any) for the thread is cleared.
This raises no exceptions.
\versionadded{2.3}
\end{cfuncdesc}
\begin{cfuncdesc}{PyGILState_STATE}{PyGILState_Ensure}{}
Ensure that the current thread is ready to call the Python C API
regardless of the current state of Python, or of its thread lock.
This may be called as many times as desired by a thread as long as
each call is matched with a call to \cfunction{PyGILState_Release()}.
In general, other thread-related APIs may be used between
\cfunction{PyGILState_Ensure()} and \cfunction{PyGILState_Release()}
calls as long as the thread state is restored to its previous state
before the \cfunction{PyGILState_Release()} call. For example, normal usage of the
\csimplemacro{Py_BEGIN_ALLOW_THREADS} and
\csimplemacro{Py_END_ALLOW_THREADS} macros is acceptable.
The return value is an opaque ``handle'' to the thread state when
\cfunction{PyGILState_Ensure()} was called, and must be passed to
\cfunction{PyGILState_Release()} to ensure Python is left in the same
state. Even though recursive calls are allowed, these handles
\emph{cannot} be shared; each unique call to
\cfunction{PyGILState_Ensure()} must save the handle for its call to
\cfunction{PyGILState_Release()}.
When the function returns, the current thread will hold the GIL.
Failure is a fatal error.
\versionadded{2.3}
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyGILState_Release}{PyGILState_STATE}
Release any resources previously acquired. After this call, Python's
state will be the same as it was prior to the corresponding
\cfunction{PyGILState_Ensure()} call (but generally this state will be
unknown to the caller, hence the use of the GILState API).
Every call to \cfunction{PyGILState_Ensure()} must be matched by a call to
\cfunction{PyGILState_Release()} on the same thread.
\versionadded{2.3}
\end{cfuncdesc}
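The pairing rule can be pictured with a small model. The sketch below
is not the real API: \code{demo_gilstate}, \cfunction{demo_ensure()},
\cfunction{demo_release()}, \cfunction{demo_callback()} and
\cfunction{demo_nested()} are hypothetical stand-ins, and the ``lock''
is a plain flag for a single thread. It only illustrates why each call
has to keep its own handle.

```c
/* Toy model only: these are NOT the real PyGILState functions. */
typedef enum { DEMO_LOCKED, DEMO_UNLOCKED } demo_gilstate;

static int demo_lock_held = 0;   /* does this thread hold the lock? */

/* Like PyGILState_Ensure(): remember the previous state, then hold
   the lock. */
demo_gilstate demo_ensure(void)
{
    demo_gilstate previous = demo_lock_held ? DEMO_LOCKED : DEMO_UNLOCKED;
    demo_lock_held = 1;
    return previous;
}

/* Like PyGILState_Release(): put back the state saved by the matching
   demo_ensure() call. */
void demo_release(demo_gilstate previous)
{
    demo_lock_held = (previous == DEMO_LOCKED);
}

/* A callback that may run with or without the lock already held. */
int demo_callback(void)
{
    demo_gilstate handle = demo_ensure();
    int held_inside = demo_lock_held;   /* the lock is held here */
    demo_release(handle);               /* release with our own handle */
    return held_inside;
}

/* Nested use: the inner pair must leave the outer state untouched. */
int demo_nested(void)
{
    demo_gilstate outer = demo_ensure();
    int ok = demo_callback() && demo_lock_held;
    demo_release(outer);
    return ok;
}
```

Because \cfunction{demo_callback()} releases with the handle it
obtained itself, it leaves the lock held when called from
\cfunction{demo_nested()} and not held when called from code that did
not have it; sharing one handle between the two pairs would lose that
distinction.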
\section{Profiling and Tracing \label{profiling}}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
The Python interpreter provides some low-level support for attaching
profiling and execution tracing facilities. These are used for
profiling, debugging, and coverage analysis tools.
Starting with Python 2.2, the implementation of this facility was
substantially revised, and an interface from C was added. This C
interface allows the profiling or tracing code to avoid the overhead
of calling through Python-level callable objects, making a direct C
function call instead. The essential attributes of the facility have
not changed; the interface allows trace functions to be installed
per-thread, and the basic events reported to the trace function are
the same as had been reported to the Python-level trace functions in
previous versions.
\begin{ctypedesc}[Py_tracefunc]{int (*Py_tracefunc)(PyObject *obj,
PyFrameObject *frame, int what,
PyObject *arg)}
The type of the trace function registered using
\cfunction{PyEval_SetProfile()} and \cfunction{PyEval_SetTrace()}.
The first parameter is the object passed to the registration
function as \var{obj}, \var{frame} is the frame object to which the
event pertains, \var{what} is one of the constants
\constant{PyTrace_CALL}, \constant{PyTrace_EXCEPTION},
\constant{PyTrace_LINE}, \constant{PyTrace_RETURN},
\constant{PyTrace_C_CALL}, \constant{PyTrace_C_EXCEPTION},
or \constant{PyTrace_C_RETURN}, and \var{arg}
depends on the value of \var{what}:
\begin{tableii}{l|l}{constant}{Value of \var{what}}{Meaning of \var{arg}}
\lineii{PyTrace_CALL}{Always \NULL.}
\lineii{PyTrace_EXCEPTION}{Exception information as returned by
\function{sys.exc_info()}.}
\lineii{PyTrace_LINE}{Always \NULL.}
\lineii{PyTrace_RETURN}{Value being returned to the caller.}
\lineii{PyTrace_C_CALL}{Function object being called.}
\lineii{PyTrace_C_EXCEPTION}{Function object being called.}
\lineii{PyTrace_C_RETURN}{Function object being called.}
\end{tableii}
\end{ctypedesc}
\begin{cvardesc}{int}{PyTrace_CALL}
The value of the \var{what} parameter to a \ctype{Py_tracefunc}
function when a new call to a function or method is being reported,
or a new entry into a generator. Note that the creation of the
iterator for a generator function is not reported as there is no
control transfer to the Python bytecode in the corresponding frame.
\end{cvardesc}
\begin{cvardesc}{int}{PyTrace_EXCEPTION}
The value of the \var{what} parameter to a \ctype{Py_tracefunc}
function when an exception has been raised. The callback function
is called with this value for \var{what} after any bytecode is
processed after which the exception becomes set within the frame
being executed. The effect of this is that as exception propagation
causes the Python stack to unwind, the callback is called upon
return to each frame as the exception propagates. Only trace
functions receive these events; they are not needed by the
profiler.
\end{cvardesc}
\begin{cvardesc}{int}{PyTrace_LINE}
The value passed as the \var{what} parameter to a trace function
(but not a profiling function) when a line-number event is being
reported.
\end{cvardesc}
\begin{cvardesc}{int}{PyTrace_RETURN}
The value for the \var{what} parameter to \ctype{Py_tracefunc}
functions when a call is returning without propagating an exception.
\end{cvardesc}
\begin{cvardesc}{int}{PyTrace_C_CALL}
The value for the \var{what} parameter to \ctype{Py_tracefunc}
functions when a C function is about to be called.
\end{cvardesc}
\begin{cvardesc}{int}{PyTrace_C_EXCEPTION}
The value for the \var{what} parameter to \ctype{Py_tracefunc}
functions when a C function has raised an exception.
\end{cvardesc}
\begin{cvardesc}{int}{PyTrace_C_RETURN}
The value for the \var{what} parameter to \ctype{Py_tracefunc}
functions when a C function has returned.
\end{cvardesc}
\begin{cfuncdesc}{void}{PyEval_SetProfile}{Py_tracefunc func, PyObject *obj}
Set the profiler function to \var{func}. The \var{obj} parameter is
passed to the function as its first parameter, and may be any Python
object, or \NULL. If the profile function needs to maintain state,
using a different value for \var{obj} for each thread provides a
convenient and thread-safe place to store it. The profile function
is called for all monitored events except the line-number events.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyEval_SetTrace}{Py_tracefunc func, PyObject *obj}
Set the tracing function to \var{func}. This is similar to
\cfunction{PyEval_SetProfile()}, except the tracing function does
receive line-number events.
\end{cfuncdesc}
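To make the callback protocol more concrete, here is a minimal sketch
of a profile-style callback that keeps its state in the object passed
as the first argument. It is a model under stated assumptions, not code
for the real interface: \code{demo_state}, the \code{DEMO_*} event
constants, \cfunction{demo_profile()} and \cfunction{demo_replay()} are
hypothetical, a plain C struct stands in for the Python object that
\var{obj} would be, and a return value of \code{0} is assumed to mean
success.

```c
#include <stddef.h>

/* Stand-ins for the PyTrace_* constants (values are placeholders). */
enum { DEMO_CALL, DEMO_EXCEPTION, DEMO_LINE, DEMO_RETURN };

typedef struct {
    int depth;       /* current call depth */
    int max_depth;   /* deepest nesting seen so far */
    int calls;       /* number of call events */
} demo_state;

/* Same shape as a Py_tracefunc: (state, frame, event kind, argument).
   Assumption: every call event is later paired with a return event;
   exception unwinding is not modeled. */
int demo_profile(demo_state *obj, void *frame, int what, void *arg)
{
    (void)frame;
    (void)arg;
    switch (what) {
    case DEMO_CALL:
        obj->calls++;
        if (++obj->depth > obj->max_depth)
            obj->max_depth = obj->depth;
        break;
    case DEMO_RETURN:
        obj->depth--;
        break;
    default:
        /* A profile function is not sent line events; ignore the rest. */
        break;
    }
    return 0;
}

/* Feed a fixed event sequence to the callback, standing in for the
   interpreter. */
int demo_replay(demo_state *state)
{
    static const int events[] = {
        DEMO_CALL, DEMO_CALL, DEMO_RETURN, DEMO_CALL, DEMO_RETURN, DEMO_RETURN
    };
    size_t i;

    for (i = 0; i < sizeof events / sizeof events[0]; i++) {
        if (demo_profile(state, NULL, events[i], NULL) != 0)
            return -1;
    }
    return 0;
}
```

Keeping the counters in the per-registration state object rather than
in globals is what makes the per-thread arrangement described for
\cfunction{PyEval_SetProfile()} thread-safe.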
\section{Advanced Debugger Support \label{advanced-debugging}}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
These functions are only intended to be used by advanced debugging
tools.
\begin{cfuncdesc}{PyInterpreterState*}{PyInterpreterState_Head}{}
Return the interpreter state object at the head of the list of all
such objects.
\versionadded{2.2}
\end{cfuncdesc}
\begin{cfuncdesc}{PyInterpreterState*}{PyInterpreterState_Next}{PyInterpreterState *interp}
Return the next interpreter state object after \var{interp} from the
list of all such objects.
\versionadded{2.2}
\end{cfuncdesc}
\begin{cfuncdesc}{PyThreadState *}{PyInterpreterState_ThreadHead}{PyInterpreterState *interp}
Return a pointer to the first \ctype{PyThreadState} object in
the list of threads associated with the interpreter \var{interp}.
\versionadded{2.2}
\end{cfuncdesc}
\begin{cfuncdesc}{PyThreadState*}{PyThreadState_Next}{PyThreadState *tstate}
Return the next thread state object after \var{tstate} from the list
of all such objects belonging to the same \ctype{PyInterpreterState}
object.
\versionadded{2.2}
\end{cfuncdesc}
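A debugger would typically combine these four functions into a nested
walk over every thread of every interpreter. The sketch below models
that walk with hypothetical \code{toy_} types and accessors instead of
the real ones, and it assumes that the end of each list is signalled by
\NULL, which the descriptions above do not state.

```c
#include <stddef.h>

/* Toy stand-ins for PyThreadState and PyInterpreterState. */
typedef struct toy_tstate {
    struct toy_tstate *next;
} toy_tstate;

typedef struct toy_interp {
    struct toy_interp *next;
    toy_tstate *tstate_head;
} toy_interp;

static toy_interp *toy_interp_head = NULL;

/* Accessors mirroring the shape of the four functions above. */
toy_interp *toy_interp_first(void)             { return toy_interp_head; }
toy_interp *toy_interp_next(toy_interp *i)     { return i->next; }
toy_tstate *toy_thread_first(toy_interp *i)    { return i->tstate_head; }
toy_tstate *toy_thread_next(toy_tstate *t)     { return t->next; }

/* Count every thread state of every interpreter. */
size_t toy_count_threads(void)
{
    size_t n = 0;
    toy_interp *interp;
    toy_tstate *ts;

    for (interp = toy_interp_first(); interp != NULL;
         interp = toy_interp_next(interp)) {
        for (ts = toy_thread_first(interp); ts != NULL;
             ts = toy_thread_next(ts))
            n++;
    }
    return n;
}
```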

@ -1,627 +0,0 @@
\chapter{Introduction \label{intro}}
The Application Programmer's Interface to Python gives C and
\Cpp{} programmers access to the Python interpreter at a variety of
levels. The API is equally usable from \Cpp, but for brevity it is
generally referred to as the Python/C API. There are two
fundamentally different reasons for using the Python/C API. The first
reason is to write \emph{extension modules} for specific purposes;
these are C modules that extend the Python interpreter. This is
probably the most common use. The second reason is to use Python as a
component in a larger application; this technique is generally
referred to as \dfn{embedding} Python in an application.
Writing an extension module is a relatively well-understood process,
where a ``cookbook'' approach works well. There are several tools
that automate the process to some extent. While people have embedded
Python in other applications since its early existence, the process of
embedding Python is less straightforward than writing an extension.
Many API functions are useful independent of whether you're embedding
or extending Python; moreover, most applications that embed Python
will need to provide a custom extension as well, so it's probably a
good idea to become familiar with writing an extension before
attempting to embed Python in a real application.
\section{Include Files \label{includes}}
All function, type and macro definitions needed to use the Python/C
API are included in your code by the following line:
\begin{verbatim}
#include "Python.h"
\end{verbatim}
This implies inclusion of the following standard headers:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>},
\code{<limits.h>}, and \code{<stdlib.h>} (if available).
\begin{notice}[warning]
Since Python may define some pre-processor definitions which affect
the standard headers on some systems, you \emph{must} include
\file{Python.h} before any standard headers are included.
\end{notice}
All user-visible names defined by \file{Python.h} (except those defined by
the included standard headers) have one of the prefixes \samp{Py} or
\samp{_Py}. Names beginning with \samp{_Py} are for internal use by
the Python implementation and should not be used by extension writers.
Structure member names do not have a reserved prefix.
\strong{Important:} user code should never define names that begin
with \samp{Py} or \samp{_Py}. This confuses the reader, and
jeopardizes the portability of the user code to future Python
versions, which may define additional names beginning with one of
these prefixes.
The header files are typically installed with Python. On \UNIX, these
are located in the directories
\file{\envvar{prefix}/include/python\var{version}/} and
\file{\envvar{exec_prefix}/include/python\var{version}/}, where
\envvar{prefix} and \envvar{exec_prefix} are defined by the
corresponding parameters to Python's \program{configure} script and
\var{version} is \code{sys.version[:3]}. On Windows, the headers are
installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is
the installation directory specified to the installer.
To include the headers, place both directories (if different) on your
compiler's search path for includes. Do \emph{not} place the parent
directories on the search path and then use
\samp{\#include <python\shortversion/Python.h>}; this will break on
multi-platform builds since the platform independent headers under
\envvar{prefix} include the platform specific headers from
\envvar{exec_prefix}.
\Cpp{} users should note that though the API is defined entirely using
C, the header files do properly declare the entry points to be
\code{extern "C"}, so there is no need to do anything special to use
the API from \Cpp.
\section{Objects, Types and Reference Counts \label{objects}}
Most Python/C API functions have one or more arguments as well as a
return value of type \ctype{PyObject*}. This type is a pointer
to an opaque data type representing an arbitrary Python
object. Since all Python object types are treated the same way by the
Python language in most situations (e.g., assignments, scope rules,
and argument passing), it is only fitting that they should be
represented by a single C type. Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
\ctype{PyObject}, only pointer variables of type \ctype{PyObject*} can
be declared. The sole exceptions are the type objects\obindex{type};
since these must never be deallocated, they are typically static
\ctype{PyTypeObject} objects.
All Python objects (even Python integers) have a \dfn{type} and a
\dfn{reference count}. An object's type determines what kind of object
it is (e.g., an integer, a list, or a user-defined function; there are
many more as explained in the \citetitle[../ref/ref.html]{Python
Reference Manual}). For each of the well-known types there is a macro
to check whether an object is of that type; for instance,
\samp{PyList_Check(\var{a})} is true if (and only if) the object
pointed to by \var{a} is a Python list.
\subsection{Reference Counts \label{refcounts}}
The reference count is important because today's computers have a
finite (and often severely limited) memory size; it counts how many
different places there are that have a reference to an object. Such a
place could be another object, or a global (or static) C variable, or
a local variable in some C function. When an object's reference count
becomes zero, the object is deallocated. If it contains references to
other objects, their reference count is decremented. Those other
objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There's an obvious problem
with objects that reference each other here; for now, the solution is
``don't do that.'')
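The cascade described above can be sketched with a toy object type.
Nothing here is the real \ctype{PyObject}: \code{toy_object},
\cfunction{toy_new()}, \cfunction{toy_incref()},
\cfunction{toy_decref()} and the counter \code{toy_live} are
hypothetical names, and each toy object holds at most one reference to
another object.

```c
#include <stdlib.h>

/* Toy object (an assumption for illustration, not PyObject): a
   reference count plus at most one reference to another object. */
typedef struct toy_object {
    long refcnt;
    struct toy_object *child;   /* owned reference, may be NULL */
} toy_object;

static int toy_live = 0;        /* number of objects not yet deallocated */

toy_object *toy_new(toy_object *child)
{
    toy_object *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;             /* the caller owns the only reference */
    op->child = child;          /* takes over the caller's reference */
    toy_live++;
    return op;
}

void toy_incref(toy_object *op)
{
    op->refcnt++;
}

void toy_decref(toy_object *op)
{
    if (--op->refcnt == 0) {
        /* Deallocation drops the reference this object was holding,
           which may deallocate that object in turn. */
        if (op->child != NULL)
            toy_decref(op->child);
        free(op);
        toy_live--;
    }
}
```

Dropping the last reference to the head of a chain frees every object
in the chain that nobody else refers to, and stops at the first object
that still has another owner.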
Reference counts are always manipulated explicitly. The normal way is
to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to
increment an object's reference count by one, and
\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by
one. The \cfunction{Py_DECREF()} macro is considerably more complex
than the incref one, since it must check whether the reference count
becomes zero and then cause the object's deallocator to be called.
The deallocator is a function pointer contained in the object's type
structure. The type-specific deallocator takes care of decrementing
the reference counts for other objects contained in the object if this
is a compound object type, such as a list, as well as performing any
additional finalization that's needed. There's no chance that the
reference count can overflow; at least as many bits are used to hold
the reference count as there are distinct memory locations in virtual
memory (assuming \code{sizeof(long) >= sizeof(char*)}). Thus, the
reference count increment is a simple operation.
It is not necessary to increment an object's reference count for every
local variable that contains a pointer to an object. In theory, the
object's reference count goes up by one when the variable is made to
point to it and it goes down by one when the variable goes out of
scope. However, these two cancel each other out, so at the end the
reference count hasn't changed. The only real reason to use the
reference count is to prevent the object from being deallocated as
long as our variable is pointing to it. If we know that there is at
least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count
temporarily. An important situation where this arises is in objects
that are passed as arguments to C functions in an extension module
that are called from Python; the call mechanism guarantees to hold a
reference to every argument for the duration of the call.
However, a common pitfall is to extract an object from a list and
hold on to it for a while without incrementing its reference count.
Some other operation might conceivably remove the object from the
list, decrementing its reference count and possibly deallocating it.
The real danger is that innocent-looking operations may invoke
arbitrary Python code which could do this; there is a code path which
allows control to flow back to the user from a \cfunction{Py_DECREF()},
so almost any operation is potentially dangerous.
A safe approach is to always use the generic operations (functions
whose name begins with \samp{PyObject_}, \samp{PyNumber_},
\samp{PySequence_} or \samp{PyMapping_}). These operations always
increment the reference count of the object they return. This leaves
the caller with the responsibility to call
\cfunction{Py_DECREF()} when they are done with the result; this soon
becomes second nature.
\subsubsection{Reference Count Details \label{refcountDetails}}
The reference count behavior of functions in the Python/C API is best
explained in terms of \emph{ownership of references}. Ownership
pertains to references, never to objects (objects are not owned: they
are always shared). ``Owning a reference'' means being responsible for
calling \cfunction{Py_DECREF()} on it when the reference is no longer needed.
Ownership can also be transferred, meaning that the code that receives
ownership of the reference then becomes responsible for eventually
decref'ing it by calling \cfunction{Py_DECREF()} or
\cfunction{Py_XDECREF()} when it's no longer needed---or passing on
this responsibility (usually to its caller).
When a function passes ownership of a reference on to its caller, the
caller is said to receive a \emph{new} reference. When no ownership
is transferred, the caller is said to \emph{borrow} the reference.
Nothing needs to be done for a borrowed reference.
Conversely, when a calling function passes in a reference to an
object, there are two possibilities: the function \emph{steals} a
reference to the object, or it does not. \emph{Stealing a reference}
means that when you pass a reference to a function, that function
assumes that it now owns that reference, and you are not responsible
for it any longer.
Few functions steal references; the two notable exceptions are
\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and
\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which
steal a reference to the item (but not to the tuple or list into which
the item is put!). These functions were designed to steal a reference
because of a common idiom for populating a tuple or list with newly
created objects; for example, the code to create the tuple \code{(1,
2, "three")} could look like this (forgetting about error handling for
the moment; a better way to code this is shown below):
\begin{verbatim}
PyObject *t;
t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
PyTuple_SetItem(t, 2, PyString_FromString("three"));
\end{verbatim}
Here, \cfunction{PyInt_FromLong()} returns a new reference which is
immediately stolen by \cfunction{PyTuple_SetItem()}. When you want to
keep using an object although the reference to it will be stolen,
use \cfunction{Py_INCREF()} to grab another reference before calling the
reference-stealing function.
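The difference between a stealing and a non-stealing setter, and the
``grab another reference first'' idiom, can be modeled without the real
API. In this sketch \code{toy_item}, \code{toy_box},
\cfunction{toy_box_set_steal()}, \cfunction{toy_box_set()} and
\cfunction{toy_keep_and_store()} are hypothetical, and deallocation at
a count of zero is deliberately left out so that the counts stay
visible.

```c
#include <stddef.h>

typedef struct { long refcnt; } toy_item;

typedef struct { toy_item *slot; } toy_box;   /* holds one owned reference */

/* Steals: the box takes over the caller's reference to the item, so
   the item's count does not change. */
void toy_box_set_steal(toy_box *box, toy_item *item)
{
    if (box->slot != NULL)
        box->slot->refcnt--;    /* drop the reference to the old item */
    box->slot = item;
}

/* Does not steal: the box creates its own reference and the caller
   keeps the one it had. */
void toy_box_set(toy_box *box, toy_item *item)
{
    item->refcnt++;
    toy_box_set_steal(box, item);
}

/* Keep using an item after handing it to a stealing function: take a
   second reference first. */
long toy_keep_and_store(toy_box *box, toy_item *item)
{
    item->refcnt++;                 /* the reference that will be stolen */
    toy_box_set_steal(box, item);
    return item->refcnt;            /* safe: the caller still owns one */
}
```

With the stealing setter the caller ends up owning nothing unless it
took the extra reference beforehand; with the non-stealing setter the
caller's responsibilities are unchanged.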
Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to
set tuple items; \cfunction{PySequence_SetItem()} and
\cfunction{PyObject_SetItem()} refuse to do this since tuples are an
immutable data type. You should only use
\cfunction{PyTuple_SetItem()} for tuples that you are creating
yourself.
Equivalent code for populating a list can be written using
\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}.
However, in practice, you will rarely use these ways of
creating and populating a tuple or list. There's a generic function,
\cfunction{Py_BuildValue()}, that can create most common objects from
C values, directed by a \dfn{format string}. For example, the
above two blocks of code could be replaced by the following (which
also takes care of the error checking):
\begin{verbatim}
PyObject *tuple, *list;
tuple = Py_BuildValue("(iis)", 1, 2, "three");
list = Py_BuildValue("[iis]", 1, 2, "three");
\end{verbatim}
It is much more common to use \cfunction{PyObject_SetItem()} and
friends with items whose references you are only borrowing, like
arguments that were passed in to the function you are writing. In
that case, their behaviour regarding reference counts is much saner,
since you don't have to increment a reference count so you can give a
reference away (``have it be stolen''). For example, this function
sets all items of a list (actually, any mutable sequence) to a given
item:
\begin{verbatim}
int
set_all(PyObject *target, PyObject *item)
{
    int i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        PyObject *index = PyInt_FromLong(i);
        if (!index)
            return -1;
        if (PyObject_SetItem(target, index, item) < 0) {
            Py_DECREF(index);   /* Don't leak index on failure */
            return -1;
        }
        Py_DECREF(index);
    }
    return 0;
}
\end{verbatim}
\ttindex{set_all()}
The situation is slightly different for function return values.
While passing a reference to most functions does not change your
ownership responsibilities for that reference, many functions that
return a reference to an object give you ownership of the reference.
The reason is simple: in many cases, the returned object is created
on the fly, and the reference you get is the only reference to the
object. Therefore, the generic functions that return object
references, like \cfunction{PyObject_GetItem()} and
\cfunction{PySequence_GetItem()}, always return a new reference (the
caller becomes the owner of the reference).
It is important to realize that whether you own a reference returned
by a function depends on which function you call only --- \emph{the
plumage} (the type of the object passed as an
argument to the function) \emph{doesn't enter into it!} Thus, if you
extract an item from a list using \cfunction{PyList_GetItem()}, you
don't own the reference --- but if you obtain the same item from the
same list using \cfunction{PySequence_GetItem()} (which happens to
take exactly the same arguments), you do own a reference to the
returned object.
Here is an example of how you could write a function that computes the
sum of the items in a list of integers; once using
\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using
\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}.
\begin{verbatim}
long
sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyInt_Check(item)) continue; /* Skip non-integers */
        total += PyInt_AsLong(item);
    }
    return total;
}
\end{verbatim}
\ttindex{sum_list()}
\begin{verbatim}
long
sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
\end{verbatim}
\ttindex{sum_sequence()}
\subsection{Types \label{types}}
There are few other data types that play a significant role in
the Python/C API; most are simple C types such as \ctype{int},
\ctype{long}, \ctype{double} and \ctype{char*}. A few structure types
are used to describe static tables used to list the functions exported
by a module or the data attributes of a new object type, and another
is used to describe the value of a complex number. These will
be discussed together with the functions that use them.
\section{Exceptions \label{exceptions}}
The Python programmer only needs to deal with exceptions if specific
error handling is required; unhandled exceptions are automatically
propagated to the caller, then to the caller's caller, and so on, until
they reach the top-level interpreter, where they are reported to the
user accompanied by a stack traceback.
For C programmers, however, error checking always has to be explicit.
All functions in the Python/C API can raise exceptions, unless an
explicit claim is made otherwise in a function's documentation. In
general, when a function encounters an error, it sets an exception,
discards any object references that it owns, and returns an
error indicator --- usually \NULL{} or \code{-1}. A few functions
return a Boolean true/false result, with false indicating an error.
Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}.
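The case of an ambiguous return value can be illustrated with a small
model of the error indicator. None of this is the real API:
\cfunction{toy_set_error()}, \cfunction{toy_err_occurred()},
\cfunction{toy_err_clear()}, \cfunction{toy_as_long()} and
\cfunction{toy_sum2()} are hypothetical, and a single static string
stands in for the exception state.

```c
#include <stddef.h>
#include <stdlib.h>

static const char *toy_error = NULL;   /* NULL means "no exception set" */

static void toy_set_error(const char *msg) { toy_error = msg; }
const char *toy_err_occurred(void)         { return toy_error; }
void toy_err_clear(void)                   { toy_error = NULL; }

/* Parse a decimal integer.  -1 is both a valid result and the error
   return, so callers must also consult the error indicator. */
long toy_as_long(const char *text)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0') {
        toy_set_error("empty input");
        return -1;
    }
    value = strtol(text, &end, 10);
    if (*end != '\0') {
        toy_set_error("not an integer");
        return -1;
    }
    return value;
}

/* Caller: passes a failure on without setting a new error, so the
   original cause is preserved. */
int toy_sum2(const char *a, const char *b, long *result)
{
    long x, y;

    x = toy_as_long(a);
    if (x == -1 && toy_err_occurred() != NULL)
        return -1;
    y = toy_as_long(b);
    if (y == -1 && toy_err_occurred() != NULL)
        return -1;
    *result = x + y;
    return 0;
}
```

A result of \code{-1} on its own proves nothing in this model; only
the combination of the special return value and a set indicator means
failure.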
Exception state is maintained in per-thread storage (this is
equivalent to using global storage in an unthreaded application). A
thread can be in one of two states: an exception has occurred, or not.
The function \cfunction{PyErr_Occurred()} can be used to check for
this: it returns a borrowed reference to the exception type object
when an exception has occurred, and \NULL{} otherwise. There are a
number of functions to set the exception state:
\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most
common (though not the most general) function to set the exception
state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the
exception state.
The full exception state consists of three objects (all of which can
be \NULL): the exception type, the corresponding exception
value, and the traceback. These have the same meanings as the Python
\withsubitem{(in module sys)}{
\ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
objects \code{sys.exc_type}, \code{sys.exc_value}, and
\code{sys.exc_traceback}; however, they are not the same: the Python
objects represent the last exception being handled by a Python
\keyword{try} \ldots\ \keyword{except} statement, while the C level
exception state only exists while an exception is being passed on
between C functions until it reaches the Python bytecode interpreter's
main loop, which takes care of transferring it to \code{sys.exc_type}
and friends.
Note that starting with Python 1.5, the preferred, thread-safe way to
access the exception state from Python code is to call the function
\withsubitem{(in module sys)}{\ttindex{exc_info()}}
\function{sys.exc_info()}, which returns the per-thread exception state
for Python code. Also, the semantics of both ways to access the
exception state have changed so that a function which catches an
exception will save and restore its thread's exception state so as to
preserve the exception state of its caller. This prevents common bugs
in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often
unwanted lifetime extension for objects that are referenced by the
stack frames in the traceback.
As a general principle, a function that calls another function to
perform some task should check whether the called function raised an
exception, and if so, pass the exception state on to its caller. It
should discard any object references that it owns, and return an
error indicator, but it should \emph{not} set another exception ---
that would overwrite the exception that was just raised, and lose
important information about the exact cause of the error.
A simple example of detecting exceptions and passing them on is shown
in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example
above. It so happens that that example doesn't need to clean up any
owned references when it detects an error. The following example
function shows some error cleanup. First, to remind you why you like
Python, we show the equivalent Python code:
\begin{verbatim}
def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1
\end{verbatim}
\ttindex{incr_item()}
Here is the corresponding C code, in all its glory:
\begin{verbatim}
int
incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;

        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyInt_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyInt_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
\end{verbatim}
\ttindex{incr_item()}
This example represents an endorsed use of the \keyword{goto} statement
in C! It illustrates the use of
\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and
\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to
handle specific exceptions, and the use of
\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to
dispose of owned references that may be \NULL{} (note the
\character{X} in the name; \cfunction{Py_DECREF()} would crash when
confronted with a \NULL{} reference). It is important that the
variables used to hold owned references are initialized to \NULL{} for
this to work; likewise, the proposed return value is initialized to
\code{-1} (failure) and only set to success after the final call made
is successful.
\section{Embedding Python \label{embedding}}
The one important task that only embedders (as opposed to extension
writers) of the Python interpreter have to worry about is the
initialization, and possibly the finalization, of the Python
interpreter. Most functionality of the interpreter can only be used
after the interpreter has been initialized.
The basic initialization function is
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
This initializes the table of loaded modules, and creates the
fundamental modules \module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys},
and \module{exceptions}.\refbimodindex{exceptions} It also initializes
the module search path (\code{sys.path}).%
\indexiii{module}{search}{path}
\withsubitem{(in module sys)}{\ttindex{path}}
\cfunction{Py_Initialize()} does not set the ``script argument list''
(\code{sys.argv}). If this variable is needed by Python code that
will be executed later, it must be set explicitly with a call to
\code{PySys_SetArgv(\var{argc},
\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to
\cfunction{Py_Initialize()}.
On most systems (in particular, on \UNIX{} and Windows, although the
details are slightly different),
\cfunction{Py_Initialize()} calculates the module search path based
upon its best guess for the location of the standard Python
interpreter executable, assuming that the Python library is found in a
fixed location relative to the Python interpreter executable. In
particular, it looks for a directory named
\file{lib/python\shortversion} relative to the parent directory where
the executable named \file{python} is found on the shell command
search path (the environment variable \envvar{PATH}).
For instance, if the Python executable is found in
\file{/usr/local/bin/python}, it will assume that the libraries are in
\file{/usr/local/lib/python\shortversion}. (In fact, this particular path
is also the ``fallback'' location, used when no executable file named
\file{python} is found along \envvar{PATH}.) The user can override
this behavior by setting the environment variable \envvar{PYTHONHOME},
or insert additional directories in front of the standard path by
setting \envvar{PYTHONPATH}.
The embedding application can steer the search by calling
\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()} \emph{before} calling
\cfunction{Py_Initialize()}. Note that \envvar{PYTHONHOME} still
overrides this and \envvar{PYTHONPATH} is still inserted in front of
the standard path. An application that requires total control has to
provide its own implementation of
\cfunction{Py_GetPath()}\ttindex{Py_GetPath()},
\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()},
\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and
\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all
defined in \file{Modules/getpath.c}).
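The values computed during initialization can be inspected from Python
once the interpreter is running. The sketch below assumes that the
results of the four C functions named above are exposed as
\code{sys.path}, \code{sys.prefix}, \code{sys.exec_prefix} and
\code{sys.executable}, respectively; the helper name
\function{path_configuration()} is hypothetical.

```python
import os
import sys


def path_configuration():
    """Collect the path settings that initialization computed."""
    return {
        "path": list(sys.path),            # module search path
        "prefix": sys.prefix,              # platform-independent prefix
        "exec_prefix": sys.exec_prefix,    # platform-dependent prefix
        "executable": sys.executable,      # full path of the interpreter
        # Environment overrides mentioned above; None when unset.
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
    }
```

Printing the returned dictionary in an embedding application is one
way to check whether a call to \cfunction{Py_SetProgramName()} or a
setting of \envvar{PYTHONHOME} had the intended effect.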
Sometimes, it is desirable to ``uninitialize'' Python. For instance,
the application may want to start over (make another call to
\cfunction{Py_Initialize()}), or it may simply be done with its
use of Python and want to free memory allocated by Python. This
can be accomplished by calling \cfunction{Py_Finalize()}. The function
\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns
true if Python is currently in the initialized state. More
information about these functions is given in a later chapter.
Notice that \cfunction{Py_Finalize()} does \emph{not} free all memory
allocated by the Python interpreter; for example, memory allocated by
extension modules currently cannot be released.
\section{Debugging Builds \label{debugging}}
Python can be built with several macros to enable extra checks of the
interpreter and extension modules. These checks tend to add a large
amount of overhead to the runtime so they are not enabled by default.
A full list of the various types of debugging builds is in the file
\file{Misc/SpecialBuilds.txt} in the Python source distribution.
Builds are available that support tracing of reference counts,
debugging the memory allocator, or low-level profiling of the main
interpreter loop. Only the most frequently used builds will be
described in the remainder of this section.
Compiling the interpreter with the \csimplemacro{Py_DEBUG} macro
defined produces what is generally meant by ``a debug build'' of Python.
\csimplemacro{Py_DEBUG} is enabled in the \UNIX{} build by adding
\longprogramopt{with-pydebug} to the \file{configure} command. It is also
implied by the presence of the non-Python-specific
\csimplemacro{_DEBUG} macro. When \csimplemacro{Py_DEBUG} is enabled
in the \UNIX{} build, compiler optimization is disabled.
In addition to the reference count debugging described below, the
following extra checks are performed:
\begin{itemize}
\item Extra checks are added to the object allocator.
\item Extra checks are added to the parser and compiler.
\item Downcasts from wide types to narrow types are checked for
loss of information.
\item A number of assertions are added to the dictionary and set
implementations. In addition, the set object acquires a
\method{test_c_api} method.
\item Sanity checks of the input arguments are added to frame
creation.
\item The storage for long ints is initialized with a known
invalid pattern to catch reference to uninitialized
digits.
\item Low-level tracing and extra exception checking are added
to the runtime virtual machine.
\item Extra checks are added to the memory arena implementation.
\item Extra debugging is added to the thread module.
\end{itemize}
There may be additional checks not mentioned here.
Defining \csimplemacro{Py_TRACE_REFS} enables reference tracing. When
defined, a circular doubly linked list of active objects is maintained
by adding two extra fields to every \ctype{PyObject}. Total
allocations are tracked as well. Upon exit, all existing references
are printed. (In interactive mode this happens after every statement
run by the interpreter.) This macro is implied by \csimplemacro{Py_DEBUG}.
Please refer to \file{Misc/SpecialBuilds.txt} in the Python source
distribution for more detailed information.
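Whether a given interpreter was built with these macros can be probed
from Python. This is a hedged sketch: it assumes, following
\file{Misc/SpecialBuilds.txt}, that builds with reference-count
debugging expose \function{sys.gettotalrefcount()} and that
\csimplemacro{Py_TRACE_REFS} builds expose \function{sys.getobjects()}.
The function name \function{special_build_features()} is hypothetical.

```python
import sys


def special_build_features():
    """Report which debugging hooks this interpreter exposes."""
    return {
        # Present only when reference-count debugging is compiled in.
        "total_refcount": hasattr(sys, "gettotalrefcount"),
        # Present only when reference tracing is compiled in.
        "object_tracing": hasattr(sys, "getobjects"),
    }
```

On a regular release build both entries are expected to be false.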

@ -1,204 +0,0 @@
\chapter{Memory Management \label{memory}}
\sectionauthor{Vladimir Marangozov}{Vladimir.Marangozov@inrialpes.fr}
\section{Overview \label{memoryOverview}}
Memory management in Python involves a private heap containing all
Python objects and data structures. The management of this private
heap is ensured internally by the \emph{Python memory manager}. The
Python memory manager has different components which deal with various
dynamic storage management aspects, like sharing, segmentation,
preallocation or caching.
At the lowest level, a raw memory allocator ensures that there is
enough room in the private heap for storing all Python-related data
by interacting with the memory manager of the operating system. On top
of the raw memory allocator, several object-specific allocators
operate on the same heap and implement distinct memory management
policies adapted to the peculiarities of every object type. For
example, integer objects are managed differently within the heap than
strings, tuples or dictionaries because integers imply different
storage requirements and speed/space tradeoffs. The Python memory
manager thus delegates some of the work to the object-specific
allocators, but ensures that the latter operate within the bounds of
the private heap.
It is important to understand that the management of the Python heap
is performed by the interpreter itself and that the user has no
control over it, even if she regularly manipulates object pointers to
memory blocks inside that heap. The allocation of heap space for
Python objects and other internal buffers is performed on demand by
the Python memory manager through the Python/C API functions listed in
this document.
To avoid memory corruption, extension writers should never try to
operate on Python objects with the functions exported by the C
library: \cfunction{malloc()}\ttindex{malloc()},
\cfunction{calloc()}\ttindex{calloc()},
\cfunction{realloc()}\ttindex{realloc()} and
\cfunction{free()}\ttindex{free()}. This will result in
mixed calls between the C allocator and the Python memory manager
with fatal consequences, because they implement different algorithms
and operate on different heaps. However, one may safely allocate and
release memory blocks with the C library allocator for individual
purposes, as shown in the following example:
\begin{verbatim}
PyObject *res;
char *buf = (char *) malloc(BUFSIZ); /* for I/O */
if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyString_FromString(buf);
free(buf); /* malloc'ed */
return res;
\end{verbatim}
In this example, the memory request for the I/O buffer is handled by
the C library allocator. The Python memory manager is involved only
in the allocation of the string object returned as a result.
In most situations, however, it is recommended to allocate memory from
the Python heap specifically because the latter is under control of
the Python memory manager. For example, this is required when the
interpreter is extended with new object types written in C. Another
reason for using the Python heap is the desire to \emph{inform} the
Python memory manager about the memory needs of the extension module.
Even when the requested memory is used exclusively for internal,
highly-specific purposes, delegating all memory requests to the Python
memory manager causes the interpreter to have a more accurate image of
its memory footprint as a whole. Consequently, under certain
circumstances, the Python memory manager may or may not trigger
appropriate actions, like garbage collection, memory compaction or
other preventive procedures. Note that by using the C library
allocator as shown in the previous example, the memory allocated for
the I/O buffer completely escapes the Python memory manager.
\section{Memory Interface \label{memoryInterface}}
The following function sets, modeled after the ANSI C standard,
but specifying behavior when requesting zero bytes,
are available for allocating and releasing memory from the Python heap:
\begin{cfuncdesc}{void*}{PyMem_Malloc}{size_t n}
Allocates \var{n} bytes and returns a pointer of type \ctype{void*}
to the allocated memory, or \NULL{} if the request fails.
Requesting zero bytes returns a distinct non-\NULL{} pointer if
possible, as if \cfunction{PyMem_Malloc(1)} had been called instead.
The memory will not have been initialized in any way.
\end{cfuncdesc}
\begin{cfuncdesc}{void*}{PyMem_Realloc}{void *p, size_t n}
Resizes the memory block pointed to by \var{p} to \var{n} bytes.
The contents will be unchanged to the minimum of the old and the new
sizes. If \var{p} is \NULL, the call is equivalent to
\cfunction{PyMem_Malloc(\var{n})}; else if \var{n} is equal to zero, the
memory block is resized but is not freed, and the returned pointer
is non-\NULL. Unless \var{p} is \NULL, it must have been
returned by a previous call to \cfunction{PyMem_Malloc()} or
\cfunction{PyMem_Realloc()}. If the request fails,
\cfunction{PyMem_Realloc()} returns \NULL{} and \var{p} remains a
valid pointer to the previous memory area.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyMem_Free}{void *p}
Frees the memory block pointed to by \var{p}, which must have been
returned by a previous call to \cfunction{PyMem_Malloc()} or
\cfunction{PyMem_Realloc()}. Otherwise, or if
\cfunction{PyMem_Free(\var{p})} has been called before, undefined
behavior occurs. If \var{p} is \NULL, no operation is performed.
\end{cfuncdesc}
The following type-oriented macros are provided for convenience. Note
that \var{TYPE} refers to any C type.
\begin{cfuncdesc}{\var{TYPE}*}{PyMem_New}{TYPE, size_t n}
Same as \cfunction{PyMem_Malloc()}, but allocates \code{(\var{n} *
sizeof(\var{TYPE}))} bytes of memory. Returns a pointer cast to
\ctype{\var{TYPE}*}. The memory will not have been initialized in
any way.
\end{cfuncdesc}
\begin{cfuncdesc}{\var{TYPE}*}{PyMem_Resize}{void *p, TYPE, size_t n}
Same as \cfunction{PyMem_Realloc()}, but the memory block is resized
to \code{(\var{n} * sizeof(\var{TYPE}))} bytes. Returns a pointer
cast to \ctype{\var{TYPE}*}. On return, \var{p} will be a pointer to
the new memory area, or \NULL{} in the event of failure.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{PyMem_Del}{void *p}
Same as \cfunction{PyMem_Free()}.
\end{cfuncdesc}
In addition, the following macro sets are provided for calling the
Python memory allocator directly, without involving the C API functions
listed above. However, note that their use does not preserve binary
compatibility across Python versions and is therefore deprecated in
extension modules.
\cfunction{PyMem_MALLOC()}, \cfunction{PyMem_REALLOC()}, \cfunction{PyMem_FREE()}.
\cfunction{PyMem_NEW()}, \cfunction{PyMem_RESIZE()}, \cfunction{PyMem_DEL()}.
\section{Examples \label{memoryExamples}}
Here is the example from section \ref{memoryOverview}, rewritten so
that the I/O buffer is allocated from the Python heap by using the
first function set:
\begin{verbatim}
PyObject *res;
char *buf = (char *) PyMem_Malloc(BUFSIZ); /* for I/O */
if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyString_FromString(buf);
PyMem_Free(buf); /* allocated with PyMem_Malloc */
return res;
\end{verbatim}
The same code using the type-oriented function set:
\begin{verbatim}
PyObject *res;
char *buf = PyMem_New(char, BUFSIZ); /* for I/O */
if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyString_FromString(buf);
PyMem_Del(buf); /* allocated with PyMem_New */
return res;
\end{verbatim}
Note that in the two examples above, the buffer is always
manipulated via functions belonging to the same set. Indeed, it
is required to use the same memory API family for a given
memory block, so that the risk of mixing different allocators is
reduced to a minimum. The following code sequence contains two errors,
one of which is labeled as \emph{fatal} because it mixes two different
allocators operating on different heaps.
\begin{verbatim}
char *buf1 = PyMem_New(char, BUFSIZ);
char *buf2 = (char *) malloc(BUFSIZ);
char *buf3 = (char *) PyMem_Malloc(BUFSIZ);
...
PyMem_Del(buf3); /* Wrong -- should be PyMem_Free() */
free(buf2); /* Right -- allocated via malloc() */
free(buf1); /* Fatal -- should be PyMem_Del() */
\end{verbatim}
In addition to the functions aimed at handling raw memory blocks from
the Python heap, objects in Python are allocated and released with
\cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()} and
\cfunction{PyObject_Del()}.
These will be explained in the next chapter on defining and
implementing new object types in C.

@ -1,69 +0,0 @@
\chapter{Reference Counting \label{countingRefs}}
The macros in this section are used for managing reference counts
of Python objects.
\begin{cfuncdesc}{void}{Py_INCREF}{PyObject *o}
Increment the reference count for object \var{o}. The object must
not be \NULL; if you aren't sure that it isn't \NULL, use
\cfunction{Py_XINCREF()}.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_XINCREF}{PyObject *o}
Increment the reference count for object \var{o}. The object may be
\NULL, in which case the macro has no effect.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_DECREF}{PyObject *o}
Decrement the reference count for object \var{o}. The object must
not be \NULL; if you aren't sure that it isn't \NULL, use
\cfunction{Py_XDECREF()}. If the reference count reaches zero, the
object's type's deallocation function (which must not be \NULL) is
invoked.
\warning{The deallocation function can cause arbitrary Python code
to be invoked (e.g. when a class instance with a \method{__del__()}
method is deallocated). While exceptions in such code are not
propagated, the executed code has free access to all Python global
variables. This means that any object that is reachable from a
global variable should be in a consistent state before
\cfunction{Py_DECREF()} is invoked. For example, code to delete an
object from a list should copy a reference to the deleted object in
a temporary variable, update the list data structure, and then call
\cfunction{Py_DECREF()} for the temporary variable.}
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_XDECREF}{PyObject *o}
Decrement the reference count for object \var{o}. The object may be
\NULL, in which case the macro has no effect; otherwise the effect
is the same as for \cfunction{Py_DECREF()}, and the same warning
applies.
\end{cfuncdesc}
\begin{cfuncdesc}{void}{Py_CLEAR}{PyObject *o}
Decrement the reference count for object \var{o}. The object may be
\NULL, in which case the macro has no effect; otherwise the effect
is the same as for \cfunction{Py_DECREF()}, except that the argument
is also set to \NULL. The warning for \cfunction{Py_DECREF()} does
not apply with respect to the object passed because the macro
carefully uses a temporary variable and sets the argument to \NULL{}
before decrementing its reference count.
It is a good idea to use this macro whenever decrementing the reference
count of an object held in a variable that might be traversed during
garbage collection.
\versionadded{2.4}
\end{cfuncdesc}
The following functions are for runtime dynamic embedding of Python:
\cfunction{Py_IncRef(PyObject *o)}, \cfunction{Py_DecRef(PyObject *o)}.
They are simply exported function versions of \cfunction{Py_XINCREF()} and
\cfunction{Py_XDECREF()}, respectively.
The following functions or macros are only for use within the
interpreter core: \cfunction{_Py_Dealloc()},
\cfunction{_Py_ForgetReference()}, \cfunction{_Py_NewReference()}, as
well as the global variable \cdata{_Py_RefTotal}.
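The effect of the increment and decrement macros described in this
chapter can be watched from Python. The following is a minimal sketch,
assuming a standard CPython build in which \function{sys.getrefcount()}
reports the reference count and objects are deallocated as soon as that
count reaches zero; the class \class{Noisy} and the function
\function{observe()} are hypothetical names used for illustration.

```python
import sys


class Noisy:
    """Record when an instance is finalized."""
    log = []

    def __del__(self):
        # Arbitrary Python code runs here during deallocation.
        Noisy.log.append("finalized")


def observe():
    """Return reference counts around an extra binding, plus the log."""
    Noisy.log.clear()
    obj = Noisy()
    base = sys.getrefcount(obj)
    alias = obj                      # new reference, like Py_INCREF()
    after_incref = sys.getrefcount(obj)
    del alias                        # reference dropped, like Py_DECREF()
    after_decref = sys.getrefcount(obj)
    del obj                          # last reference: __del__ runs now
    return base, after_incref, after_decref, list(Noisy.log)
```

The \method{__del__()} call triggered by the final \keyword{del} is the
Python-level counterpart of the arbitrary code that
\cfunction{Py_DECREF()} may invoke, which is why data structures should
be consistent before the last reference is released.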

@ -1,287 +0,0 @@
\chapter{The Very High Level Layer \label{veryhigh}}
The functions in this chapter will let you execute Python source code
given in a file or a buffer, but they will not let you interact in a
more detailed way with the interpreter.
Several of these functions accept a start symbol from the grammar as a
parameter. The available start symbols are \constant{Py_eval_input},
\constant{Py_file_input}, and \constant{Py_single_input}. These are
described following the functions which accept them as parameters.
Note also that several of these functions take \ctype{FILE*}
parameters. One particular issue which needs to be handled carefully
is that the \ctype{FILE} structure for different C libraries can be
different and incompatible. Under Windows (at least), it is possible
for dynamically linked extensions to actually use different libraries,
so care should be taken that \ctype{FILE*} parameters are only passed
to these functions if it is certain that they were created by the same
library that the Python runtime is using.
\begin{cfuncdesc}{int}{Py_Main}{int argc, char **argv}
The main program for the standard interpreter. This is made
available for programs which embed Python. The \var{argc} and
\var{argv} parameters should be prepared exactly as those which are
passed to a C program's \cfunction{main()} function. It is
important to note that the argument list may be modified (but the
contents of the strings pointed to by the argument list are not).
The return value will be the integer passed to the
\function{sys.exit()} function, \code{1} if the interpreter exits
due to an exception, or \code{2} if the parameter list does not
represent a valid Python command line.
\end{cfuncdesc}
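The three kinds of return value listed for \cfunction{Py_Main()} can be
observed by launching the standard interpreter as a child process.
This sketch assumes that the stock \program{python} executable passes
the result of \cfunction{Py_Main()} through as its process exit status
and that \code{sys.executable} names a usable interpreter; the helper
\function{exit_status()} is hypothetical.

```python
import subprocess
import sys


def exit_status(*args):
    """Run a fresh interpreter with the given command-line arguments."""
    result = subprocess.run(
        [sys.executable, *args],
        stdout=subprocess.DEVNULL,   # discard normal output
        stderr=subprocess.DEVNULL,   # discard tracebacks and usage text
    )
    return result.returncode
```

With this helper, \code{exit_status("-c", "import sys; sys.exit(3)")}
should yield the value passed to \function{sys.exit()}, a command that
raises an uncaught exception should yield \code{1}, and an
unrecognized option should yield \code{2}.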
\begin{cfuncdesc}{int}{PyRun_AnyFile}{FILE *fp, const char *filename}
This is a simplified interface to \cfunction{PyRun_AnyFileExFlags()}
below, leaving \var{closeit} set to \code{0} and \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_AnyFileFlags}{FILE *fp, const char *filename,
PyCompilerFlags *flags}
This is a simplified interface to \cfunction{PyRun_AnyFileExFlags()}
below, leaving the \var{closeit} argument set to \code{0}.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_AnyFileEx}{FILE *fp, const char *filename,
int closeit}
This is a simplified interface to \cfunction{PyRun_AnyFileExFlags()}
below, leaving the \var{flags} argument set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_AnyFileExFlags}{FILE *fp, const char *filename,
int closeit,
PyCompilerFlags *flags}
If \var{fp} refers to a file associated with an interactive device
(console or terminal input or \UNIX{} pseudo-terminal), return the
value of \cfunction{PyRun_InteractiveLoop()}, otherwise return the
result of \cfunction{PyRun_SimpleFile()}. If \var{filename} is
\NULL, this function uses \code{"???"} as the filename.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_SimpleString}{const char *command}
This is a simplified interface to \cfunction{PyRun_SimpleStringFlags()}
below, leaving the \ctype{PyCompilerFlags*} argument set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_SimpleStringFlags}{const char *command,
PyCompilerFlags *flags}
Executes the Python source code from \var{command} in the
\module{__main__} module according to the \var{flags} argument.
If \module{__main__} does not already exist, it is created. Returns
\code{0} on success or \code{-1} if an exception was raised. If there
was an error, there is no way to get the exception information.
For the meaning of \var{flags}, see below.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_SimpleFile}{FILE *fp, const char *filename}
This is a simplified interface to \cfunction{PyRun_SimpleFileExFlags()}
below, leaving \var{closeit} set to \code{0} and \var{flags} set to
\NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_SimpleFileFlags}{FILE *fp, const char *filename,
PyCompilerFlags *flags}
This is a simplified interface to \cfunction{PyRun_SimpleFileExFlags()}
below, leaving \var{closeit} set to \code{0}.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_SimpleFileEx}{FILE *fp, const char *filename,
int closeit}
This is a simplified interface to \cfunction{PyRun_SimpleFileExFlags()}
below, leaving \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_SimpleFileExFlags}{FILE *fp, const char *filename,
int closeit,
PyCompilerFlags *flags}
Similar to \cfunction{PyRun_SimpleStringFlags()}, but the Python source
code is read from \var{fp} instead of an in-memory string.
\var{filename} should be the name of the file. If \var{closeit} is
true, the file is closed before \cfunction{PyRun_SimpleFileExFlags()} returns.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_InteractiveOne}{FILE *fp, const char *filename}
This is a simplified interface to \cfunction{PyRun_InteractiveOneFlags()}
below, leaving \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_InteractiveOneFlags}{FILE *fp,
const char *filename,
PyCompilerFlags *flags}
Read and execute a single statement from a file associated with an
interactive device according to the \var{flags} argument. If
\var{filename} is \NULL, \code{"???"} is used instead. The user will
be prompted using \code{sys.ps1} and \code{sys.ps2}. Returns \code{0}
when the input was executed successfully, \code{-1} if there was an
exception, or an error code from the \file{errcode.h} include file
distributed as part of Python if there was a parse error. (Note that
\file{errcode.h} is not included by \file{Python.h}, so must be included
specifically if needed.)
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_InteractiveLoop}{FILE *fp, const char *filename}
This is a simplified interface to \cfunction{PyRun_InteractiveLoopFlags()}
below, leaving \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PyRun_InteractiveLoopFlags}{FILE *fp,
const char *filename,
PyCompilerFlags *flags}
Read and execute statements from a file associated with an
interactive device until \EOF{} is reached. If \var{filename} is
\NULL, \code{"???"} is used instead. The user will be prompted
using \code{sys.ps1} and \code{sys.ps2}. Returns \code{0} at \EOF.
\end{cfuncdesc}
\begin{cfuncdesc}{struct _node*}{PyParser_SimpleParseString}{const char *str,
int start}
This is a simplified interface to
\cfunction{PyParser_SimpleParseStringFlagsFilename()} below, leaving
\var{filename} set to \NULL{} and \var{flags} set to \code{0}.
\end{cfuncdesc}
\begin{cfuncdesc}{struct _node*}{PyParser_SimpleParseStringFlags}{
const char *str, int start, int flags}
This is a simplified interface to
\cfunction{PyParser_SimpleParseStringFlagsFilename()} below, leaving
\var{filename} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{struct _node*}{PyParser_SimpleParseStringFlagsFilename}{
const char *str, const char *filename,
int start, int flags}
Parse Python source code from \var{str} using the start token
\var{start} according to the \var{flags} argument. The result can
be used to create a code object which can be evaluated efficiently.
This is useful if a code fragment must be evaluated many times.
\end{cfuncdesc}
\begin{cfuncdesc}{struct _node*}{PyParser_SimpleParseFile}{FILE *fp,
const char *filename, int start}
This is a simplified interface to \cfunction{PyParser_SimpleParseFileFlags()}
below, leaving \var{flags} set to \code{0}.
\end{cfuncdesc}
\begin{cfuncdesc}{struct _node*}{PyParser_SimpleParseFileFlags}{FILE *fp,
const char *filename, int start, int flags}
Similar to \cfunction{PyParser_SimpleParseStringFlagsFilename()}, but
the Python source code is read from \var{fp} instead of an in-memory
string.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyRun_String}{const char *str, int start,
PyObject *globals,
PyObject *locals}
This is a simplified interface to \cfunction{PyRun_StringFlags()} below,
leaving \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyRun_StringFlags}{const char *str, int start,
PyObject *globals,
PyObject *locals,
PyCompilerFlags *flags}
Execute Python source code from \var{str} in the context specified
by the dictionaries \var{globals} and \var{locals} with the compiler
flags specified by \var{flags}. The parameter \var{start} specifies
the start token that should be used to parse the source code.
Returns the result of executing the code as a Python object, or
\NULL{} if an exception was raised.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyRun_File}{FILE *fp, const char *filename,
int start, PyObject *globals,
PyObject *locals}
This is a simplified interface to \cfunction{PyRun_FileExFlags()} below,
leaving \var{closeit} set to \code{0} and \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyRun_FileEx}{FILE *fp, const char *filename,
int start, PyObject *globals,
PyObject *locals, int closeit}
This is a simplified interface to \cfunction{PyRun_FileExFlags()} below,
leaving \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyRun_FileFlags}{FILE *fp, const char *filename,
int start, PyObject *globals,
PyObject *locals,
PyCompilerFlags *flags}
This is a simplified interface to \cfunction{PyRun_FileExFlags()} below,
leaving \var{closeit} set to \code{0}.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyRun_FileExFlags}{FILE *fp, const char *filename,
int start, PyObject *globals,
PyObject *locals, int closeit,
PyCompilerFlags *flags}
Similar to \cfunction{PyRun_StringFlags()}, but the Python source code is
read from \var{fp} instead of an in-memory string.
\var{filename} should be the name of the file.
If \var{closeit} is true, the file is closed before
\cfunction{PyRun_FileExFlags()} returns.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{Py_CompileString}{const char *str,
const char *filename,
int start}
This is a simplified interface to \cfunction{Py_CompileStringFlags()} below,
leaving \var{flags} set to \NULL.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{Py_CompileStringFlags}{const char *str,
const char *filename,
int start,
PyCompilerFlags *flags}
Parse and compile the Python source code in \var{str}, returning the
resulting code object. The start token is given by \var{start};
this can be used to constrain the code which can be compiled and should
be \constant{Py_eval_input}, \constant{Py_file_input}, or
\constant{Py_single_input}. The filename specified by
\var{filename} is used to construct the code object and may appear
in tracebacks or \exception{SyntaxError} exception messages. This
returns \NULL{} if the code cannot be parsed or compiled.
\end{cfuncdesc}
\begin{cvardesc}{int}{Py_eval_input}
The start symbol from the Python grammar for isolated expressions;
for use with
\cfunction{Py_CompileString()}\ttindex{Py_CompileString()}.
\end{cvardesc}
\begin{cvardesc}{int}{Py_file_input}
The start symbol from the Python grammar for sequences of statements
as read from a file or other source; for use with
\cfunction{Py_CompileString()}\ttindex{Py_CompileString()}. This is
the symbol to use when compiling arbitrarily long Python source code.
\end{cvardesc}
\begin{cvardesc}{int}{Py_single_input}
The start symbol from the Python grammar for a single statement; for
use with \cfunction{Py_CompileString()}\ttindex{Py_CompileString()}.
This is the symbol used for the interactive interpreter loop.
\end{cvardesc}
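The difference between the three start symbols can be tried out from
Python through the built-in \function{compile()}. The sketch below
assumes that its mode strings \code{"eval"}, \code{"exec"} and
\code{"single"} select \constant{Py_eval_input},
\constant{Py_file_input} and \constant{Py_single_input}, respectively;
the helper names and the file name \code{"<example>"} are hypothetical.

```python
import contextlib
import io


def run_eval(source):
    """Compile an isolated expression and return its value."""
    return eval(compile(source, "<example>", "eval"))


def run_file(source):
    """Compile a sequence of statements and return the namespace."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


def run_single(source):
    """Compile one interactive statement and capture what it echoes."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<example>", "single"), {})
    return buffer.getvalue()
```

Only the \code{"single"} form echoes the value of an expression
statement, as the interactive interpreter loop does; passing a
statement such as an assignment to \function{run_eval()} raises
\exception{SyntaxError}, which shows how the start symbol constrains
the code that can be compiled.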
\begin{ctypedesc}[PyCompilerFlags]{struct PyCompilerFlags}
This is the structure used to hold compiler flags. In cases where
code is only being compiled, it is passed as \code{int flags}, and in
cases where code is being executed, it is passed as
\code{PyCompilerFlags *flags}. In the latter case, \code{from __future__
import} can modify \var{flags}.

Whenever \code{PyCompilerFlags *flags} is \NULL, \member{cf_flags}
is treated as equal to \code{0}, and any modification due to
\code{from __future__ import} is discarded.
\begin{verbatim}
struct PyCompilerFlags {
    int cf_flags;
};
\end{verbatim}
\end{ctypedesc}
\begin{cvardesc}{int}{CO_FUTURE_DIVISION}
This bit can be set in \var{flags} to cause division operator \code{/}
to be interpreted as ``true division'' according to \pep{238}.
\end{cvardesc}
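At the Python level the same bit is available as
\code{__future__.division.compiler_flag} and can be passed as the
\var{flags} argument of the built-in \function{compile()}. The sketch
below uses that Python-level route rather than the C API; the
expression and the \code{<example>} filename are illustrative. On
interpreters where true division is already the default, the flag is
accepted but changes nothing.

```python
import __future__

# Flag bit that Python exposes for "from __future__ import division";
# it corresponds to CO_FUTURE_DIVISION on the C side.
division_flag = __future__.division.compiler_flag

# Compile an isolated expression with the flag set, then evaluate it.
# The source text and the "<example>" filename are illustrative only.
code = compile("1 / 2", "<example>", "eval", division_flag)
quotient = eval(code)
```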


@ -1,9 +0,0 @@
\author{Guido van Rossum\\
Fred L. Drake, Jr., editor}
\authoraddress{
\strong{Python Software Foundation}\\
Email: \email{docs@python.org}
}
\date{\today} % XXX update before final release!
\input{patchlevel} % include Python version information


@ -1,14 +0,0 @@
Copyright \copyright{} 2001-2007 Python Software Foundation.
All rights reserved.
Copyright \copyright{} 2000 BeOpen.com.
All rights reserved.
Copyright \copyright{} 1995-2000 Corporation for National Research Initiatives.
All rights reserved.
Copyright \copyright{} 1991-1995 Stichting Mathematisch Centrum.
All rights reserved.
See the end of this document for complete license and permissions
information.


@ -1,674 +0,0 @@
\section{History of the software}
Python was created in the early 1990s by Guido van Rossum at Stichting
Mathematisch Centrum (CWI, see \url{http://www.cwi.nl/}) in the Netherlands
as a successor of a language called ABC. Guido remains Python's
principal author, although it includes many contributions from others.

In 1995, Guido continued his work on Python at the Corporation for
National Research Initiatives (CNRI, see \url{http://www.cnri.reston.va.us/})
in Reston, Virginia where he released several versions of the
software.

In May 2000, Guido and the Python core development team moved to
BeOpen.com to form the BeOpen PythonLabs team. In October of the same
year, the PythonLabs team moved to Digital Creations (now Zope
Corporation; see \url{http://www.zope.com/}). In 2001, the Python
Software Foundation (PSF, see \url{http://www.python.org/psf/}) was
formed, a non-profit organization created specifically to own
Python-related Intellectual Property. Zope Corporation is a
sponsoring member of the PSF.

All Python releases are Open Source (see
\url{http://www.opensource.org/} for the Open Source Definition).
Historically, most, but not all, Python releases have also been
GPL-compatible; the table below summarizes the various releases.
\begin{tablev}{c|c|c|c|c}{textrm}%
{Release}{Derived from}{Year}{Owner}{GPL compatible?}
\linev{0.9.0 thru 1.2}{n/a}{1991-1995}{CWI}{yes}
\linev{1.3 thru 1.5.2}{1.2}{1995-1999}{CNRI}{yes}
\linev{1.6}{1.5.2}{2000}{CNRI}{no}
\linev{2.0}{1.6}{2000}{BeOpen.com}{no}
\linev{1.6.1}{1.6}{2001}{CNRI}{no}
\linev{2.1}{2.0+1.6.1}{2001}{PSF}{no}
\linev{2.0.1}{2.0+1.6.1}{2001}{PSF}{yes}
\linev{2.1.1}{2.1+2.0.1}{2001}{PSF}{yes}
\linev{2.2}{2.1.1}{2001}{PSF}{yes}
\linev{2.1.2}{2.1.1}{2002}{PSF}{yes}
\linev{2.1.3}{2.1.2}{2002}{PSF}{yes}
\linev{2.2.1}{2.2}{2002}{PSF}{yes}
\linev{2.2.2}{2.2.1}{2002}{PSF}{yes}
\linev{2.2.3}{2.2.2}{2002-2003}{PSF}{yes}
\linev{2.3}{2.2.2}{2002-2003}{PSF}{yes}
\linev{2.3.1}{2.3}{2002-2003}{PSF}{yes}
\linev{2.3.2}{2.3.1}{2003}{PSF}{yes}
\linev{2.3.3}{2.3.2}{2003}{PSF}{yes}
\linev{2.3.4}{2.3.3}{2004}{PSF}{yes}
\linev{2.3.5}{2.3.4}{2005}{PSF}{yes}
\linev{2.4}{2.3}{2004}{PSF}{yes}
\linev{2.4.1}{2.4}{2005}{PSF}{yes}
\linev{2.4.2}{2.4.1}{2005}{PSF}{yes}
\linev{2.4.3}{2.4.2}{2006}{PSF}{yes}
\linev{2.4.4}{2.4.3}{2006}{PSF}{yes}
\linev{2.5}{2.4}{2006}{PSF}{yes}
\linev{2.5.1}{2.5}{2007}{PSF}{yes}
\end{tablev}
\note{GPL-compatible doesn't mean that we're distributing
Python under the GPL. All Python licenses, unlike the GPL, let you
distribute a modified version without making your changes open source.
The GPL-compatible licenses make it possible to combine Python with
other software that is released under the GPL; the others don't.}
Thanks to the many outside volunteers who have worked under Guido's
direction to make these releases possible.
\section{Terms and conditions for accessing or otherwise using Python}
\centerline{\strong{PSF LICENSE AGREEMENT FOR PYTHON \version}}
\begin{enumerate}
\item
This LICENSE AGREEMENT is between the Python Software Foundation
(``PSF''), and the Individual or Organization (``Licensee'') accessing
and otherwise using Python \version{} software in source or binary
form and its associated documentation.
\item
Subject to the terms and conditions of this License Agreement, PSF
hereby grants Licensee a nonexclusive, royalty-free, world-wide
license to reproduce, analyze, test, perform and/or display publicly,
prepare derivative works, distribute, and otherwise use Python
\version{} alone or in any derivative version, provided, however, that
PSF's License Agreement and PSF's notice of copyright, i.e.,
``Copyright \copyright{} 2001-2007 Python Software Foundation; All
Rights Reserved'' are retained in Python \version{} alone or in any
derivative version prepared by Licensee.
\item
In the event Licensee prepares a derivative work that is based on
or incorporates Python \version{} or any part thereof, and wants to
make the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python \version.
\item
PSF is making Python \version{} available to Licensee on an ``AS IS''
basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON \version{} WILL
NOT INFRINGE ANY THIRD PARTY RIGHTS.
\item
PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
\version{} FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR
LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON
\version, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE
POSSIBILITY THEREOF.
\item
This License Agreement will automatically terminate upon a material
breach of its terms and conditions.
\item
Nothing in this License Agreement shall be deemed to create any
relationship of agency, partnership, or joint venture between PSF and
Licensee. This License Agreement does not grant permission to use PSF
trademarks or trade name in a trademark sense to endorse or promote
products or services of Licensee, or any third party.
\item
By copying, installing or otherwise using Python \version, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.
\end{enumerate}
\centerline{\strong{BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0}}
\centerline{\strong{BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1}}
\begin{enumerate}
\item
This LICENSE AGREEMENT is between BeOpen.com (``BeOpen''), having an
office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the
Individual or Organization (``Licensee'') accessing and otherwise
using this software in source or binary form and its associated
documentation (``the Software'').
\item
Subject to the terms and conditions of this BeOpen Python License
Agreement, BeOpen hereby grants Licensee a non-exclusive,
royalty-free, world-wide license to reproduce, analyze, test, perform
and/or display publicly, prepare derivative works, distribute, and
otherwise use the Software alone or in any derivative version,
provided, however, that the BeOpen Python License is retained in the
Software, alone or in any derivative version prepared by Licensee.
\item
BeOpen is making the Software available to Licensee on an ``AS IS''
basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.
\item
BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE
SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS
AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY
DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
\item
This License Agreement will automatically terminate upon a material
breach of its terms and conditions.
\item
This License Agreement shall be governed by and interpreted in all
respects by the law of the State of California, excluding conflict of
law provisions. Nothing in this License Agreement shall be deemed to
create any relationship of agency, partnership, or joint venture
between BeOpen and Licensee. This License Agreement does not grant
permission to use BeOpen trademarks or trade names in a trademark
sense to endorse or promote products or services of Licensee, or any
third party. As an exception, the ``BeOpen Python'' logos available
at http://www.pythonlabs.com/logos.html may be used according to the
permissions granted on that web page.
\item
By copying, installing or otherwise using the software, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.
\end{enumerate}
\centerline{\strong{CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1}}
\begin{enumerate}
\item
This LICENSE AGREEMENT is between the Corporation for National
Research Initiatives, having an office at 1895 Preston White Drive,
Reston, VA 20191 (``CNRI''), and the Individual or Organization
(``Licensee'') accessing and otherwise using Python 1.6.1 software in
source or binary form and its associated documentation.
\item
Subject to the terms and conditions of this License Agreement, CNRI
hereby grants Licensee a nonexclusive, royalty-free, world-wide
license to reproduce, analyze, test, perform and/or display publicly,
prepare derivative works, distribute, and otherwise use Python 1.6.1
alone or in any derivative version, provided, however, that CNRI's
License Agreement and CNRI's notice of copyright, i.e., ``Copyright
\copyright{} 1995-2001 Corporation for National Research Initiatives;
All Rights Reserved'' are retained in Python 1.6.1 alone or in any
derivative version prepared by Licensee. Alternately, in lieu of
CNRI's License Agreement, Licensee may substitute the following text
(omitting the quotes): ``Python 1.6.1 is made available subject to the
terms and conditions in CNRI's License Agreement. This Agreement
together with Python 1.6.1 may be located on the Internet using the
following unique, persistent identifier (known as a handle):
1895.22/1013. This Agreement may also be obtained from a proxy server
on the Internet using the following URL:
\url{http://hdl.handle.net/1895.22/1013}.''
\item
In the event Licensee prepares a derivative work that is based on
or incorporates Python 1.6.1 or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python 1.6.1.
\item
CNRI is making Python 1.6.1 available to Licensee on an ``AS IS''
basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.
\item
CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
\item
This License Agreement will automatically terminate upon a material
breach of its terms and conditions.
\item
This License Agreement shall be governed by the federal
intellectual property law of the United States, including without
limitation the federal copyright law, and, to the extent such
U.S. federal law does not apply, by the law of the Commonwealth of
Virginia, excluding Virginia's conflict of law provisions.
Notwithstanding the foregoing, with regard to derivative works based
on Python 1.6.1 that incorporate non-separable material that was
previously distributed under the GNU General Public License (GPL), the
law of the Commonwealth of Virginia shall govern this License
Agreement only as to issues arising under or with respect to
Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this
License Agreement shall be deemed to create any relationship of
agency, partnership, or joint venture between CNRI and Licensee. This
License Agreement does not grant permission to use CNRI trademarks or
trade name in a trademark sense to endorse or promote products or
services of Licensee, or any third party.
\item
By clicking on the ``ACCEPT'' button where indicated, or by copying,
installing or otherwise using Python 1.6.1, Licensee agrees to be
bound by the terms and conditions of this License Agreement.
\end{enumerate}
\centerline{ACCEPT}
\centerline{\strong{CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2}}
Copyright \copyright{} 1991 - 1995, Stichting Mathematisch Centrum
Amsterdam, The Netherlands. All rights reserved.
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Stichting Mathematisch
Centrum or CWI not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.
STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
\section{Licenses and Acknowledgements for Incorporated Software}
This section is an incomplete but growing list of licenses and
acknowledgements for third-party software incorporated in the
Python distribution.
\subsection{Mersenne Twister}
The \module{_random} module includes code based on a download from
\url{http://www.math.keio.ac.jp/~matumoto/MT2002/emt19937ar.html}.
The following are the verbatim comments from the original code:
\begin{verbatim}
A C-program for MT19937, with initialization improved 2002/1/26.
Coded by Takuji Nishimura and Makoto Matsumoto.
Before using, initialize the state by using init_genrand(seed)
or init_by_array(init_key, key_length).
Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The names of its contributors may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Any feedback is very welcome.
http://www.math.keio.ac.jp/matumoto/emt.html
email: matumoto@math.keio.ac.jp
\end{verbatim}
\subsection{Sockets}
The \module{socket} module uses the functions \function{getaddrinfo}
and \function{getnameinfo}, which are coded in separate source files
from the WIDE Project, \url{http://www.wide.ad.jp/about/index.html}.
\begin{verbatim}
Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the project nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
\end{verbatim}
\subsection{Floating point exception control}
The source for the \module{fpectl} module includes the following notice:
\begin{verbatim}
---------------------------------------------------------------------
/ Copyright (c) 1996. \
| The Regents of the University of California. |
| All rights reserved. |
| |
| Permission to use, copy, modify, and distribute this software for |
| any purpose without fee is hereby granted, provided that this en- |
| tire notice is included in all copies of any software which is or |
| includes a copy or modification of this software and in all |
| copies of the supporting documentation for such software. |
| |
| This work was produced at the University of California, Lawrence |
| Livermore National Laboratory under contract no. W-7405-ENG-48 |
| between the U.S. Department of Energy and The Regents of the |
| University of California for the operation of UC LLNL. |
| |
| DISCLAIMER |
| |
| This software was prepared as an account of work sponsored by an |
| agency of the United States Government. Neither the United States |
| Government nor the University of California nor any of their em- |
| ployees, makes any warranty, express or implied, or assumes any |
| liability or responsibility for the accuracy, completeness, or |
| usefulness of any information, apparatus, product, or process |
| disclosed, or represents that its use would not infringe |
| privately-owned rights. Reference herein to any specific commer- |
| cial products, process, or service by trade name, trademark, |
| manufacturer, or otherwise, does not necessarily constitute or |
| imply its endorsement, recommendation, or favoring by the United |
| States Government or the University of California. The views and |
| opinions of authors expressed herein do not necessarily state or |
| reflect those of the United States Government or the University |
| of California, and shall not be used for advertising or product |
\ endorsement purposes. /
---------------------------------------------------------------------
\end{verbatim}
\subsection{MD5 message digest algorithm}
The source code for the \module{md5} module contains the following notice:
\begin{verbatim}
Copyright (C) 1999, 2002 Aladdin Enterprises. All rights reserved.
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
L. Peter Deutsch
ghost@aladdin.com
Independent implementation of MD5 (RFC 1321).
This code implements the MD5 Algorithm defined in RFC 1321, whose
text is available at
http://www.ietf.org/rfc/rfc1321.txt
The code is derived from the text of the RFC, including the test suite
(section A.5) but excluding the rest of Appendix A. It does not include
any code or documentation that is identified in the RFC as being
copyrighted.
The original and principal author of md5.h is L. Peter Deutsch
<ghost@aladdin.com>. Other authors are noted in the change history
that follows (in reverse chronological order):
2002-04-13 lpd Removed support for non-ANSI compilers; removed
references to Ghostscript; clarified derivation from RFC 1321;
now handles byte order either statically or dynamically.
1999-11-04 lpd Edited comments slightly for automatic TOC extraction.
1999-10-18 lpd Fixed typo in header comment (ansi2knr rather than md5);
added conditionalization for C++ compilation from Martin
Purschke <purschke@bnl.gov>.
1999-05-03 lpd Original version.
\end{verbatim}
\subsection{Asynchronous socket services}
The \module{asynchat} and \module{asyncore} modules contain the
following notice:
\begin{verbatim}
Copyright 1996 by Sam Rushing
All Rights Reserved
Permission to use, copy, modify, and distribute this software and
its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of Sam
Rushing not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.
SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
\end{verbatim}
\subsection{Cookie management}
The \module{Cookie} module contains the following notice:
\begin{verbatim}
Copyright 2000 by Timothy O'Malley <timo@alum.mit.edu>
All Rights Reserved
Permission to use, copy, modify, and distribute this software
and its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of
Timothy O'Malley not be used in advertising or publicity
pertaining to distribution of the software without specific, written
prior permission.
Timothy O'Malley DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS, IN NO EVENT SHALL Timothy O'Malley BE LIABLE FOR
ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
\end{verbatim}
\subsection{Profiling}
The \module{profile} and \module{pstats} modules contain
the following notice:
\begin{verbatim}
Copyright 1994, by InfoSeek Corporation, all rights reserved.
Written by James Roskind
Permission to use, copy, modify, and distribute this Python software
and its associated documentation for any purpose (subject to the
restriction in the following sentence) without fee is hereby granted,
provided that the above copyright notice appears in all copies, and
that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of InfoSeek not be used in
advertising or publicity pertaining to distribution of the software
without specific, written prior permission. This permission is
explicitly restricted to the copying and modification of the software
to remain in Python, compiled Python, or other languages (such as C)
wherein the modified or derived code is exclusively imported into a
Python module.
INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
\end{verbatim}
\subsection{Execution tracing}
The \module{trace} module contains the following notice:
\begin{verbatim}
portions copyright 2001, Autonomous Zones Industries, Inc., all rights...
err... reserved and offered to the public under the terms of the
Python 2.2 license.
Author: Zooko O'Whielacronx
http://zooko.com/
mailto:zooko@zooko.com
Copyright 2000, Mojam Media, Inc., all rights reserved.
Author: Skip Montanaro
Copyright 1999, Bioreason, Inc., all rights reserved.
Author: Andrew Dalke
Copyright 1995-1997, Automatrix, Inc., all rights reserved.
Author: Skip Montanaro
Copyright 1991-1995, Stichting Mathematisch Centrum, all rights reserved.
Permission to use, copy, modify, and distribute this Python software and
its associated documentation for any purpose without fee is hereby
granted, provided that the above copyright notice appears in all copies,
and that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of neither Automatrix,
Bioreason or Mojam Media be used in advertising or publicity pertaining to
distribution of the software without specific, written prior permission.
\end{verbatim}
\subsection{UUencode and UUdecode functions}
The \module{uu} module contains the following notice:
\begin{verbatim}
Copyright 1994 by Lance Ellinghouse
Cathedral City, California Republic, United States of America.
All Rights Reserved
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Lance Ellinghouse
not be used in advertising or publicity pertaining to distribution
of the software without specific, written prior permission.
LANCE ELLINGHOUSE DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL LANCE ELLINGHOUSE CENTRUM BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Modified by Jack Jansen, CWI, July 1995:
- Use binascii module to do the actual line-by-line conversion
between ascii and binary. This results in a 1000-fold speedup. The C
version is still 5 times faster, though.
- Arguments more compliant with python standard
\end{verbatim}
\subsection{XML Remote Procedure Calls}
The \module{xmlrpclib} module contains the following notice:
\begin{verbatim}
The XML-RPC client interface is
Copyright (c) 1999-2002 by Secret Labs AB
Copyright (c) 1999-2002 by Fredrik Lundh
By obtaining, using, and/or copying this software and/or its
associated documentation, you agree that you have read, understood,
and will comply with the following terms and conditions:
Permission to use, copy, modify, and distribute this software and
its associated documentation for any purpose and without fee is
hereby granted, provided that the above copyright notice appears in
all copies, and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of
Secret Labs AB or the author not be used in advertising or publicity
pertaining to distribution of the software without specific, written
prior permission.
SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
OF THIS SOFTWARE.
\end{verbatim}


@ -1,61 +0,0 @@
\label{reporting-bugs}
Python is a mature programming language which has established a
reputation for stability. In order to maintain this reputation, the
developers would like to know of any deficiencies you find in Python
or its documentation.

Before submitting a report, you will be required to log into SourceForge;
this will make it possible for the developers to contact you
for additional information if needed. It is not possible to submit a
bug report anonymously.

All bug reports should be submitted via the Python Bug Tracker on
SourceForge (\url{http://sourceforge.net/bugs/?group_id=5470}). The
bug tracker offers a Web form which allows pertinent information to be
entered and submitted to the developers.

The first step in filing a report is to determine whether the problem
has already been reported. The advantage of doing so, aside from
saving the developers time, is that you learn what has been done to
fix it; it may be that the problem has already been fixed for the next
release, or additional information is needed (in which case you are
welcome to provide it if you can!). To do this, search the bug
database using the search box on the left side of the page.

If the problem you're reporting is not already in the bug tracker, go
back to the Python Bug Tracker
(\url{http://sourceforge.net/bugs/?group_id=5470}). Select the
``Submit a Bug'' link at the top of the page to open the bug reporting
form.

The submission form has a number of fields. The only required fields
are the ``Summary'' and ``Details'' fields. For the summary,
enter a \emph{very} short description of the problem; fewer than ten
words is good. In the ``Details'' field, describe the problem in detail,
including what you expected to happen and what did happen. Be sure to
include the version of Python you used, whether any extension modules
were involved, and what hardware and software platform you were using
(including version information as appropriate).

The only other field that you may want to set is the ``Category''
field, which allows you to place the bug report into a broad category
(such as ``Documentation'' or ``Library'').

Each bug report will be assigned to a developer who will determine
what needs to be done to correct the problem. You will
receive an update each time action is taken on the bug.
\begin{seealso}
\seetitle[http://www-mice.cs.ucl.ac.uk/multimedia/software/documentation/ReportingBugs.html]{How
to Report Bugs Effectively}{Article which goes into some
detail about how to create a useful bug report. This
describes what kind of information is useful and why it is
useful.}
\seetitle[http://www.mozilla.org/quality/bug-writing-guidelines.html]{Bug
Writing Guidelines}{Information about writing a good bug
report. Some of this is specific to the Mozilla project, but
describes general good practices.}
\end{seealso}


@ -1,76 +0,0 @@
typedef struct _typeobject {
PyObject_VAR_HEAD
char *tp_name; /* For printing, in format "<module>.<name>" */
int tp_basicsize, tp_itemsize; /* For allocation */
/* Methods to implement standard operations */
destructor tp_dealloc;
printfunc tp_print;
getattrfunc tp_getattr;
setattrfunc tp_setattr;
cmpfunc tp_compare;
reprfunc tp_repr;
/* Method suites for standard classes */
PyNumberMethods *tp_as_number;
PySequenceMethods *tp_as_sequence;
PyMappingMethods *tp_as_mapping;
/* More standard operations (here for binary compatibility) */
hashfunc tp_hash;
ternaryfunc tp_call;
reprfunc tp_str;
getattrofunc tp_getattro;
setattrofunc tp_setattro;
/* Functions to access object as input/output buffer */
PyBufferProcs *tp_as_buffer;
/* Flags to define presence of optional/expanded features */
long tp_flags;
char *tp_doc; /* Documentation string */
/* Assigned meaning in release 2.0 */
/* call function for all accessible objects */
traverseproc tp_traverse;
/* delete references to contained objects */
inquiry tp_clear;
/* Assigned meaning in release 2.1 */
/* rich comparisons */
richcmpfunc tp_richcompare;
/* weak reference enabler */
long tp_weaklistoffset;
/* Added in release 2.2 */
/* Iterators */
getiterfunc tp_iter;
iternextfunc tp_iternext;
/* Attribute descriptor and subclassing stuff */
struct PyMethodDef *tp_methods;
struct PyMemberDef *tp_members;
struct PyGetSetDef *tp_getset;
struct _typeobject *tp_base;
PyObject *tp_dict;
descrgetfunc tp_descr_get;
descrsetfunc tp_descr_set;
long tp_dictoffset;
initproc tp_init;
allocfunc tp_alloc;
newfunc tp_new;
freefunc tp_free; /* Low-level free-memory routine */
inquiry tp_is_gc; /* For PyObject_IS_GC */
PyObject *tp_bases;
PyObject *tp_mro; /* method resolution order */
PyObject *tp_cache;
PyObject *tp_subclasses;
PyObject *tp_weaklist;
} PyTypeObject;

3811
Doc/dist/dist.tex vendored

File diff suppressed because it is too large Load Diff

113
Doc/dist/sysconfig.tex vendored

@ -1,113 +0,0 @@
\section{\module{distutils.sysconfig} ---
System configuration information}
\declaremodule{standard}{distutils.sysconfig}
\modulesynopsis{Low-level access to configuration information of the
Python interpreter.}
\moduleauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
\moduleauthor{Greg Ward}{gward@python.net}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
The \module{distutils.sysconfig} module provides access to Python's
low-level configuration information. The variables available depend
heavily on the platform and on the build process for the version of
Python being run; they are the variables found in the
\file{Makefile} and configuration header that are installed with
Python on \UNIX{} systems. The configuration header is called
\file{pyconfig.h} for Python versions starting with 2.2, and
\file{config.h} for earlier versions of Python.
Some additional functions are provided which perform some useful
manipulations for other parts of the \module{distutils} package.
\begin{datadesc}{PREFIX}
The result of \code{os.path.normpath(sys.prefix)}.
\end{datadesc}
\begin{datadesc}{EXEC_PREFIX}
The result of \code{os.path.normpath(sys.exec_prefix)}.
\end{datadesc}
\begin{funcdesc}{get_config_var}{name}
Return the value of a single variable. This is equivalent to
\code{get_config_vars().get(\var{name})}.
\end{funcdesc}
\begin{funcdesc}{get_config_vars}{\moreargs}
Return a set of variable definitions. If there are no arguments,
this returns a dictionary mapping names of configuration variables
to values. If arguments are provided, they should be strings, and
the return value will be a sequence giving the associated values.
If a given name does not have a corresponding value, \code{None}
will be included for that variable.
\end{funcdesc}
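The relationship between \function{get_config_var()} and \function{get_config_vars()} can be sketched as follows. This is an illustration under a stated assumption: it uses the standard \module{sysconfig} module, which provides functions with the same names, because \module{distutils} is not available in every modern interpreter. The variable name \code{EXT_SUFFIX} is only an example and may be undefined on some builds.

```python
import sysconfig  # assumed stand-in for distutils.sysconfig

# With no arguments, the result is a dictionary of every known variable.
all_vars = sysconfig.get_config_vars()

# Looking up one name is the same as calling .get() on that dictionary.
name = "EXT_SUFFIX"  # example name only; it may be undefined on some builds
single = sysconfig.get_config_var(name)
same_as_dict_lookup = single == all_vars.get(name)

# With arguments, the result is a list of values in argument order;
# unknown names yield None rather than raising an exception.
values = sysconfig.get_config_vars(name, "NO_SUCH_VARIABLE_FOR_DEMO")

if __name__ == "__main__":
    print(single, same_as_dict_lookup, values)
```

Because a missing variable is reported as \code{None} instead of an exception, callers that need a value should test for \code{None} explicitly.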
\begin{funcdesc}{get_config_h_filename}{}
Return the full path name of the configuration header. For \UNIX,
this will be the header generated by the \program{configure} script;
for other platforms the header will have been supplied directly by
the Python source distribution. The file is a platform-specific
text file.
\end{funcdesc}
\begin{funcdesc}{get_makefile_filename}{}
Return the full path name of the \file{Makefile} used to build
Python. For \UNIX, this will be a file generated by the
\program{configure} script; the meaning for other platforms will
vary. The file is a platform-specific text file, if it exists.
This function is only useful on \POSIX{} platforms.
\end{funcdesc}
\begin{funcdesc}{get_python_inc}{\optional{plat_specific\optional{, prefix}}}
Return the directory for either the general or platform-dependent C
include files. If \var{plat_specific} is true, the
platform-dependent include directory is returned; if false or
omitted, the platform-independent directory is returned. If
\var{prefix} is given, it is used as either the prefix instead of
\constant{PREFIX}, or as the exec-prefix instead of
\constant{EXEC_PREFIX} if \var{plat_specific} is true.
\end{funcdesc}
\begin{funcdesc}{get_python_lib}{\optional{plat_specific\optional{,
standard_lib\optional{, prefix}}}}
Return the directory for either the general or platform-dependent
library installation. If \var{plat_specific} is true, the
platform-dependent library directory is returned; if false or
omitted, the platform-independent directory is returned. If
\var{prefix} is given, it is used as either the prefix instead of
\constant{PREFIX}, or as the exec-prefix instead of
\constant{EXEC_PREFIX} if \var{plat_specific} is true. If
\var{standard_lib} is true, the directory for the standard library
is returned rather than the directory for the installation of
third-party extensions.
\end{funcdesc}
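The distinction between the general and platform-dependent directories can be explored with a short sketch. It assumes the standard \module{sysconfig} module as a stand-in for the functions described above; the mapping from \function{get_python_inc()} and \function{get_python_lib()} arguments to path names given in the comments is an approximation made for this example, and \code{install_directories} is a hypothetical helper.

```python
import sysconfig  # assumed stand-in for the distutils.sysconfig functions

# Rough correspondence (an assumption of this sketch, not an exact mapping):
#   get_python_inc(plat_specific=0)  -> "include"
#   get_python_inc(plat_specific=1)  -> "platinclude"
#   get_python_lib(plat_specific=0)  -> "purelib"
#   get_python_lib(plat_specific=1)  -> "platlib"
#   get_python_lib(standard_lib=1)   -> "stdlib"
WANTED = ("include", "platinclude", "purelib", "platlib", "stdlib")


def install_directories():
    """Map each path name to the directory reported by this interpreter."""
    return {key: sysconfig.get_path(key) for key in WANTED}


if __name__ == "__main__":
    for key, directory in install_directories().items():
        print("%-12s %s" % (key, directory))
```

On many installations the general and platform-dependent directories are identical; they differ only when Python was configured with separate prefix and exec-prefix locations.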
The following function is only intended for use within the
\module{distutils} package.
\begin{funcdesc}{customize_compiler}{compiler}
Do any platform-specific customization of a
\class{distutils.ccompiler.CCompiler} instance.
This function is only needed on \UNIX{} at this time, but should be
called consistently to support forward-compatibility. It inserts
the information that varies across \UNIX{} flavors and is stored in
Python's \file{Makefile}. This information includes the selected
compiler, compiler and linker options, and the extension used by the
linker for shared objects.
\end{funcdesc}
This function is even more special-purpose, and should only be used
from Python's own build procedures.
\begin{funcdesc}{set_python_build}{}
Inform the \module{distutils.sysconfig} module that it is being used
as part of the build process for Python. This changes a lot of
relative locations for files, allowing them to be located in the
build area rather than in an installed Python.
\end{funcdesc}

File diff suppressed because it is too large Load Diff


@ -1,143 +0,0 @@
\chapter{Building C and \Cpp{} Extensions with distutils
\label{building}}
\sectionauthor{Martin v. L\"owis}{martin@v.loewis.de}
Starting in Python 1.4, Python provided, on \UNIX{}, a special make
file for generating make files that build dynamically-linked
extensions and custom interpreters. Starting with Python 2.0, this
mechanism (known as \file{Makefile.pre.in} and \file{Setup} files) is no
longer supported. Building custom interpreters was rarely used, and
extension modules can be built using distutils.
Building an extension module using distutils requires that distutils
is installed on the build machine; it is included in Python 2.x and
available separately for Python 1.5. Since distutils also supports
creation of binary packages, users don't necessarily need a compiler
and distutils to install the extension.
A distutils package contains a driver script, \file{setup.py}. This is
a plain Python file, which, in the simplest case, could look like
this:
\begin{verbatim}
from distutils.core import setup, Extension

module1 = Extension('demo',
                    sources = ['demo.c'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a demo package',
       ext_modules = [module1])
\end{verbatim}
With this \file{setup.py}, and a file \file{demo.c}, running
\begin{verbatim}
python setup.py build
\end{verbatim}
will compile \file{demo.c}, and produce an extension module named
\samp{demo} in the \file{build} directory. Depending on the system,
the module file will end up in a subdirectory \file{build/lib.system},
and may have a name like \file{demo.so} or \file{demo.pyd}.
In the \file{setup.py}, all execution is performed by calling the
\samp{setup} function. This takes a variable number of keyword
arguments, of which the example above uses only a
subset. Specifically, the example specifies meta-information to build
packages, and it specifies the contents of the package. Normally, a
package will contain additional modules, like Python source modules,
documentation, subpackages, etc. Please refer to the distutils
documentation in \citetitle[../dist/dist.html]{Distributing Python
Modules} to learn more about the features of distutils; this section
explains building extension modules only.
It is common to pre-compute arguments to \function{setup}, to better
structure the driver script. In the example above,
the \samp{ext_modules} argument to \function{setup} is a list of
extension modules, each of which is an instance of the
\class{Extension} class. In the example, the instance defines an extension
named \samp{demo} which is built by compiling a single source file,
\file{demo.c}.
In many cases, building an extension is more complex, since additional
preprocessor defines and libraries may be needed. This is demonstrated
in the example below.
\begin{verbatim}
from distutils.core import setup, Extension

module1 = Extension('demo',
                    define_macros = [('MAJOR_VERSION', '1'),
                                     ('MINOR_VERSION', '0')],
                    include_dirs = ['/usr/local/include'],
                    libraries = ['tcl83'],
                    library_dirs = ['/usr/local/lib'],
                    sources = ['demo.c'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a demo package',
       author = 'Martin v. Loewis',
       author_email = 'martin@v.loewis.de',
       url = 'http://www.python.org/doc/current/ext/building.html',
       long_description = '''
This is really just a demo package.
''',
       ext_modules = [module1])
\end{verbatim}
In this example, \function{setup} is called with additional
meta-information, which is recommended when distribution packages have
to be built. For the extension itself, it specifies preprocessor
defines, include directories, library directories, and libraries.
Depending on the compiler, distutils passes this information in
different ways to the compiler. For example, on \UNIX{}, this may
result in the compilation commands
\begin{verbatim}
gcc -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -I/usr/local/include -I/usr/local/include/python2.2 -c demo.c -o build/temp.linux-i686-2.2/demo.o
gcc -shared build/temp.linux-i686-2.2/demo.o -L/usr/local/lib -ltcl83 -o build/lib.linux-i686-2.2/demo.so
\end{verbatim}
These lines are for demonstration purposes only; distutils users
should trust that distutils gets the invocations right.
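To make the correspondence between the \class{Extension} arguments and the command lines above more concrete, the sketch below derives gcc-style options from the same values. It is an illustration only and not the code distutils uses; the function names \code{compile_flags} and \code{link_flags} are hypothetical, and other compilers expect different option spellings.

```python
def compile_flags(define_macros=(), include_dirs=()):
    """Build gcc-style preprocessor options for the compile step."""
    flags = []
    for macro, value in define_macros:
        if value is None:
            # A value of None defines the macro without a value.
            flags.append("-D%s" % macro)
        else:
            flags.append("-D%s=%s" % (macro, value))
    flags.extend("-I" + directory for directory in include_dirs)
    return flags


def link_flags(library_dirs=(), libraries=()):
    """Build gcc-style search-path and library options for the link step."""
    flags = ["-L" + directory for directory in library_dirs]
    flags.extend("-l" + library for library in libraries)
    return flags


if __name__ == "__main__":
    print(compile_flags([("MAJOR_VERSION", "1"), ("MINOR_VERSION", "0")],
                        ["/usr/local/include"]))
    print(link_flags(["/usr/local/lib"], ["tcl83"]))
```

The remaining options in the sample commands, such as \code{-fPIC} or the Python include directory, come from the interpreter's own build configuration rather than from \file{setup.py}.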
\section{Distributing your extension modules
\label{distributing}}
When an extension has been successfully built, there are three ways to
use it.
End-users will typically want to install the module; they do so by
running
\begin{verbatim}
python setup.py install
\end{verbatim}
Module maintainers should produce source packages; to do so, they run
\begin{verbatim}
python setup.py sdist
\end{verbatim}
In some cases, additional files need to be included in a source
distribution; this is done through a \file{MANIFEST.in} file; see the
distutils documentation for details.
If the source distribution has been built successfully, maintainers
can also create binary distributions. Depending on the platform, one
of the following commands can be used to do so.
\begin{verbatim}
python setup.py bdist_wininst
python setup.py bdist_rpm
python setup.py bdist_dumb
\end{verbatim}


@ -1,316 +0,0 @@
\chapter{Embedding Python in Another Application
\label{embedding}}
The previous chapters discussed how to extend Python, that is, how to
extend the functionality of Python by attaching a library of C
functions to it. It is also possible to do it the other way around:
enrich your C/\Cpp{} application by embedding Python in it. Embedding
provides your application with the ability to implement some of the
functionality of your application in Python rather than C or \Cpp.
This can be used for many purposes; one example would be to allow
users to tailor the application to their needs by writing some scripts
in Python. You can also use it yourself if some of the functionality
can be written in Python more easily.
Embedding Python is similar to extending it, but not quite. The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python ---
instead, some parts of the application occasionally call the Python
interpreter to run some Python code.
So if you are embedding Python, you are providing your own main
program. One of the things this main program has to do is initialize
the Python interpreter. At the very least, you have to call the
function \cfunction{Py_Initialize()} (on Mac OS, call
\cfunction{PyMac_Initialize()} instead). There are optional calls to
pass command line arguments to Python. Then later you can call the
interpreter from any part of the application.
There are several different ways to call the interpreter: you can pass
a string containing Python statements to
\cfunction{PyRun_SimpleString()}, or you can pass a stdio file pointer
and a file name (for identification in error messages only) to
\cfunction{PyRun_SimpleFile()}. You can also call the lower-level
operations described in the previous chapters to construct and use
Python objects.
A simple demo of embedding Python can be found in the directory
\file{Demo/embed/} of the source distribution.
\begin{seealso}
\seetitle[../api/api.html]{Python/C API Reference Manual}{The
details of Python's C interface are given in this manual.
A great deal of necessary information can be found here.}
\end{seealso}
\section{Very High Level Embedding
\label{high-level-embedding}}
The simplest form of embedding Python is the use of the very
high level interface. This interface is intended to execute a
Python script without needing to interact with the application
directly. This can be used, for example, to perform some operation
on a file.
\begin{verbatim}
#include <Python.h>

int
main(int argc, char *argv[])
{
  Py_Initialize();
  PyRun_SimpleString("from time import time,ctime\n"
                     "print 'Today is',ctime(time())\n");
  Py_Finalize();
  return 0;
}
\end{verbatim}
The above code first initializes the Python interpreter with
\cfunction{Py_Initialize()}, followed by the execution of a hard-coded
Python script that prints the date and time. Afterwards, the
\cfunction{Py_Finalize()} call shuts the interpreter down, followed by
the end of the program. In a real program, you may want to get the
Python script from another source, perhaps a text-editor routine, a
file, or a database. Getting the Python code from a file can better
be done by using the \cfunction{PyRun_SimpleFile()} function, which
saves you the trouble of allocating memory space and loading the file
contents.
\section{Beyond Very High Level Embedding: An overview
\label{lower-level-embedding}}
The high level interface gives you the ability to execute
arbitrary pieces of Python code from your application, but
exchanging data values is quite cumbersome to say the least. If
you want that, you should use lower level calls. At the cost of
having to write more C code, you can achieve almost anything.
It should be noted that extending Python and embedding Python
are much the same activity, despite the different intent. Most
topics discussed in the previous chapters are still valid. To
show this, consider what the extension code from Python to C
really does:
\begin{enumerate}
\item Convert data values from Python to C,
\item Perform a function call to a C routine using the
converted values, and
\item Convert the data values from the call from C to Python.
\end{enumerate}
When embedding Python, the interface code does:
\begin{enumerate}
\item Convert data values from C to Python,
\item Perform a function call to a Python interface routine
using the converted values, and
\item Convert the data values from the call from Python to C.
\end{enumerate}
As you can see, the data conversion steps are simply swapped to
accommodate the different direction of the cross-language transfer.
The only difference is the routine that you call between the two
data conversions. When extending, you call a C routine; when
embedding, you call a Python routine.
This chapter will not discuss how to convert data from Python
to C and vice versa. Also, proper use of references and dealing
with errors is assumed to be understood. Since these aspects do not
differ from extending the interpreter, you can refer to earlier
chapters for the required information.
\section{Pure Embedding
\label{pure-embedding}}
The first program aims to execute a function in a Python
script. As in the section about the very high level interface,
the Python interpreter does not directly interact with the
application (but that will change in the next section).
The code to run a function defined in a Python script is:
\verbatiminput{run-func.c}
This code loads a Python script using \code{argv[1]}, and calls the
function named in \code{argv[2]}. Its integer arguments are the other
values of the \code{argv} array. If you compile and link this
program (let's call the finished executable \program{call}), and use
it to execute a Python script, such as:
\begin{verbatim}
def multiply(a,b):
    print "Will compute", a, "times", b
    c = 0
    for i in range(0, a):
        c = c + b
    return c
\end{verbatim}
then the result should be:
\begin{verbatim}
$ call multiply multiply 3 2
Will compute 3 times 2
Result of call: 6
\end{verbatim} % $
Although the program is quite large for its functionality, most of the
code is for data conversion between Python and C, and for error
reporting. The interesting part with respect to embedding Python
starts with
\begin{verbatim}
Py_Initialize();
pName = PyString_FromString(argv[1]);
/* Error checking of pName left out */
pModule = PyImport_Import(pName);
\end{verbatim}
After initializing the interpreter, the script is loaded using
\cfunction{PyImport_Import()}. This routine needs a Python string
as its argument, which is constructed using the
\cfunction{PyString_FromString()} data conversion routine.
\begin{verbatim}
pFunc = PyObject_GetAttrString(pModule, argv[2]);
/* pFunc is a new reference */

if (pFunc && PyCallable_Check(pFunc)) {
    ...
}
Py_XDECREF(pFunc);
\end{verbatim}
Once the script is loaded, the name we're looking for is retrieved
using \cfunction{PyObject_GetAttrString()}. If the name exists, and
the object returned is callable, you can safely assume that it is a
function. The program then proceeds by constructing a tuple of
arguments as normal. The call to the Python function is then made
with:
\begin{verbatim}
pValue = PyObject_CallObject(pFunc, pArgs);
\end{verbatim}
Upon return of the function, \code{pValue} is either \NULL{} or it
contains a reference to the return value of the function. Be sure to
release the reference after examining the value.
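The sequence of C API calls described in this section has a close analogue at the Python level, which may help in following the C code. The sketch below is not a translation of \file{run-func.c}; \code{call_function} is a hypothetical helper, and the comments name the C API call that each step roughly corresponds to.

```python
import importlib


def call_function(module_name, func_name, raw_args):
    """Import a module, look up a callable by name, and call it with ints."""
    module = importlib.import_module(module_name)    # cf. PyImport_Import()
    func = getattr(module, func_name, None)          # cf. PyObject_GetAttrString()
    if func is None or not callable(func):           # cf. PyCallable_Check()
        raise TypeError("cannot find function %r" % func_name)
    # Convert the command-line strings to integers and pack them in a tuple.
    args = tuple(int(value) for value in raw_args)
    return func(*args)                               # cf. PyObject_CallObject()


if __name__ == "__main__":
    # The standard operator module stands in for a user-written script here.
    print("Result of call:", call_function("operator", "mul", ["3", "2"]))
```

In C, each of these steps additionally requires checking for \NULL{} results and releasing references, which accounts for most of the extra length of the C program.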
\section{Extending Embedded Python
\label{extending-with-embedding}}
Until now, the embedded Python interpreter had no access to
functionality from the application itself. The Python API allows this
by extending the embedded interpreter. That is, the embedded
interpreter gets extended with routines provided by the application.
While it sounds complex, it is not so bad. Simply forget for a while
that the application starts the Python interpreter. Instead, consider
the application to be a set of subroutines, and write some glue code
that gives Python access to those routines, just like you would write
a normal Python extension. For example:
\begin{verbatim}
static int numargs=0;

/* Return the number of arguments of the application command line */
static PyObject*
emb_numargs(PyObject *self, PyObject *args)
{
    if(!PyArg_ParseTuple(args, ":numargs"))
        return NULL;
    return Py_BuildValue("i", numargs);
}

static PyMethodDef EmbMethods[] = {
    {"numargs", emb_numargs, METH_VARARGS,
     "Return the number of arguments received by the process."},
    {NULL, NULL, 0, NULL}
};
\end{verbatim}
Insert the above code just above the \cfunction{main()} function.
Also, insert the following two statements directly after
\cfunction{Py_Initialize()}:
\begin{verbatim}
numargs = argc;
Py_InitModule("emb", EmbMethods);
\end{verbatim}
These two lines initialize the \code{numargs} variable, and make the
\function{emb.numargs()} function accessible to the embedded Python
interpreter. With these extensions, the Python script can do things
like
\begin{verbatim}
import emb
print "Number of arguments", emb.numargs()
\end{verbatim}
In a real application, the methods will expose an API of the
application to Python.
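The effect of registering an application-provided module can be imitated in pure Python, which makes the idea easy to try without compiling anything. This is a sketch by analogy and not how the C API works internally: \code{install_emb_module} and \code{run_script} are hypothetical helpers, and placing a module object in \code{sys.modules} stands in for the module registration done by the C code.

```python
import sys
import types


def install_emb_module(argv):
    """Register a module named 'emb' exposing numargs(), then return it."""
    emb = types.ModuleType("emb", "Hypothetical application-provided module.")
    numargs = len(argv)

    def emb_numargs():
        """Return the number of arguments received by the process."""
        return numargs

    emb.numargs = emb_numargs
    # Once registered here, "import emb" succeeds in any script run later.
    sys.modules["emb"] = emb
    return emb


def run_script(source):
    """Execute script text in a fresh namespace and return that namespace."""
    namespace = {}
    exec(source, namespace)
    return namespace


if __name__ == "__main__":
    install_emb_module(sys.argv)
    run_script("import emb\nprint('Number of arguments', emb.numargs())\n")
```

As in the C version, the script itself needs no knowledge of where the \code{emb} module comes from; it simply imports it.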
%\section{For the future}
%
%You don't happen to have a nice library to get textual
%equivalents of numeric values do you :-) ?
%Callbacks here ? (I may be using information from that section
%?!)
%threads
%code examples do not really behave well if errors happen
% (what to watch out for)
\section{Embedding Python in \Cpp
\label{embeddingInCplusplus}}
It is also possible to embed Python in a \Cpp{} program; precisely how this
is done will depend on the details of the \Cpp{} system used. In general you
will need to write the main program in \Cpp, and use the \Cpp{} compiler
to compile and link your program. There is no need to recompile Python
itself using \Cpp.
\section{Linking Requirements
\label{link-reqs}}
While the \program{configure} script shipped with the Python sources
will correctly build Python to export the symbols needed by
dynamically linked extensions, this is not automatically inherited by
applications which embed the Python library statically, at least on
\UNIX. This is an issue when the application is linked to the static
runtime library (\file{libpython.a}) and needs to load dynamic
extensions (implemented as \file{.so} files).
The problem is that some entry points are defined by the Python
runtime solely for extension modules to use. If the embedding
application does not use any of these entry points, some linkers will
not include those entries in the symbol table of the finished
executable. Some additional options are needed to inform the linker
not to remove these symbols.
Determining the right options to use for any given platform can be
quite difficult, but fortunately the Python configuration already has
those values. To retrieve them from an installed Python interpreter,
start an interactive interpreter and have a short session like this:
\begin{verbatim}
>>> import distutils.sysconfig
>>> distutils.sysconfig.get_config_var('LINKFORSHARED')
'-Xlinker -export-dynamic'
\end{verbatim}
\refstmodindex{distutils.sysconfig}
The contents of the string presented will be the options that should
be used. If the string is empty, there's no need to add any
additional options. The \constant{LINKFORSHARED} definition
corresponds to the variable of the same name in Python's top-level
\file{Makefile}.
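The same lookup can be done from a script, for example as part of a build procedure. The sketch below makes two assumptions: it uses the standard \module{sysconfig} module in place of \module{distutils.sysconfig}, and \code{linker_options} is a hypothetical helper that splits the value into separate arguments.

```python
import shlex
import sysconfig  # assumed stand-in for distutils.sysconfig


def linker_options():
    """Return the LINKFORSHARED options as a list (empty if none are needed)."""
    value = sysconfig.get_config_var("LINKFORSHARED")
    # The variable may be missing (None) or empty on some platforms.
    return shlex.split(value) if value else []


if __name__ == "__main__":
    print(linker_options())
```

An empty list means no additional options are required; otherwise the items can be appended to the link command for the embedding application.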


@ -1,67 +0,0 @@
\documentclass{manual}
% XXX PM explain how to add new types to Python
\title{Extending and Embedding the Python Interpreter}
\input{boilerplate}
% Tell \index to actually write the .idx file
\makeindex
\begin{document}
\maketitle
\ifhtml
\chapter*{Front Matter\label{front}}
\fi
\input{copyright}
\begin{abstract}
\noindent
Python is an interpreted, object-oriented programming language. This
document describes how to write modules in C or \Cpp{} to extend the
Python interpreter with new modules. Those modules can define new
functions but also new object types and their methods. The document
also describes how to embed the Python interpreter in another
application, for use as an extension language. Finally, it shows how
to compile and link extension modules so that they can be loaded
dynamically (at run time) into the interpreter, if the underlying
operating system supports this feature.
This document assumes basic knowledge about Python. For an informal
introduction to the language, see the
\citetitle[../tut/tut.html]{Python Tutorial}. The
\citetitle[../ref/ref.html]{Python Reference Manual} gives a more
formal definition of the language. The
\citetitle[../lib/lib.html]{Python Library Reference} documents the
existing object types, functions and modules (both built-in and
written in Python) that give the language its wide application range.
For a detailed description of the whole Python/C API, see the separate
\citetitle[../api/api.html]{Python/C API Reference Manual}.
\end{abstract}
\tableofcontents
\input{extending}
\input{newtypes}
\input{building}
\input{windows}
\input{embedding}
\appendix
\chapter{Reporting Bugs}
\input{reportingbugs}
\chapter{History and License}
\input{license}
\end{document}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff


@ -1,54 +0,0 @@
#include <Python.h>
typedef struct {
PyObject_HEAD
/* Type-specific fields go here. */
} noddy_NoddyObject;
static PyTypeObject noddy_NoddyType = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"noddy.Noddy", /*tp_name*/
sizeof(noddy_NoddyObject), /*tp_basicsize*/
0, /*tp_itemsize*/
0, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash */
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT, /*tp_flags*/
"Noddy objects", /* tp_doc */
};
static PyMethodDef noddy_methods[] = {
{NULL} /* Sentinel */
};
#ifndef PyMODINIT_FUNC /* declarations for DLL import/export */
#define PyMODINIT_FUNC void
#endif
PyMODINIT_FUNC
initnoddy(void)
{
PyObject* m;
noddy_NoddyType.tp_new = PyType_GenericNew;
if (PyType_Ready(&noddy_NoddyType) < 0)
return;
m = Py_InitModule3("noddy", noddy_methods,
"Example module that creates an extension type.");
Py_INCREF(&noddy_NoddyType);
PyModule_AddObject(m, "Noddy", (PyObject *)&noddy_NoddyType);
}


@ -1,190 +0,0 @@
#include <Python.h>
#include "structmember.h"
typedef struct {
PyObject_HEAD
PyObject *first; /* first name */
PyObject *last; /* last name */
int number;
} Noddy;
static void
Noddy_dealloc(Noddy* self)
{
Py_XDECREF(self->first);
Py_XDECREF(self->last);
self->ob_type->tp_free((PyObject*)self);
}
static PyObject *
Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
Noddy *self;
self = (Noddy *)type->tp_alloc(type, 0);
if (self != NULL) {
self->first = PyString_FromString("");
if (self->first == NULL)
{
Py_DECREF(self);
return NULL;
}
self->last = PyString_FromString("");
if (self->last == NULL)
{
Py_DECREF(self);
return NULL;
}
self->number = 0;
}
return (PyObject *)self;
}
static int
Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
{
PyObject *first=NULL, *last=NULL, *tmp;
static char *kwlist[] = {"first", "last", "number", NULL};
if (! PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
&first, &last,
&self->number))
return -1;
if (first) {
tmp = self->first;
Py_INCREF(first);
self->first = first;
Py_XDECREF(tmp);
}
if (last) {
tmp = self->last;
Py_INCREF(last);
self->last = last;
Py_XDECREF(tmp);
}
return 0;
}
static PyMemberDef Noddy_members[] = {
{"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
"first name"},
{"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
"last name"},
{"number", T_INT, offsetof(Noddy, number), 0,
"noddy number"},
{NULL} /* Sentinel */
};
static PyObject *
Noddy_name(Noddy* self)
{
static PyObject *format = NULL;
PyObject *args, *result;
if (format == NULL) {
format = PyString_FromString("%s %s");
if (format == NULL)
return NULL;
}
if (self->first == NULL) {
PyErr_SetString(PyExc_AttributeError, "first");
return NULL;
}
if (self->last == NULL) {
PyErr_SetString(PyExc_AttributeError, "last");
return NULL;
}
args = Py_BuildValue("OO", self->first, self->last);
if (args == NULL)
return NULL;
result = PyString_Format(format, args);
Py_DECREF(args);
return result;
}
static PyMethodDef Noddy_methods[] = {
{"name", (PyCFunction)Noddy_name, METH_NOARGS,
"Return the name, combining the first and last name"
},
{NULL} /* Sentinel */
};
static PyTypeObject NoddyType = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"noddy.Noddy", /*tp_name*/
sizeof(Noddy), /*tp_basicsize*/
0, /*tp_itemsize*/
(destructor)Noddy_dealloc, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash */
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/
"Noddy objects", /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
Noddy_methods, /* tp_methods */
Noddy_members, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
(initproc)Noddy_init, /* tp_init */
0, /* tp_alloc */
Noddy_new, /* tp_new */
};
static PyMethodDef module_methods[] = {
{NULL} /* Sentinel */
};
#ifndef PyMODINIT_FUNC /* declarations for DLL import/export */
#define PyMODINIT_FUNC void
#endif
PyMODINIT_FUNC
initnoddy2(void)
{
PyObject* m;
if (PyType_Ready(&NoddyType) < 0)
return;
m = Py_InitModule3("noddy2", module_methods,
"Example module that creates an extension type.");
if (m == NULL)
return;
Py_INCREF(&NoddyType);
PyModule_AddObject(m, "Noddy", (PyObject *)&NoddyType);
}


@ -1,243 +0,0 @@
#include <Python.h>
#include "structmember.h"
typedef struct {
PyObject_HEAD
PyObject *first;
PyObject *last;
int number;
} Noddy;
static void
Noddy_dealloc(Noddy* self)
{
Py_XDECREF(self->first);
Py_XDECREF(self->last);
self->ob_type->tp_free((PyObject*)self);
}
static PyObject *
Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
Noddy *self;
self = (Noddy *)type->tp_alloc(type, 0);
if (self != NULL) {
self->first = PyString_FromString("");
if (self->first == NULL)
{
Py_DECREF(self);
return NULL;
}
self->last = PyString_FromString("");
if (self->last == NULL)
{
Py_DECREF(self);
return NULL;
}
self->number = 0;
}
return (PyObject *)self;
}
static int
Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
{
PyObject *first=NULL, *last=NULL, *tmp;
static char *kwlist[] = {"first", "last", "number", NULL};
if (! PyArg_ParseTupleAndKeywords(args, kwds, "|SSi", kwlist,
&first, &last,
&self->number))
return -1;
if (first) {
tmp = self->first;
Py_INCREF(first);
self->first = first;
Py_DECREF(tmp);
}
if (last) {
tmp = self->last;
Py_INCREF(last);
self->last = last;
Py_DECREF(tmp);
}
return 0;
}
static PyMemberDef Noddy_members[] = {
{"number", T_INT, offsetof(Noddy, number), 0,
"noddy number"},
{NULL} /* Sentinel */
};
static PyObject *
Noddy_getfirst(Noddy *self, void *closure)
{
Py_INCREF(self->first);
return self->first;
}
static int
Noddy_setfirst(Noddy *self, PyObject *value, void *closure)
{
    PyObject *tmp;

    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }
    if (! PyString_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }
    /* Store the new value before releasing the old one, so that
       self->first never refers to a freed object while Py_DECREF runs. */
    tmp = self->first;
    Py_INCREF(value);
    self->first = value;
    Py_DECREF(tmp);
    return 0;
}
static PyObject *
Noddy_getlast(Noddy *self, void *closure)
{
Py_INCREF(self->last);
return self->last;
}
static int
Noddy_setlast(Noddy *self, PyObject *value, void *closure)
{
    PyObject *tmp;

    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
        return -1;
    }
    if (! PyString_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The last attribute value must be a string");
        return -1;
    }
    /* Store the new value before releasing the old one, so that
       self->last never refers to a freed object while Py_DECREF runs. */
    tmp = self->last;
    Py_INCREF(value);
    self->last = value;
    Py_DECREF(tmp);
    return 0;
}
static PyGetSetDef Noddy_getseters[] = {
{"first",
(getter)Noddy_getfirst, (setter)Noddy_setfirst,
"first name",
NULL},
{"last",
(getter)Noddy_getlast, (setter)Noddy_setlast,
"last name",
NULL},
{NULL} /* Sentinel */
};
static PyObject *
Noddy_name(Noddy* self)
{
static PyObject *format = NULL;
PyObject *args, *result;
if (format == NULL) {
format = PyString_FromString("%s %s");
if (format == NULL)
return NULL;
}
args = Py_BuildValue("OO", self->first, self->last);
if (args == NULL)
return NULL;
result = PyString_Format(format, args);
Py_DECREF(args);
return result;
}
static PyMethodDef Noddy_methods[] = {
{"name", (PyCFunction)Noddy_name, METH_NOARGS,
"Return the name, combining the first and last name"
},
{NULL} /* Sentinel */
};
static PyTypeObject NoddyType = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"noddy.Noddy", /*tp_name*/
sizeof(Noddy), /*tp_basicsize*/
0, /*tp_itemsize*/
(destructor)Noddy_dealloc, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash */
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/
"Noddy objects", /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
Noddy_methods, /* tp_methods */
Noddy_members, /* tp_members */
Noddy_getseters, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
(initproc)Noddy_init, /* tp_init */
0, /* tp_alloc */
Noddy_new, /* tp_new */
};
static PyMethodDef module_methods[] = {
{NULL} /* Sentinel */
};
#ifndef PyMODINIT_FUNC /* declarations for DLL import/export */
#define PyMODINIT_FUNC void
#endif
PyMODINIT_FUNC
initnoddy3(void)
{
PyObject* m;
if (PyType_Ready(&NoddyType) < 0)
return;
m = Py_InitModule3("noddy3", module_methods,
"Example module that creates an extension type.");
if (m == NULL)
return;
Py_INCREF(&NoddyType);
PyModule_AddObject(m, "Noddy", (PyObject *)&NoddyType);
}


@ -1,224 +0,0 @@
#include <Python.h>
#include "structmember.h"
typedef struct {
PyObject_HEAD
PyObject *first;
PyObject *last;
int number;
} Noddy;
static int
Noddy_traverse(Noddy *self, visitproc visit, void *arg)
{
int vret;
if (self->first) {
vret = visit(self->first, arg);
if (vret != 0)
return vret;
}
if (self->last) {
vret = visit(self->last, arg);
if (vret != 0)
return vret;
}
return 0;
}
static int
Noddy_clear(Noddy *self)
{
PyObject *tmp;
tmp = self->first;
self->first = NULL;
Py_XDECREF(tmp);
tmp = self->last;
self->last = NULL;
Py_XDECREF(tmp);
return 0;
}
static void
Noddy_dealloc(Noddy* self)
{
Noddy_clear(self);
self->ob_type->tp_free((PyObject*)self);
}
static PyObject *
Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
Noddy *self;
self = (Noddy *)type->tp_alloc(type, 0);
if (self != NULL) {
self->first = PyString_FromString("");
if (self->first == NULL)
{
Py_DECREF(self);
return NULL;
}
self->last = PyString_FromString("");
if (self->last == NULL)
{
Py_DECREF(self);
return NULL;
}
self->number = 0;
}
return (PyObject *)self;
}
static int
Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
{
PyObject *first=NULL, *last=NULL, *tmp;
static char *kwlist[] = {"first", "last", "number", NULL};
if (! PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
&first, &last,
&self->number))
return -1;
if (first) {
tmp = self->first;
Py_INCREF(first);
self->first = first;
Py_XDECREF(tmp);
}
if (last) {
tmp = self->last;
Py_INCREF(last);
self->last = last;
Py_XDECREF(tmp);
}
return 0;
}
static PyMemberDef Noddy_members[] = {
{"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
"first name"},
{"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
"last name"},
{"number", T_INT, offsetof(Noddy, number), 0,
"noddy number"},
{NULL} /* Sentinel */
};
static PyObject *
Noddy_name(Noddy* self)
{
static PyObject *format = NULL;
PyObject *args, *result;
if (format == NULL) {
format = PyString_FromString("%s %s");
if (format == NULL)
return NULL;
}
if (self->first == NULL) {
PyErr_SetString(PyExc_AttributeError, "first");
return NULL;
}
if (self->last == NULL) {
PyErr_SetString(PyExc_AttributeError, "last");
return NULL;
}
args = Py_BuildValue("OO", self->first, self->last);
if (args == NULL)
return NULL;
result = PyString_Format(format, args);
Py_DECREF(args);
return result;
}
static PyMethodDef Noddy_methods[] = {
{"name", (PyCFunction)Noddy_name, METH_NOARGS,
"Return the name, combining the first and last name"
},
{NULL} /* Sentinel */
};
static PyTypeObject NoddyType = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"noddy.Noddy", /*tp_name*/
sizeof(Noddy), /*tp_basicsize*/
0, /*tp_itemsize*/
(destructor)Noddy_dealloc, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash */
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, /*tp_flags*/
"Noddy objects", /* tp_doc */
(traverseproc)Noddy_traverse, /* tp_traverse */
(inquiry)Noddy_clear, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
Noddy_methods, /* tp_methods */
Noddy_members, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
(initproc)Noddy_init, /* tp_init */
0, /* tp_alloc */
Noddy_new, /* tp_new */
};
static PyMethodDef module_methods[] = {
{NULL} /* Sentinel */
};
#ifndef PyMODINIT_FUNC /* declarations for DLL import/export */
#define PyMODINIT_FUNC void
#endif
PyMODINIT_FUNC
initnoddy4(void)
{
PyObject* m;
if (PyType_Ready(&NoddyType) < 0)
return;
m = Py_InitModule3("noddy4", module_methods,
"Example module that creates an extension type.");
if (m == NULL)
return;
Py_INCREF(&NoddyType);
PyModule_AddObject(m, "Noddy", (PyObject *)&NoddyType);
}


@ -1,68 +0,0 @@
#include <Python.h>
int
main(int argc, char *argv[])
{
PyObject *pName, *pModule, *pDict, *pFunc;
PyObject *pArgs, *pValue;
int i;
if (argc < 3) {
fprintf(stderr,"Usage: call pythonfile funcname [args]\n");
return 1;
}
Py_Initialize();
pName = PyString_FromString(argv[1]);
/* Error checking of pName left out */
pModule = PyImport_Import(pName);
Py_DECREF(pName);
if (pModule != NULL) {
pFunc = PyObject_GetAttrString(pModule, argv[2]);
/* pFunc is a new reference */
if (pFunc && PyCallable_Check(pFunc)) {
pArgs = PyTuple_New(argc - 3);
for (i = 0; i < argc - 3; ++i) {
pValue = PyInt_FromLong(atoi(argv[i + 3]));
if (!pValue) {
Py_DECREF(pArgs);
Py_DECREF(pModule);
fprintf(stderr, "Cannot convert argument\n");
return 1;
}
/* pValue reference stolen here: */
PyTuple_SetItem(pArgs, i, pValue);
}
pValue = PyObject_CallObject(pFunc, pArgs);
Py_DECREF(pArgs);
if (pValue != NULL) {
printf("Result of call: %ld\n", PyInt_AsLong(pValue));
Py_DECREF(pValue);
}
else {
Py_DECREF(pFunc);
Py_DECREF(pModule);
PyErr_Print();
fprintf(stderr,"Call failed\n");
return 1;
}
}
else {
if (PyErr_Occurred())
PyErr_Print();
fprintf(stderr, "Cannot find function \"%s\"\n", argv[2]);
}
Py_XDECREF(pFunc);
Py_DECREF(pModule);
}
else {
PyErr_Print();
fprintf(stderr, "Failed to load \"%s\"\n", argv[1]);
return 1;
}
Py_Finalize();
return 0;
}


@ -1,8 +0,0 @@
from distutils.core import setup, Extension
setup(name="noddy", version="1.0",
ext_modules=[
Extension("noddy", ["noddy.c"]),
Extension("noddy2", ["noddy2.c"]),
Extension("noddy3", ["noddy3.c"]),
Extension("noddy4", ["noddy4.c"]),
])


@ -1,91 +0,0 @@
#include <Python.h>
typedef struct {
PyListObject list;
int state;
} Shoddy;
static PyObject *
Shoddy_increment(Shoddy *self, PyObject *unused)
{
self->state++;
return PyInt_FromLong(self->state);
}
static PyMethodDef Shoddy_methods[] = {
{"increment", (PyCFunction)Shoddy_increment, METH_NOARGS,
PyDoc_STR("increment state counter")},
{NULL, NULL},
};
static int
Shoddy_init(Shoddy *self, PyObject *args, PyObject *kwds)
{
if (PyList_Type.tp_init((PyObject *)self, args, kwds) < 0)
return -1;
self->state = 0;
return 0;
}
static PyTypeObject ShoddyType = {
PyObject_HEAD_INIT(NULL)
0, /* ob_size */
"shoddy.Shoddy", /* tp_name */
sizeof(Shoddy), /* tp_basicsize */
0, /* tp_itemsize */
0, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_compare */
0, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
0, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
0, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT |
Py_TPFLAGS_BASETYPE, /* tp_flags */
0, /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
Shoddy_methods, /* tp_methods */
0, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
(initproc)Shoddy_init, /* tp_init */
0, /* tp_alloc */
0, /* tp_new */
};
PyMODINIT_FUNC
initshoddy(void)
{
PyObject *m;
ShoddyType.tp_base = &PyList_Type;
if (PyType_Ready(&ShoddyType) < 0)
return;
m = Py_InitModule3("shoddy", NULL, "Shoddy module");
if (m == NULL)
return;
Py_INCREF(&ShoddyType);
PyModule_AddObject(m, "Shoddy", (PyObject *) &ShoddyType);
}


@ -1,213 +0,0 @@
"""Test module for the noddy examples
Noddy 1:
>>> import noddy
>>> n1 = noddy.Noddy()
>>> n2 = noddy.Noddy()
>>> del n1
>>> del n2
Noddy 2
>>> import noddy2
>>> n1 = noddy2.Noddy('jim', 'fulton', 42)
>>> n1.first
'jim'
>>> n1.last
'fulton'
>>> n1.number
42
>>> n1.name()
'jim fulton'
>>> n1.first = 'will'
>>> n1.name()
'will fulton'
>>> n1.last = 'tell'
>>> n1.name()
'will tell'
>>> del n1.first
>>> n1.name()
Traceback (most recent call last):
...
AttributeError: first
>>> n1.first
Traceback (most recent call last):
...
AttributeError: first
>>> n1.first = 'drew'
>>> n1.first
'drew'
>>> del n1.number
Traceback (most recent call last):
...
TypeError: can't delete numeric/char attribute
>>> n1.number=2
>>> n1.number
2
>>> n1.first = 42
>>> n1.name()
'42 tell'
>>> n2 = noddy2.Noddy()
>>> n2.name()
' '
>>> n2.first
''
>>> n2.last
''
>>> del n2.first
>>> n2.first
Traceback (most recent call last):
...
AttributeError: first
>>> n2.first
Traceback (most recent call last):
...
AttributeError: first
>>> n2.name()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: first
>>> n2.number
0
>>> n3 = noddy2.Noddy('jim', 'fulton', 'waaa')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> del n1
>>> del n2
Noddy 3
>>> import noddy3
>>> n1 = noddy3.Noddy('jim', 'fulton', 42)
>>> n1 = noddy3.Noddy('jim', 'fulton', 42)
>>> n1.name()
'jim fulton'
>>> del n1.first
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: Cannot delete the first attribute
>>> n1.first = 42
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: The first attribute value must be a string
>>> n1.first = 'will'
>>> n1.name()
'will fulton'
>>> n2 = noddy3.Noddy()
>>> n2 = noddy3.Noddy()
>>> n2 = noddy3.Noddy()
>>> n3 = noddy3.Noddy('jim', 'fulton', 'waaa')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> del n1
>>> del n2
Noddy 4
>>> import noddy4
>>> n1 = noddy4.Noddy('jim', 'fulton', 42)
>>> n1.first
'jim'
>>> n1.last
'fulton'
>>> n1.number
42
>>> n1.name()
'jim fulton'
>>> n1.first = 'will'
>>> n1.name()
'will fulton'
>>> n1.last = 'tell'
>>> n1.name()
'will tell'
>>> del n1.first
>>> n1.name()
Traceback (most recent call last):
...
AttributeError: first
>>> n1.first
Traceback (most recent call last):
...
AttributeError: first
>>> n1.first = 'drew'
>>> n1.first
'drew'
>>> del n1.number
Traceback (most recent call last):
...
TypeError: can't delete numeric/char attribute
>>> n1.number=2
>>> n1.number
2
>>> n1.first = 42
>>> n1.name()
'42 tell'
>>> n2 = noddy4.Noddy()
>>> n2 = noddy4.Noddy()
>>> n2 = noddy4.Noddy()
>>> n2 = noddy4.Noddy()
>>> n2.name()
' '
>>> n2.first
''
>>> n2.last
''
>>> del n2.first
>>> n2.first
Traceback (most recent call last):
...
AttributeError: first
>>> n2.first
Traceback (most recent call last):
...
AttributeError: first
>>> n2.name()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: first
>>> n2.number
0
>>> n3 = noddy4.Noddy('jim', 'fulton', 'waaa')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
Test cyclic gc(?)
>>> import gc
>>> gc.disable()
>>> x = []
>>> l = [x]
>>> n2.first = l
>>> n2.first
[[]]
>>> l.append(n2)
>>> del l
>>> del n1
>>> del n2
>>> sys.getrefcount(x)
3
>>> ignore = gc.collect()
>>> sys.getrefcount(x)
2
>>> gc.enable()
"""
import os
import sys
from distutils.util import get_platform
PLAT_SPEC = "%s-%s" % (get_platform(), sys.version[0:3])
src = os.path.join("build", "lib.%s" % PLAT_SPEC)
sys.path.append(src)
if __name__ == "__main__":
    import doctest, __main__
    doctest.testmod(__main__)


@ -1,320 +0,0 @@
\chapter{Building C and \Cpp{} Extensions on Windows%
\label{building-on-windows}}
This chapter briefly explains how to create a Windows extension module
for Python using Microsoft Visual \Cpp, and follows with more
detailed background information on how it works. The explanatory
material is useful for both the Windows programmer learning to build
Python extensions and the \UNIX{} programmer interested in producing
software which can be successfully built on both \UNIX{} and Windows.
Module authors are encouraged to use the distutils approach for
building extension modules, instead of the one described in this
section. You will still need the C compiler that was used to build
Python; typically Microsoft Visual \Cpp.
\begin{notice}
This chapter mentions a number of filenames that include an encoded
Python version number. These filenames are represented with the
version number shown as \samp{XY}; in practice, \character{X} will
be the major version number and \character{Y} will be the minor
version number of the Python release you're working with. For
example, if you are using Python 2.2.1, \samp{XY} will actually be
\samp{22}.
\end{notice}
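As a quick illustration of the \samp{XY} convention, the following
Python sketch derives the tag from a version tuple. The helper name
\code{xy_tag} is made up for this example; it is not part of Python or
of the build tools.

```python
import sys


def xy_tag(version_info=None):
    """Return the 'XY' tag (major and minor digits) for a version tuple.

    Hypothetical helper, for illustration only.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return "%d%d" % (major, minor)


# Python 2.2.1 maps to "22", so the import library would be python22.lib.
print("python%s.lib" % xy_tag((2, 2, 1)))
```

Called without an argument, the helper uses the version of the running
interpreter.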
\section{A Cookbook Approach \label{win-cookbook}}
There are two approaches to building extension modules on Windows,
just as there are on \UNIX: use the
\ulink{\module{distutils}}{../lib/module-distutils.html} package to
control the build process, or do things manually. The distutils
approach works well for most extensions; documentation on using
\ulink{\module{distutils}}{../lib/module-distutils.html} to build and
package extension modules is available in
\citetitle[../dist/dist.html]{Distributing Python Modules}. This
section describes the manual approach to building Python extensions
written in C or \Cpp.
To build extensions using these instructions, you need to have a copy
of the Python sources of the same version as your installed Python.
You will need Microsoft Visual \Cpp{} ``Developer Studio''; project
files are supplied for V\Cpp{} version 7.1, but you can use older
versions of V\Cpp. Notice that you should use the same version of
V\Cpp{} that was used to build Python itself. The example files
described here are distributed with the Python sources in the
\file{PC\textbackslash example_nt\textbackslash} directory.
\begin{enumerate}
\item
\strong{Copy the example files}\\
The \file{example_nt} directory is a subdirectory of the \file{PC}
directory, in order to keep all the PC-specific files under the
same directory in the source distribution. However, the
\file{example_nt} directory can't actually be used from this
location. You first need to copy or move it up one level, so that
\file{example_nt} is a sibling of the \file{PC} and \file{Include}
directories. Do all your work from within this new location.
\item
\strong{Open the project}\\
From V\Cpp, use the \menuselection{File \sub Open Solution}
dialog (not \menuselection{File \sub Open}!). Navigate to and
select the file \file{example.sln}, in the \emph{copy} of the
\file{example_nt} directory you made above. Click Open.
\item
\strong{Build the example DLL}\\
In order to check that everything is set up right, try building:
\begin{enumerate}
\item
Select a configuration. This step is optional. Choose
\menuselection{Build \sub Configuration Manager \sub Active
Solution Configuration} and select either \guilabel{Release}
or \guilabel{Debug}. If you skip this step,
V\Cpp{} will use the Debug configuration by default.
\item
Build the DLL. Choose \menuselection{Build \sub Build
Solution}. This creates all intermediate and result files in
a subdirectory called either \file{Debug} or \file{Release},
depending on which configuration you selected in the preceding
step.
\end{enumerate}
\item
\strong{Testing the debug-mode DLL}\\
Once the Debug build has succeeded, bring up a DOS box, and change
to the \file{example_nt\textbackslash Debug} directory. You
should now be able to repeat the following session (\code{C>} is
the DOS prompt, \code{>>>} is the Python prompt; note that
build information and various debug output from Python may not
match this screen dump exactly):
\begin{verbatim}
C>..\..\PCbuild\python_d
Adding parser accelerators ...
Done.
Python 2.2 (#28, Dec 19 2001, 23:26:37) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import example
[4897 refs]
>>> example.foo()
Hello, world
[4903 refs]
>>>
\end{verbatim}
Congratulations! You've successfully built your first Python
extension module.
\item
\strong{Creating your own project}\\
Choose a name and create a directory for it. Copy your C sources
into it. Note that the module source file name does not
necessarily have to match the module name, but the name of the
initialization function should match the module name --- you can
only import a module \module{spam} if its initialization function
is called \cfunction{initspam()}, and it should call
\cfunction{Py_InitModule()} with the string \code{"spam"} as its
first argument (use the minimal \file{example.c} in this directory
as a guide). By convention, it lives in a file called
\file{spam.c} or \file{spammodule.c}. The output file should be
called \file{spam.dll} or \file{spam.pyd} (the latter is supported
to avoid confusion with a system library \file{spam.dll} to which
your module could be a Python interface) in Release mode, or
\file{spam_d.dll} or \file{spam_d.pyd} in Debug mode.
Now your options are:
\begin{enumerate}
\item Copy \file{example.sln} and \file{example.vcproj}, rename
them to \file{spam.*}, and edit them by hand, or
\item Create a brand new project; instructions are below.
\end{enumerate}
In either case, copy \file{example_nt\textbackslash example.def}
to \file{spam\textbackslash spam.def}, and edit the new
\file{spam.def} so its second line contains the string
`\code{initspam}'. If you created a new project yourself, add the
file \file{spam.def} to the project now. (This is an annoying
little file with only two lines. An alternative approach is to
forget about the \file{.def} file, and add the option
\programopt{/export:initspam} somewhere to the Link settings, by
manually editing the setting in the Project Properties dialog.)
\item
\strong{Creating a brand new project}\\
Use the \menuselection{File \sub New \sub Project} dialog to
create a new Project Workspace. Select \guilabel{Visual C++
Projects/Win32/ Win32 Project}, enter the name (\samp{spam}), and
make sure the Location is set to the parent of the \file{spam}
directory you have created (which should be a direct subdirectory
of the Python build tree, a sibling of \file{Include} and
\file{PC}). Select Win32 as the platform (in my version, this is
the only choice). Make sure the Create new workspace radio button
is selected. Click OK.
You should now create the file \file{spam.def} as instructed in
the previous section. Add the source files to the project, using
\menuselection{Project \sub Add Existing Item}. Set the pattern to
\code{*.*} and select both \file{spam.c} and \file{spam.def} and
click OK. (Inserting them one by one is fine too.)
Now open the \menuselection{Project \sub spam properties} dialog.
You only need to change a few settings. Make sure \guilabel{All
Configurations} is selected from the \guilabel{Settings for:}
dropdown list. Select the C/\Cpp{} tab. Choose the General
category in the popup menu at the top. Type the following text in
the entry box labeled \guilabel{Additional Include Directories}:
\begin{verbatim}
..\Include,..\PC
\end{verbatim}
Then, choose the General category in the Linker tab, and enter
\begin{verbatim}
..\PCbuild
\end{verbatim}
in the text box labeled \guilabel{Additional Library Directories}.
Now you need to add some mode-specific settings:
Select \guilabel{Release} in the \guilabel{Configuration}
dropdown list. Choose the \guilabel{Link} tab, choose the
\guilabel{Input} category, and append \code{pythonXY.lib} to the
list in the \guilabel{Additional Dependencies} box.
Select \guilabel{Debug} in the \guilabel{Configuration} dropdown
list, and append \code{pythonXY_d.lib} to the list in the
\guilabel{Additional Dependencies} box. Then click the C/\Cpp{}
tab, select \guilabel{Code Generation}, and select
\guilabel{Multi-threaded Debug DLL} from the \guilabel{Runtime
library} dropdown list.
Select \guilabel{Release} again from the \guilabel{Configuration}
dropdown list. Select \guilabel{Multi-threaded DLL} from the
\guilabel{Runtime library} dropdown list.
\end{enumerate}
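The naming rules from the \strong{Creating your own project} step can
be summarized in a short Python sketch. The function
\code{extension_names} is a hypothetical helper written for this
example; it only formats strings and does not invoke the compiler or
linker.

```python
def extension_names(module_name, debug=False):
    """Return the names the chapter's conventions imply for a module.

    Hypothetical helper for illustration; it only builds strings.
    """
    suffix = "_d" if debug else ""
    return {
        # The initialization function must be called init<module>.
        "init_function": "init" + module_name,
        # Either extension is accepted; Debug builds add a _d suffix.
        "output_files": [module_name + suffix + ".dll",
                         module_name + suffix + ".pyd"],
        # Linker option that can stand in for the .def file.
        "link_option": "/export:init" + module_name,
    }


names = extension_names("spam", debug=True)
print(names["init_function"])
print(names["output_files"])
```

For \code{"spam"} in Debug mode this yields \code{initspam} together
with \file{spam_d.dll} and \file{spam_d.pyd}, matching the names given
in the step above.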
If your module creates a new type, you may have trouble with this line:
\begin{verbatim}
PyObject_HEAD_INIT(&PyType_Type)
\end{verbatim}
Change it to:
\begin{verbatim}
PyObject_HEAD_INIT(NULL)
\end{verbatim}
and add the following to the module initialization function:
\begin{verbatim}
MyObject_Type.ob_type = &PyType_Type;
\end{verbatim}
Refer to section~3 of the
\citetitle[http://www.python.org/doc/FAQ.html]{Python FAQ} for details
on why you must do this.
\section{Differences Between \UNIX{} and Windows
\label{dynamic-linking}}
\sectionauthor{Chris Phoenix}{cphoenix@best.com}
\UNIX{} and Windows use completely different paradigms for run-time
loading of code. Before you try to build a module that can be
dynamically loaded, be aware of how your system works.
In \UNIX, a shared object (\file{.so}) file contains code to be used by the
program, and also the names of functions and data that it expects to
find in the program. When the file is joined to the program, all
references to those functions and data in the file's code are changed
to point to the actual locations in the program where the functions
and data are placed in memory. This is basically a link operation.
In Windows, a dynamic-link library (\file{.dll}) file has no dangling
references. Instead, an access to functions or data goes through a
lookup table. So the DLL code does not have to be fixed up at runtime
to refer to the program's memory; instead, the code already uses the
DLL's lookup table, and the lookup table is modified at runtime to
point to the functions and data.
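The lookup-table indirection can be pictured with a toy model in
Python. This is only an analogy under simplifying assumptions: real
loaders patch machine-level tables, not dictionaries, and the names
\code{make_dll} and \code{load} are invented for this sketch.

```python
# Toy model only: a dict stands in for the DLL's lookup table.

def make_dll():
    """Return (code, lookup_table) for a pretend DLL that calls 'puts'."""
    lookup_table = {"puts": None}  # unresolved until load time

    def dll_function(message):
        # The DLL's code never embeds the target's address; it always
        # goes through its own lookup table.
        return lookup_table["puts"](message)

    return dll_function, lookup_table


def load(lookup_table, exports):
    """Fill each table slot from the symbols the host program exports."""
    for name in lookup_table:
        lookup_table[name] = exports[name]


output = []
dll_function, table = make_dll()
load(table, {"puts": output.append})  # only the table is modified
dll_function("Hello, world")
print(output)
```

The body of \code{dll_function} is never rewritten; loading changes
only the table entry it reads from.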
In \UNIX, there is only one type of library file (\file{.a}) which
contains code from several object files (\file{.o}). During the link
step to create a shared object file (\file{.so}), the linker may find
that it doesn't know where an identifier is defined. The linker will
look for it in the object files in the libraries; if it finds it, it
will include all the code from that object file.
In Windows, there are two types of library, a static library and an
import library (both called \file{.lib}). A static library is like a
\UNIX{} \file{.a} file; it contains code to be included as necessary.
An import library is basically used only to reassure the linker that a
certain identifier is legal, and will be present in the program when
the DLL is loaded. So the linker uses the information from the
import library to build the lookup table for using identifiers that
are not included in the DLL. When an application or a DLL is linked,
an import library may be generated, which will need to be used for all
future DLLs that depend on the symbols in the application or DLL.
Suppose you are building two dynamic-load modules, B and C, which should
share another block of code A. On \UNIX, you would \emph{not} pass
\file{A.a} to the linker for \file{B.so} and \file{C.so}; that would
cause it to be included twice, so that B and C would each have their
own copy. In Windows, building \file{A.dll} will also build
\file{A.lib}. You \emph{do} pass \file{A.lib} to the linker for B and
C. \file{A.lib} does not contain code; it just contains information
which will be used at runtime to access A's code.
In Windows, using an import library is sort of like using \samp{import
spam}; it gives you access to spam's names, but does not create a
separate copy. On \UNIX, linking with a library is more like
\samp{from spam import *}; it does create a separate copy.
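The Python side of this analogy can be made concrete with a small
sketch. The module \module{spam} here is a stand-in created in memory
for the example, and the dictionary copy imitates what
\samp{from spam import *} does to the importing namespace.

```python
import types

# Stand-in for a module named "spam" (hypothetical; built in memory).
spam = types.ModuleType("spam")
spam.counter = 0

# "import spam" style: keep a reference to the module and look names
# up through it each time.
via_module = spam

# "from spam import *" style: copy the current public bindings into a
# separate namespace.
copied = {name: value for name, value in vars(spam).items()
          if not name.startswith("_")}

spam.counter = 1  # rebind the name inside the module

print(via_module.counter)  # follows the module, so it sees the change
print(copied["counter"])   # the separate copy keeps the old binding
```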
\section{Using DLLs in Practice \label{win-dlls}}
\sectionauthor{Chris Phoenix}{cphoenix@best.com}
Windows Python is built in Microsoft Visual \Cpp; using other
compilers may or may not work (though Borland seems to). The rest of
this section is MSV\Cpp{} specific.
When creating DLLs in Windows, you must pass \file{pythonXY.lib} to
the linker. To build two DLLs, spam and ni (which uses C functions
found in spam), you could use these commands:
\begin{verbatim}
cl /LD /I/python/include spam.c ../libs/pythonXY.lib
cl /LD /I/python/include ni.c spam.lib ../libs/pythonXY.lib
\end{verbatim}
The first command created three files: \file{spam.obj},
\file{spam.dll} and \file{spam.lib}. \file{Spam.dll} does not contain
any Python functions (such as \cfunction{PyArg_ParseTuple()}), but it
does know how to find the Python code thanks to \file{pythonXY.lib}.
The second command created \file{ni.dll} (and \file{.obj} and
\file{.lib}), which knows how to find the necessary functions from
spam, and also from the Python executable.
Not every identifier is exported to the lookup table. If you want any
other modules (including Python) to be able to see your identifiers,
you have to say \samp{_declspec(dllexport)}, as in \samp{void
_declspec(dllexport) initspam(void)} or \samp{PyObject
_declspec(dllexport) *NiGetSpamData(void)}.
Developer Studio will throw in a lot of import libraries that you do
not really need, adding about 100K to your executable. To get rid of
them, use the Project Settings dialog, Link tab, to specify
\emph{ignore default libraries}. Add the correct
\file{msvcrt\var{xx}.lib} to the list of libraries.


@ -1,84 +0,0 @@
# Makefile for the HOWTO directory
# LaTeX HOWTOs can be turned into HTML, PDF, PS, DVI or plain text output.
# reST HOWTOs can only be turned into HTML.
# Variables to change
# Paper size for non-HTML formats (letter or a4)
PAPER=letter
# Arguments to rst2html.py, and location of the script
RSTARGS = --input-encoding=utf-8
RST2HTML = rst2html.py
# List of HOWTOs that aren't to be processed. This should contain the
# base name of the HOWTO without any extension (e.g. 'advocacy',
# 'unicode').
REMOVE_HOWTOS =
MKHOWTO=../tools/mkhowto
WEBDIR=.
PAPERDIR=../paper-$(PAPER)
HTMLDIR=../html
# Determine list of files to be built
TEX_SOURCES = $(wildcard *.tex)
RST_SOURCES = $(wildcard *.rst)
TEX_NAMES = $(filter-out $(REMOVE_HOWTOS),$(patsubst %.tex,%,$(TEX_SOURCES)))
PAPER_PATHS=$(addprefix $(PAPERDIR)/,$(TEX_NAMES))
DVI =$(addsuffix .dvi,$(PAPER_PATHS))
PDF =$(addsuffix .pdf,$(PAPER_PATHS))
PS =$(addsuffix .ps,$(PAPER_PATHS))
ALL_HOWTO_NAMES = $(TEX_NAMES) $(patsubst %.rst,%,$(RST_SOURCES))
HOWTO_NAMES = $(filter-out $(REMOVE_HOWTOS),$(ALL_HOWTO_NAMES))
HTML = $(addprefix $(HTMLDIR)/,$(HOWTO_NAMES))
# Rules for building various formats
# reST to HTML
$(HTMLDIR)/%: %.rst
	if [ ! -d $@ ] ; then mkdir $@ ; fi
	$(RST2HTML) $(RSTARGS) $< >$@/index.html
# LaTeX to various output formats
$(PAPERDIR)/%.dvi : %.tex
	$(MKHOWTO) --dvi $<
	mv $*.dvi $@
$(PAPERDIR)/%.pdf : %.tex
	$(MKHOWTO) --pdf $<
	mv $*.pdf $@
$(PAPERDIR)/%.ps : %.tex
	$(MKHOWTO) --ps $<
	mv $*.ps $@
$(HTMLDIR)/% : %.tex
	$(MKHOWTO) --html --iconserver="." --dir $@ $<
# Rule that isn't actually used -- we no longer support the 'txt' target.
$(PAPERDIR)/%.txt : %.tex
	$(MKHOWTO) --text $<
	mv $@ txt
default:
	@echo "'all' -- build all files"
	@echo "'dvi', 'pdf', 'ps', 'html' -- build one format"
all: dvi pdf ps html
.PHONY : dvi pdf ps html
dvi: $(DVI)
pdf: $(PDF)
ps: $(PS)
html: $(HTML)
clean:
	rm -f *~ *.log *.ind *.l2h *.aux *.toc *.how *.bkm
	rm -f *.dvi *.pdf *.ps
clobber:
	rm -rf $(HTML)
	rm -rf $(DVI) $(PDF) $(PS)


@ -1,13 +0,0 @@
Short-term tasks:
Quick revision pass to make HOWTOs match the current state of Python
doanddont regex sockets
Medium-term tasks:
Revisit the regex howto.
* Add exercises with answers for each section
* More examples?
Long-term tasks:
Integrate with other Python docs?


@ -1,411 +0,0 @@
\documentclass{howto}
\title{Python Advocacy HOWTO}
\release{0.03}
\author{A.M. Kuchling}
\authoraddress{\email{amk@amk.ca}}
\begin{document}
\maketitle
\begin{abstract}
\noindent
It's usually difficult to get your management to accept open source
software, and Python is no exception to this rule. This document
discusses reasons to use Python, strategies for winning acceptance,
facts and arguments you can use, and cases where you \emph{shouldn't}
try to use Python.
This document is available from the Python HOWTO page at
\url{http://www.python.org/doc/howto}.
\end{abstract}
\tableofcontents
\section{Reasons to Use Python}
There are several reasons to incorporate a scripting language into
your development process, and this section will discuss them, and why
Python has some properties that make it a particularly good choice.
\subsection{Programmability}
Programs are often organized in a modular fashion. Lower-level
operations are grouped together, and called by higher-level functions,
which may in turn be used as basic operations by still further upper
levels.
For example, the lowest level might define a very low-level
set of functions for accessing a hash table. The next level might use
hash tables to store the headers of a mail message, mapping a header
name like \samp{Date} to a value such as \samp{Tue, 13 May 1997
20:00:54 -0400}. A yet higher level may operate on message objects,
without knowing or caring that message headers are stored in a hash
table, and so forth.
Often, the lowest levels do very simple things; they implement a data
structure such as a binary tree or hash table, or they perform some
simple computation, such as converting a date string to a number. The
higher levels then contain logic connecting these primitive
operations. Using this approach, the primitives can be seen as basic
building blocks which are then glued together to produce the complete
product.
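The layering described above might look like the following sketch,
which reuses the mail-header example. The names
\code{date_to_timestamp}, \code{Message}, and \code{sent_before} are
invented for illustration; only \code{parsedate_tz} and
\code{mktime_tz} come from the standard \module{email.utils} module.

```python
from email.utils import mktime_tz, parsedate_tz


# Lowest level: a primitive that converts a date string to a number.
def date_to_timestamp(date_string):
    """Convert an RFC 2822 date string to seconds since the epoch (UTC)."""
    parsed = parsedate_tz(date_string)
    if parsed is None:
        raise ValueError("unparseable date: %r" % (date_string,))
    return mktime_tz(parsed)


# Middle level: message headers stored in a hash table (a dict).
class Message:
    def __init__(self):
        self._headers = {}

    def set_header(self, name, value):
        self._headers[name.lower()] = value

    def get_header(self, name):
        return self._headers[name.lower()]


# Highest level: logic that works on message objects without knowing
# or caring how the headers are stored.
def sent_before(first, second):
    return (date_to_timestamp(first.get_header("Date"))
            < date_to_timestamp(second.get_header("Date")))


a = Message()
a.set_header("Date", "Tue, 13 May 1997 20:00:54 -0400")
b = Message()
b.set_header("Date", "Wed, 14 May 1997 09:30:00 +0000")
print(sent_before(a, b))
```

Swapping the dictionary for another data structure would change only
the \code{Message} class; \code{sent_before} would stay as it is.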
Why is this design approach relevant to Python? Because Python is
well suited to functioning as such a glue language. A common approach
is to write a Python module that implements the lower level
operations; for the sake of speed, the implementation might be in C,
Java, or even Fortran. Once the primitives are available to Python
programs, the logic underlying higher level operations is written in
the form of Python code. The high-level logic is then more
understandable, and easier to modify.
John Ousterhout wrote a paper that explains this idea at greater
length, entitled ``Scripting: Higher Level Programming for the 21st
Century''. I recommend that you read this paper; see the references
for the URL. Ousterhout is the inventor of the Tcl language, and
therefore argues that Tcl should be used for this purpose; he only
briefly refers to other languages such as Python, Perl, and
Lisp/Scheme, but in reality, Ousterhout's argument applies to
scripting languages in general, since you could equally write
extensions for any of the languages mentioned above.
\subsection{Prototyping}
In \emph{The Mythical Man-Month}, Frederick Brooks suggests the
following rule when planning software projects: ``Plan to throw one
away; you will anyway.'' Brooks is saying that the first attempt at a
software design often turns out to be wrong; unless the problem is
very simple or you're an extremely good designer, you'll find that new
requirements and features become apparent once development has
actually started. If these new requirements can't be cleanly
incorporated into the program's structure, you're presented with two
unpleasant choices: hammer the new features into the program somehow,
or scrap everything and write a new version of the program, taking the
new features into account from the beginning.
Python provides you with a good environment for quickly developing an
initial prototype. That lets you get the overall program structure
and logic right, and you can fine-tune small details in the fast
development cycle that Python provides. Once you're satisfied with
the GUI interface or program output, you can translate the Python code
into C++, Fortran, Java, or some other compiled language.
Prototyping means you have to be careful not to use too many Python
features that are hard to implement in your other language. Using
\code{eval()}, or regular expressions, or the \module{pickle} module,
means that you're going to need C or Java libraries for formula
evaluation, regular expressions, and serialization, for example. But
it's not hard to avoid such tricky code, and in the end the
translation usually isn't very difficult. The resulting code can be
rapidly debugged, because any serious logical errors will have been
removed from the prototype, leaving only more minor slip-ups in the
translation to track down.
This strategy builds on the earlier discussion of programmability.
Using Python as glue to connect lower-level components has obvious
relevance for constructing prototype systems. In this way Python can
help you with development, even if end users never come in contact
with Python code at all. If the performance of the Python version is
adequate and corporate politics allow it, you may not need to do a
translation into C or Java, but it can still be faster to develop a
prototype and then translate it, instead of attempting to produce the
final version immediately.
One example of this development strategy is Microsoft Merchant Server.
Version 1.0 was written in pure Python, by a company that subsequently
was purchased by Microsoft. Version 2.0 began to translate the code
into \Cpp, shipping with some \Cpp code and some Python code. Version
3.0 didn't contain any Python at all; all the code had been translated
into \Cpp. Even though the product doesn't contain a Python
interpreter, the Python language has still served a useful purpose by
speeding up development.
This is a very common use for Python. Past conference papers have
also described this approach for developing high-level numerical
algorithms; see David M. Beazley and Peter S. Lomdahl's paper
``Feeding a Large-scale Physics Application to Python'' in the
references for a good example. If an algorithm's basic operations are
things like ``Take the inverse of this 4000x4000 matrix'', and are
implemented in some lower-level language, then Python has almost no
additional performance cost; the extra time required for Python to
evaluate an expression like \code{m.invert()} is dwarfed by the cost
of the actual computation. It's particularly good for applications
where seemingly endless tweaking is required to get things right. GUI
interfaces and Web sites are prime examples.
The Python code is also shorter and faster to write (once you're
familiar with Python), so it's easier to throw it away if you decide
your approach was wrong; if you'd spent two weeks working on it
instead of just two hours, you might waste time trying to patch up
what you've got out of a natural reluctance to admit that those two
weeks were wasted. Truthfully, those two weeks haven't been wasted,
since you've learnt something about the problem and the technology
you're using to solve it, but it's human nature to view this as a
failure of some sort.
\subsection{Simplicity and Ease of Understanding}
Python is definitely \emph{not} a toy language that's only usable for
small tasks. The language features are general and powerful enough to
enable it to be used for many different purposes. It's useful at the
small end, for 10- or 20-line scripts, but it also scales up to larger
systems that contain thousands of lines of code.
However, this expressiveness doesn't come at the cost of an obscure or
tricky syntax. While Python has some dark corners that can lead to
obscure code, there are relatively few such corners, and proper design
can isolate their use to only a few classes or modules. It's
certainly possible to write confusing code by using too many features
with too little concern for clarity, but most Python code can look a
lot like a slightly-formalized version of human-understandable
pseudocode.
In \emph{The New Hacker's Dictionary}, Eric S. Raymond gives the following
definition for ``compact'':
\begin{quotation}
Compact \emph{adj.} Of a design, describes the valuable property
that it can all be apprehended at once in one's head. This
generally means the thing created from the design can be used
with greater facility and fewer errors than an equivalent tool
that is not compact. Compactness does not imply triviality or
lack of power; for example, C is compact and FORTRAN is not,
but C is more powerful than FORTRAN. Designs become
non-compact through accreting features and cruft that don't
merge cleanly into the overall design scheme (thus, some fans
of Classic C maintain that ANSI C is no longer compact).
\end{quotation}
(From \url{http://www.catb.org/~esr/jargon/html/C/compact.html})
In this sense of the word, Python is quite compact, because the
language has just a few ideas, which are used in lots of places. Take
namespaces, for example. Import a module with \code{import math}, and
you create a new namespace called \samp{math}. Classes are also
namespaces that share many of the properties of modules, and have a
few of their own; for example, you can create instances of a class.
Instances? They're yet another namespace. Namespaces are currently
implemented as Python dictionaries, so they have the same methods as
the standard dictionary data type: \method{keys()} returns all the keys, and
so forth.
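A short sketch of this idea, using the built-in \function{vars()}
function to look at the mapping behind each kind of namespace. The
\class{Point} class is a made-up example, and some Python versions expose
a class's namespace through a read-only proxy rather than a plain
dictionary, though it still answers \method{keys()}.
\begin{verbatim}
import math


class Point(object):
    dimensions = 2              # stored in the class namespace

    def __init__(self, x, y):
        self.x = x              # stored in the instance namespace
        self.y = y


p = Point(3, 4)

# vars() returns the mapping behind each namespace.
print("sqrt" in vars(math).keys())          # module namespace
print("dimensions" in vars(Point).keys())   # class namespace
print(sorted(vars(p).keys()))               # instance namespace
\end{verbatim}
The same \method{keys()} call works on all three, which is the sense in
which one idea is reused in many places.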
This simplicity arises from Python's development history. The
language syntax derives from different sources; ABC, a relatively
obscure teaching language, is one primary influence, and Modula-3 is
another. (For more information about ABC and Modula-3, consult their
respective Web sites at \url{http://www.cwi.nl/~steven/abc/} and
\url{http://www.m3.org}.) Other features have come from C, Icon,
Algol-68, and even Perl. Python hasn't really innovated very much,
but instead has tried to keep the language small and easy to learn,
building on ideas that have been tried in other languages and found
useful.
Simplicity is a virtue that should not be underestimated. It lets you
learn the language more quickly, and then rapidly write code, code
that often works the first time you run it.
\subsection{Java Integration}
If you're working with Java, Jython
(\url{http://www.jython.org/}) is definitely worth your
attention. Jython is a re-implementation of Python in Java that
compiles Python code into Java bytecodes. The resulting environment
has very tight, almost seamless, integration with Java. It's trivial
to access Java classes from Python, and you can write Python classes
that subclass Java classes. Jython can be used for prototyping Java
applications in much the same way CPython is used, and it can also be
used for test suites for Java code, or embedded in a Java application
to add scripting capabilities.
\section{Arguments and Rebuttals}
Let's say that you've decided upon Python as the best choice for your
application. How can you convince your management, or your fellow
developers, to use Python? This section lists some common arguments
against using Python, and provides some possible rebuttals.
\emph{Python is freely available software that doesn't cost anything.
How good can it be?}
Very good, indeed. These days Linux and Apache, two other pieces of
open source software, are becoming more respected as alternatives to
commercial software, but Python hasn't had all the publicity.
Python has been around for several years, with many users and
developers. Accordingly, the interpreter has been used by many
people, and has gotten most of the bugs shaken out of it. While bugs
are still discovered at intervals, they're usually either quite
obscure (they'd have to be, for no one to have run into them before)
or they involve interfaces to external libraries. The internals of
the language itself are quite stable.
Having the source code should be viewed as making the software
available for peer review; people can examine the code, suggest (and
implement) improvements, and track down bugs. To find out more about
the idea of open source code, along with arguments and case studies
supporting it, go to \url{http://www.opensource.org}.
\emph{Who's going to support it?}
Python has a sizable community of developers, and the number is still
growing. The Internet community surrounding the language is an active
one, and is worth being considered another one of Python's advantages.
Most questions posted to the comp.lang.python newsgroup are quickly
answered by someone.
Should you need to dig into the source code, you'll find it's clear
and well-organized, so it's not very difficult to write extensions and
track down bugs yourself. If you'd prefer to pay for support, there
are companies and individuals who offer commercial support for Python.
\emph{Who uses Python for serious work?}
Lots of people; one interesting thing about Python is the surprising
diversity of applications that it's been used for. People are using
Python to:
\begin{itemize}
\item Run Web sites
\item Write GUI interfaces
\item Control
number-crunching code on supercomputers
\item Make a commercial application scriptable by embedding the Python
interpreter inside it
\item Process large XML data sets
\item Build test suites for C or Java code
\end{itemize}
Whatever your application domain is, there's probably someone who's
used Python for something similar. Yet, despite being usable for
such high-end applications, Python's still simple enough to use for
little jobs.
See \url{http://wiki.python.org/moin/OrganizationsUsingPython} for a list of some of the
organizations that use Python.
\emph{What are the restrictions on Python's use?}
They're practically nonexistent. Consult the \file{Misc/COPYRIGHT}
file in the source distribution, or
\url{http://www.python.org/doc/Copyright.html} for the full language,
but it boils down to three conditions.
\begin{itemize}
\item You have to leave the copyright notice on the software; if you
don't include the source code in a product, you have to put the
copyright notice in the supporting documentation.
\item Don't claim that the institutions that have developed Python
endorse your product in any way.
\item If something goes wrong, you can't sue for damages. Practically
all software licences contain this condition.
\end{itemize}
Notice that you don't have to provide source code for anything that
contains Python or is built with it. Also, the Python interpreter and
accompanying documentation can be modified and redistributed in any
way you like, and you don't have to pay anyone any licensing fees at
all.
\emph{Why should we use an obscure language like Python instead of
well-known language X?}
I hope this HOWTO, and the documents listed in the final section, will
help convince you that Python isn't obscure, and has a healthily
growing user base. One word of advice: always present Python's
positive advantages, instead of concentrating on language X's
failings. People want to know why a solution is good, rather than why
all the other solutions are bad. So instead of attacking a competing
solution on various grounds, simply show how Python's virtues can
help.
\section{Useful Resources}
\begin{definitions}
\term{\url{http://www.pythonology.com/success}}
The Python Success Stories are a collection of stories from successful
users of Python, with the emphasis on business and corporate users.
%\term{\url{http://www.fsbassociates.com/books/pythonchpt1.htm}}
%The first chapter of \emph{Internet Programming with Python} also
%examines some of the reasons for using Python. The book is well worth
%buying, but the publishers have made the first chapter available on
%the Web.
\term{\url{http://home.pacbell.net/ouster/scripting.html}}
John Ousterhout's white paper on scripting is a good argument for the
utility of scripting languages, though naturally enough, he emphasizes
Tcl, the language he developed. Most of the arguments would apply to
any scripting language.
\term{\url{http://www.python.org/workshops/1997-10/proceedings/beazley.html}}
The authors, David M. Beazley and Peter S. Lomdahl,
describe their use of Python at Los Alamos National Laboratory.
It's another good example of how Python can help get real work done.
This quotation from the paper has been echoed by many people:
\begin{quotation}
Originally developed as a large monolithic application for
massively parallel processing systems, we have used Python to
transform our application into a flexible, highly modular, and
extremely powerful system for performing simulation, data
analysis, and visualization. In addition, we describe how Python
has solved a number of important problems related to the
development, debugging, deployment, and maintenance of scientific
software.
\end{quotation}
\term{\url{http://pythonjournal.cognizor.com/pyj1/Everitt-Feit_interview98-V1.html}}
This interview with Andy Feit, discussing Infoseek's use of Python, can be
used to show that choosing Python didn't introduce any difficulties
into a company's development process, and provided some substantial benefits.
%\term{\url{http://www.python.org/psa/Commercial.html}}
%Robin Friedrich wrote this document on how to support Python's use in
%commercial projects.
\term{\url{http://www.python.org/workshops/1997-10/proceedings/stein.ps}}
For the 6th Python conference, Greg Stein presented a paper that
traced Python's adoption and usage at a startup called eShop, and
later at Microsoft.
\term{\url{http://www.opensource.org}}
Management may be doubtful of the reliability and usefulness of
software that wasn't written commercially. This site presents
arguments that show how open source software can have considerable
advantages over closed-source software.
\term{\url{http://sunsite.unc.edu/LDP/HOWTO/mini/Advocacy.html}}
The Linux Advocacy mini-HOWTO was the inspiration for this document,
and is also well worth reading for general suggestions on winning
acceptance for a new technology, such as Linux or Python. In general,
you won't make much progress by simply attacking existing systems and
complaining about their inadequacies; this often ends up looking like
unfocused whining. It's much better to point out some of the many
areas where Python is an improvement over other systems.
\end{definitions}
\end{document}

View File

@ -1,486 +0,0 @@
\documentclass{howto}
\title{Curses Programming with Python}
\release{2.02}
\author{A.M. Kuchling, Eric S. Raymond}
\authoraddress{\email{amk@amk.ca}, \email{esr@thyrsus.com}}
\begin{document}
\maketitle
\begin{abstract}
\noindent
This document describes how to write text-mode programs with Python 2.x,
using the \module{curses} extension module to control the display.
This document is available from the Python HOWTO page at
\url{http://www.python.org/doc/howto}.
\end{abstract}
\tableofcontents
\section{What is curses?}
The curses library supplies a terminal-independent screen-painting and
keyboard-handling facility for text-based terminals; such terminals
include VT100s, the Linux console, and the simulated terminal provided
by X11 programs such as xterm and rxvt. Display terminals support
various control codes to perform common operations such as moving the
cursor, scrolling the screen, and erasing areas. Different terminals
use widely differing codes, and often have their own minor quirks.
In a world of X displays, one might ask ``why bother''? It's true
that character-cell display terminals are an obsolete technology, but
there are niches in which being able to do fancy things with them is
still valuable. One is on small-footprint or embedded Unixes that
don't carry an X server. Another is for tools like OS installers
and kernel configurators that may have to run before X is available.
The curses library hides all the details of different terminals, and
provides the programmer with an abstraction of a display, containing
multiple non-overlapping windows. The contents of a window can be
changed in various ways--adding text, erasing it, changing its
appearance--and the curses library will automagically figure out what
control codes need to be sent to the terminal to produce the right
output.
The curses library was originally written for BSD Unix; the later System V
versions of Unix from AT\&T added many enhancements and new functions.
BSD curses is no longer maintained, having been replaced by ncurses,
which is an open-source implementation of the AT\&T interface. If you're
using an open-source Unix such as Linux or FreeBSD, your system almost
certainly uses ncurses. Since most current commercial Unix versions
are based on System V code, all the functions described here will
probably be available. The older versions of curses carried by some
proprietary Unixes may not support everything, though.
No one has made a Windows port of the curses module. On a Windows
platform, try the Console module written by Fredrik Lundh. The
Console module provides cursor-addressable text output, plus full
support for mouse and keyboard input, and is available from
\url{http://effbot.org/efflib/console}.
\subsection{The Python curses module}
The Python module is a fairly simple wrapper over the C functions
provided by curses; if you're already familiar with curses programming
in C, it's really easy to transfer that knowledge to Python. The
biggest difference is that the Python interface makes things simpler,
by merging different C functions such as \function{addstr},
\function{mvaddstr}, \function{mvwaddstr}, into a single
\method{addstr()} method. You'll see this covered in more detail
later.
This HOWTO is simply an introduction to writing text-mode programs
with curses and Python. It doesn't attempt to be a complete guide to
the curses API; for that, see the Python library guide's section on
\module{curses}, and the C manual pages for ncurses. It will, however, give
you the basic ideas.
\section{Starting and ending a curses application}
Before doing anything, curses must be initialized. This is done by
calling the \function{initscr()} function, which will determine the
terminal type, send any required setup codes to the terminal, and
create various internal data structures. If successful,
\function{initscr()} returns a window object representing the entire
screen; this is usually called \code{stdscr}, after the name of the
corresponding C
variable.
\begin{verbatim}
import curses
stdscr = curses.initscr()
\end{verbatim}
Usually curses applications turn off automatic echoing of keys to the
screen, in order to be able to read keys and only display them under
certain circumstances. This requires calling the \function{noecho()}
function.
\begin{verbatim}
curses.noecho()
\end{verbatim}
Applications will also commonly need to react to keys instantly,
without requiring the Enter key to be pressed; this is called cbreak
mode, as opposed to the usual buffered input mode.
\begin{verbatim}
curses.cbreak()
\end{verbatim}
Terminals usually return special keys, such as the cursor keys or
navigation keys such as Page Up and Home, as a multibyte escape
sequence. While you could write your application to expect such
sequences and process them accordingly, curses can do it for you,
returning a special value such as \constant{curses.KEY_LEFT}. To get
curses to do the job, you'll have to enable keypad mode.
\begin{verbatim}
stdscr.keypad(1)
\end{verbatim}
Terminating a curses application is much easier than starting one.
You'll need to call
\begin{verbatim}
curses.nocbreak(); stdscr.keypad(0); curses.echo()
\end{verbatim}
to reverse the curses-friendly terminal settings. Then call the
\function{endwin()} function to restore the terminal to its original
operating mode.
\begin{verbatim}
curses.endwin()
\end{verbatim}
A common problem when debugging a curses application is to get your
terminal messed up when the application dies without restoring the
terminal to its previous state. In Python this commonly happens when
your code is buggy and raises an uncaught exception. Keys are no
longer echoed to the screen when you type them, for example, which
makes using the shell difficult.
In Python you can avoid these complications and make debugging much
easier by importing the module \module{curses.wrapper}. It supplies a
\function{wrapper()} function that takes a callable. It does the
initializations described above, and also initializes colors if color
support is present. It then runs your provided callable and finally
deinitializes appropriately. The callable is called inside a try-except
clause which catches exceptions, performs curses deinitialization, and
then passes the exception upwards. Thus, your terminal won't be left
in a funny state on exception.
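A minimal sketch of a program built this way. It assumes the function is
reachable directly as \code{curses.wrapper}; the names \function{main()}
and \function{run()} are arbitrary choices for this example.
\begin{verbatim}
import curses


def main(stdscr):
    # wrapper() passes in the window object for the whole screen.
    stdscr.addstr(0, 0, "Press any key to quit")
    stdscr.refresh()
    stdscr.getch()
    return "done"


def run():
    # Call this from a real terminal.  wrapper() hands back whatever
    # main() returns, and restores the terminal even if main() raises.
    return curses.wrapper(main)
\end{verbatim}
Keeping all the screen handling inside \function{main()} means there is
only one place where the terminal state has to be restored.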
\section{Windows and Pads}
Windows are the basic abstraction in curses. A window object
represents a rectangular area of the screen, and supports various
methods to display text, erase it, allow the user to input strings,
and so forth.
The \code{stdscr} object returned by the \function{initscr()} function
is a window object that covers the entire screen. Many programs may
need only this single window, but you might wish to divide the screen
into smaller windows, in order to redraw or clear them separately.
The \function{newwin()} function creates a new window of a given size,
returning the new window object.
\begin{verbatim}
begin_x = 20 ; begin_y = 7
height = 5 ; width = 40
win = curses.newwin(height, width, begin_y, begin_x)
\end{verbatim}
A word about the coordinate system used in curses: coordinates are
always passed in the order \emph{y,x}, and the top-left corner of a
window is coordinate (0,0). This breaks a common convention for
handling coordinates, where the \emph{x} coordinate usually comes
first. This is an unfortunate difference from most other computer
applications, but it's been part of curses since it was first written,
and it's too late to change things now.
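As a small illustration of the \emph{y,x} ordering, here is a sketch of
a helper that works out where to put a window so that it is centered.
The function name \function{centered_origin()} is made up for this
example; it does plain arithmetic and doesn't call curses at all.
\begin{verbatim}
def centered_origin(screen_height, screen_width, height, width):
    """Return (begin_y, begin_x) that centers a height-by-width window."""
    begin_y = max((screen_height - height) // 2, 0)
    begin_x = max((screen_width - width) // 2, 0)
    return begin_y, begin_x      # y first, matching the curses convention


# On a 24-line, 80-column screen, center a 5x40 window.
begin_y, begin_x = centered_origin(24, 80, 5, 40)
print((begin_y, begin_x))
\end{verbatim}
This prints \code{(9, 20)}. In a real program the screen size would
come from \code{stdscr.getmaxyx()}, which also returns the height
first, and the result would be passed to
\code{curses.newwin(5, 40, begin_y, begin_x)}.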
When you call a method to display or erase text, the effect doesn't
immediately show up on the display. This is because curses was
originally written with slow 300-baud terminal connections in mind;
with these terminals, minimizing the time required to redraw the
screen is very important. This lets curses accumulate changes to the
screen, and display them in the most efficient manner. For example,
if your program displays some characters in a window, and then clears
the window, there's no need to send the original characters because
they'd never be visible.
Accordingly, curses requires that you explicitly tell it to redraw
windows, using the \function{refresh()} method of window objects. In
practice, this doesn't really complicate programming with curses much.
Most programs go into a flurry of activity, and then pause waiting for
a keypress or some other action on the part of the user. All you have
to do is to be sure that the screen has been redrawn before pausing to
wait for user input, by simply calling \code{stdscr.refresh()} or the
\function{refresh()} method of some other relevant window.
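The draw-then-refresh-then-wait pattern can be sketched as follows. The
function name \function{show_status()} is hypothetical; it accepts any
window object.
\begin{verbatim}
def show_status(stdscr, message):
    # Queue up the changes; nothing reaches the terminal yet.
    stdscr.erase()
    stdscr.addstr(0, 0, message)
    # Push the accumulated changes to the terminal, then wait for a key.
    stdscr.refresh()
    return stdscr.getch()
\end{verbatim}
If the \method{refresh()} call were left out, the program would sit
waiting for a key while the user was still looking at the old screen
contents.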
A pad is a special case of a window; it can be larger than the actual
display screen, and only a portion of it is displayed at a time.
Creating a pad simply requires the pad's height and width, while
refreshing a pad requires giving the coordinates of the on-screen
area where a subsection of the pad will be displayed.
\begin{verbatim}
pad = curses.newpad(100, 100)
# These loops fill the pad with letters; this is
# explained in the next section
for y in range(0, 100):
    for x in range(0, 100):
        try: pad.addch(y,x, ord('a') + (x*x+y*y) % 26 )
        except curses.error: pass
# Displays a section of the pad in the middle of the screen
pad.refresh( 0,0, 5,5, 20,75)
\end{verbatim}
The \function{refresh()} call displays a section of the pad in the
rectangle extending from coordinate (5,5) to coordinate (20,75) on the
screen; the upper left corner of the displayed section is coordinate
(0,0) on the pad. Beyond that difference, pads are exactly like
ordinary windows and support the same methods.
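When scrolling a pad, the coordinate of its upper left corner has to
stay within bounds. The following sketch shows one way to do the
arithmetic; \function{clamp_pad_origin()} is a made-up helper, and it
assumes the on-screen rectangle includes both of its corner
coordinates.
\begin{verbatim}
def clamp_pad_origin(pad_y, pad_x, pad_height, pad_width,
                     view_height, view_width):
    """Keep the visible view_height x view_width region inside the pad."""
    max_y = max(pad_height - view_height, 0)
    max_x = max(pad_width - view_width, 0)
    return min(max(pad_y, 0), max_y), min(max(pad_x, 0), max_x)


# The on-screen rectangle (5,5)-(20,75) is 16 rows by 71 columns.
top, left = clamp_pad_origin(95, -3, 100, 100, 16, 71)
print((top, left))
\end{verbatim}
This prints \code{(84, 0)}, and the pad from the example above could
then be shown with \code{pad.refresh(top, left, 5,5, 20,75)}.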
If you have multiple windows and pads on screen there is a more
efficient way to go, which will prevent annoying screen flicker at
refresh time. Use the \method{noutrefresh()} method
of each window to update the data structure
representing the desired state of the screen; then change the physical
screen to match the desired state in one go with the function
\function{doupdate()}. The normal \method{refresh()} method calls
\function{doupdate()} as its last act.
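A sketch of the two-step update for several ordinary windows follows.
The function name \function{refresh_all()} and its \var{doupdate}
parameter are inventions of this example; the parameter exists only so
a substitute can be passed in when no terminal is available.
\begin{verbatim}
import curses


def refresh_all(windows, doupdate=curses.doupdate):
    # Stage each window's changes without touching the terminal.
    for win in windows:
        win.noutrefresh()
    # One physical update for everything staged above.
    doupdate()
\end{verbatim}
Pads would need the same six coordinates passed to
\method{noutrefresh()} that they take for \method{refresh()}, so this
sketch handles ordinary windows only.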
\section{Displaying Text}
{}From a C programmer's point of view, curses may sometimes look like
a twisty maze of functions, all subtly different. For example,
\function{addstr()} displays a string at the current cursor location
in the \code{stdscr} window, while \function{mvaddstr()} moves to a
given y,x coordinate first before displaying the string.
\function{waddstr()} is just like \function{addstr()}, but allows
specifying a window to use, instead of using \code{stdscr} by default.
\function{mvwaddstr()} follows similarly.
Fortunately the Python interface hides all these details;
\code{stdscr} is a window object like any other, and methods like
\function{addstr()} accept multiple argument forms. Usually there are
four different forms.
\begin{tableii}{|c|l|}{textrm}{Form}{Description}
\lineii{\var{str} or \var{ch}}{Display the string \var{str} or
character \var{ch} at the current position}
\lineii{\var{str} or \var{ch}, \var{attr}}{Display the string \var{str} or
character \var{ch}, using attribute \var{attr} at the current position}
\lineii{\var{y}, \var{x}, \var{str} or \var{ch}}
{Move to position \var{y,x} within the window, and display \var{str}
or \var{ch}}
\lineii{\var{y}, \var{x}, \var{str} or \var{ch}, \var{attr}}
{Move to position \var{y,x} within the window, and display \var{str}
or \var{ch}, using attribute \var{attr}}
\end{tableii}
Attributes allow displaying text in highlighted forms, such as in
boldface, underline, reverse video, or in color. They'll be explained
in more detail in the next subsection.
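The four argument forms can be sketched with \method{addstr()} as
follows; \function{show_forms()} is a hypothetical name, and \var{win}
can be any window object.
\begin{verbatim}
import curses


def show_forms(win):
    win.addstr("at the cursor")                        # str
    win.addstr(" and bold", curses.A_BOLD)             # str, attr
    win.addstr(2, 0, "moved to row 2")                 # y, x, str
    win.addstr(3, 0, "row 3, underlined",
               curses.A_UNDERLINE)                     # y, x, str, attr
    win.refresh()
\end{verbatim}
The first two calls continue from wherever the cursor was left, while
the last two move it first.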
The \function{addstr()} function takes a Python string as the value to
be displayed, while the \function{addch()} functions take a character,
which can be either a Python string of length 1 or an integer. If
it's a string, you're limited to displaying characters between 0 and
255. SVr4 curses provides constants for extension characters; these
constants are integers greater than 255. For example,
\constant{ACS_PLMINUS} is a +/- symbol, and \constant{ACS_ULCORNER} is
the upper left corner of a box (handy for drawing borders).
Windows remember where the cursor was left after the last operation,
so if you leave out the \var{y,x} coordinates, the string or character
will be displayed wherever the last operation left off. You can also
move the cursor with the \function{move(\var{y,x})} method. Because
some terminals always display a flashing cursor, you may want to
ensure that the cursor is positioned in some location where it won't
be distracting; it can be confusing to have the cursor blinking at
some apparently random location.
If your application doesn't need a blinking cursor at all, you can
call \function{curs_set(0)} to make it invisible. Equivalently, and
for compatibility with older curses versions, there's a
\function{leaveok(\var{bool})} function. When \var{bool} is true, the
curses library will attempt to suppress the flashing cursor, and you
won't need to worry about leaving it in odd locations.
\subsection{Attributes and Color}
Characters can be displayed in different ways. Status lines in a
text-based application are commonly shown in reverse video; a text
viewer may need to highlight certain words. curses supports this by
allowing you to specify an attribute for each cell on the screen.
An attribute is an integer, each bit representing a different
attribute. You can try to display text with multiple attribute bits
set, but curses doesn't guarantee that all the possible combinations
are available, or that they're all visually distinct. That depends on
the ability of the terminal being used, so it's safest to stick to the
most commonly available attributes, listed here.
\begin{tableii}{|c|l|}{constant}{Attribute}{Description}
\lineii{A_BLINK}{Blinking text}
\lineii{A_BOLD}{Extra bright or bold text}
\lineii{A_DIM}{Half bright text}
\lineii{A_REVERSE}{Reverse-video text}
\lineii{A_STANDOUT}{The best highlighting mode available}
\lineii{A_UNDERLINE}{Underlined text}
\end{tableii}
So, to display a reverse-video status line on the top line of the
screen,
you could code:
\begin{verbatim}
stdscr.addstr(0, 0, "Current mode: Typing mode",
              curses.A_REVERSE)
stdscr.refresh()
\end{verbatim}
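Because each attribute is a separate bit, attributes can be combined
with bitwise OR and tested with bitwise AND. A small sketch; the
variable names are arbitrary:
\begin{verbatim}
import curses

# Combine attributes with bitwise OR ...
attr = curses.A_BOLD | curses.A_UNDERLINE

# ... and test for one with bitwise AND.
is_bold = bool(attr & curses.A_BOLD)
is_reverse = bool(attr & curses.A_REVERSE)
print((is_bold, is_reverse))
\end{verbatim}
This prints \code{(True, False)}. Whether the terminal actually shows
bold and underlined text together is a separate question, as noted
above.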
The curses library also supports color on those terminals that
provide it. The most common such terminal is probably the Linux
console, followed by color xterms.
To use color, you must call the \function{start_color()} function soon
after calling \function{initscr()}, to initialize the default color
set (the \function{curses.wrapper.wrapper()} function does this
automatically). Once that's done, the \function{has_colors()}
function returns TRUE if the terminal in use can actually display
color. (Note: curses uses the American spelling 'color', instead of
the Canadian/British spelling 'colour'. If you're used to the British
spelling, you'll have to resign yourself to misspelling it for the
sake of these functions.)
The curses library maintains a finite number of color pairs,
containing a foreground (or text) color and a background color. You
can get the attribute value corresponding to a color pair with the
\function{color_pair()} function; this can be bitwise-OR'ed with other
attributes such as \constant{A_REVERSE}, but again, such combinations
are not guaranteed to work on all terminals.
An example, which displays a line of text using color pair 1:
\begin{verbatim}
stdscr.addstr( "Pretty text", curses.color_pair(1) )
stdscr.refresh()
\end{verbatim}
As I said before, a color pair consists of a foreground and
background color. \function{start_color()} initializes 8 basic
colors when it activates color mode. They are: 0:black, 1:red,
2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and 7:white. The curses
module defines named constants for each of these colors:
\constant{curses.COLOR_BLACK}, \constant{curses.COLOR_RED}, and so
forth.
The \function{init_pair(\var{n, f, b})} function changes the
definition of color pair \var{n}, to foreground color \var{f} and
background color \var{b}. Color pair 0 is hard-wired to white on black,
and cannot be changed.
Let's put all this together. To change color pair 1 to red
text on a white background, you would call:
\begin{verbatim}
curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
\end{verbatim}
When you change a color pair, any text already displayed using that
color pair will change to the new colors. You can also display new
text in this color with:
\begin{verbatim}
stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1) )
\end{verbatim}
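Putting the color calls together, a minimal sketch might look like the
following. It is written for a current Python 3 interpreter, where the
wrapper is spelled \function{curses.wrapper()} rather than
\function{curses.wrapper.wrapper()}. The function name
\function{show_alert()} and the fallback to \constant{A_REVERSE} on
monochrome terminals are illustrative assumptions, not part of curses.

```python
import curses

def show_alert(stdscr):
    # The wrapper has already called start_color() where that is possible,
    # so has_colors() can be consulted directly.
    if curses.has_colors():
        # Redefine color pair 1 as red text on a white background.
        curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
        attr = curses.color_pair(1) | curses.A_BOLD
    else:
        # Assumed fallback for terminals without color support.
        attr = curses.A_REVERSE
    stdscr.addstr(0, 0, "RED ALERT!", attr)
    stdscr.refresh()
    stdscr.getch()          # wait for a key so the text stays visible

# To try it in a real terminal: curses.wrapper(show_alert)
```

The call to the wrapper is left as a comment because it needs an
interactive terminal to run.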
Very fancy terminals can change the definitions of the actual colors
to a given RGB value. This lets you change color 1, which is usually
red, to purple or blue or any other color you like. Unfortunately,
the Linux console doesn't support this, so I'm unable to try it out,
and can't provide any examples. You can check if your terminal can do
this by calling \function{can_change_color()}, which returns TRUE if
the capability is there. If you're lucky enough to have such a
talented terminal, consult your system's man pages for more
information.
\section{User Input}
The curses library itself offers only very simple input mechanisms.
Python's support adds a text-input widget that makes up for some of the
lack.
The most common way to get input to a window is to use its
\method{getch()} method. \method{getch()} pauses and waits for the
user to hit a key, displaying it if \function{echo()} has been called
earlier. You can optionally specify a coordinate to which the cursor
should be moved before pausing.
It's possible to change this behavior with the method
\method{nodelay()}. After \method{nodelay(1)}, \method{getch()} for
the window becomes non-blocking and returns \code{curses.ERR} (a value
of -1) when no input is ready. There's also a \function{halfdelay()}
function, which can be used to (in effect) set a timer on each
\method{getch()}; if no input becomes available within the delay
given as the argument to \function{halfdelay()} (measured in tenths
of a second), curses raises an exception.
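The non-blocking case can be wrapped in a small helper, sketched here
for a current Python 3 interpreter. The name \function{poll_key()} is
made up for this example; it only relies on \method{nodelay()},
\method{getch()} and \code{curses.ERR} as described above.

```python
import curses

def poll_key(win):
    """Return the next key code from win, or None if no key is waiting."""
    win.nodelay(True)        # make getch() non-blocking for this window
    c = win.getch()
    if c == curses.ERR:      # curses.ERR is -1: nothing was typed
        return None
    return c
```

A main loop can then call \code{poll_key(stdscr)} once per iteration
and carry on with other work whenever it gets \code{None} back.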
The \method{getch()} method returns an integer; if it's between 0 and
255, it represents the ASCII code of the key pressed. Values greater
than 255 are special keys such as Page Up, Home, or the cursor keys.
You can compare the value returned to constants such as
\constant{curses.KEY_PPAGE}, \constant{curses.KEY_HOME}, or
\constant{curses.KEY_LEFT}. Usually the main loop of your program
will look something like this:
\begin{verbatim}
while 1:
    c = stdscr.getch()
    if c == ord('p'): PrintDocument()
    elif c == ord('q'): break  # Exit the while()
    elif c == curses.KEY_HOME: x = y = 0
\end{verbatim}
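One way to keep such a loop manageable is to move the key comparisons
into a function that can be exercised without a terminal. The sketch
below assumes a current Python 3 interpreter; the function name
\function{action_for()} and the command names it returns are invented
for illustration.

```python
import curses

def action_for(c):
    """Map a getch() return value to a command name (names are made up)."""
    if c == ord('p'):
        return "print"
    if c == ord('q'):
        return "quit"
    if c == curses.KEY_HOME:             # special keys are values above 255
        return "home"
    if c in (curses.KEY_PPAGE, curses.KEY_NPAGE):
        return "page"
    return None                          # any other key is ignored
```

The main loop then reduces to reading a key with \method{getch()},
looking up the action, and breaking out when the action is
\code{"quit"}.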
The \module{curses.ascii} module supplies ASCII class membership
functions that take either integer or 1-character-string
arguments; these may be useful in writing more readable tests for
your command interpreters. It also supplies conversion functions
that take either integer or 1-character-string arguments and return
the same type. For example, \function{curses.ascii.ctrl()} returns
the control character corresponding to its argument.
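A few calls show both argument styles. This is a small sketch for a
current Python 3 interpreter; the variable names are only labels for
the results.

```python
import curses.ascii

# Class-membership tests accept a one-character string or an integer code.
is_digit_from_str = curses.ascii.isdigit('7')
is_digit_from_int = curses.ascii.isdigit(ord('7'))

# Conversion functions return the same type they were given.
ctrl_a_str = curses.ascii.ctrl('a')        # a one-character string
ctrl_a_int = curses.ascii.ctrl(ord('a'))   # an integer
```

Because \method{getch()} returns integers, the integer forms are the
ones a command loop would normally use.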
There's also a method to retrieve an entire string,
\method{getstr()}. It isn't used very often, because its
functionality is quite limited; the only editing keys available are
the backspace key and the Enter key, which terminates the string. It
can optionally be limited to a fixed number of characters.
\begin{verbatim}
curses.echo() # Enable echoing of characters
# Get a 15-character string, with the cursor on the top line
s = stdscr.getstr(0,0, 15)
\end{verbatim}
The Python \module{curses.textpad} module supplies something better.
With it, you can turn a window into a text box that supports an
Emacs-like set of keybindings. Various methods of the \class{Textbox}
class support editing with input validation and gathering the edit
results either with or without trailing spaces. See the library
documentation on \module{curses.textpad} for the details.
\section{For More Information}
This HOWTO didn't cover some advanced topics, such as screen-scraping
or capturing mouse events from an xterm instance. But the Python
library page for the curses modules is now pretty complete. You
should browse it next.
If you're in doubt about the detailed behavior of any of the ncurses
entry points, consult the manual pages for your curses implementation,
whether it's ncurses or a proprietary Unix vendor's. The manual pages
will document any quirks, and provide complete lists of all the
functions, attributes, and \constant{ACS_*} characters available to
you.
Because the curses API is so large, some functions aren't supported in
the Python interface, not because they're difficult to implement, but
because no one has needed them yet. Feel free to add them and then
submit a patch. Also, we don't yet have support for the menus or
panels libraries associated with ncurses; feel free to add that.
If you write an interesting little program, feel free to contribute it
as another demo. We can always use more of them!
The ncurses FAQ: \url{http://dickey.his.com/ncurses/ncurses.faq.html}
\end{document}

View File

@ -1,344 +0,0 @@
\documentclass{howto}
\title{Idioms and Anti-Idioms in Python}
\release{0.00}
\author{Moshe Zadka}
\authoraddress{howto@zadka.site.co.il}
\begin{document}
\maketitle
This document is placed in the public domain.
\begin{abstract}
\noindent
This document can be considered a companion to the tutorial. It
shows how to use Python, and even more importantly, how {\em not}
to use Python.
\end{abstract}
\tableofcontents
\section{Language Constructs You Should Not Use}
While Python has relatively few gotchas compared to other languages, it
still has some constructs which are only useful in corner cases, or are
plain dangerous.
\subsection{from module import *}
\subsubsection{Inside Function Definitions}
\code{from module import *} is {\em invalid} inside function definitions.
While many versions of Python do not check for the invalidity, it does not
make it more valid, no more than having a smart lawyer makes a man innocent.
Do not use it like that ever. Even in versions where it was accepted, it made
the function execution slower, because the compiler could not be certain
which names are local and which are global. In Python 2.1 this construct
causes warnings, and sometimes even errors.
\subsubsection{At Module Level}
While it is valid to use \code{from module import *} at module level it
is usually a bad idea. For one, this loses an important property Python
otherwise has --- you can know where each toplevel name is defined by
a simple "search" function in your favourite editor. You also open yourself
to trouble in the future, if some module grows additional functions or
classes.
One of the most awful questions asked on the newsgroup is why this code:
\begin{verbatim}
f = open("www")
f.read()
\end{verbatim}
does not work. Of course, it works just fine (assuming you have a file
called "www".) But it does not work if somewhere in the module, the
statement \code{from os import *} is present. The \module{os} module
has a function called \function{open()} which returns an integer. While
it is very useful, shadowing builtins is one of its least useful properties.
Remember, you can never know for sure what names a module exports, so either
take what you need --- \code{from module import name1, name2}, or keep them in
the module and access on a per-need basis ---
\code{import module;print module.name}.
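The shadowing can be observed directly. The sketch below is written
for a current Python 3 interpreter (where the builtins live in the
\module{builtins} module) and runs the star-import in a scratch
dictionary so that the surrounding module is not affected; the
variable names are illustrative.

```python
import builtins
import os

# Run the star-import in a scratch namespace so this module stays clean.
scratch = {}
exec("from os import *", scratch)

# The name "open" now refers to os.open, which returns an integer
# file descriptor, not to the builtin that returns a file object.
shadowed = scratch["open"] is os.open
still_builtin = scratch["open"] is builtins.open
```

After this runs, \code{shadowed} is true and \code{still_builtin} is
false, which is exactly the situation that breaks \code{f.read()}.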
\subsubsection{When It Is Just Fine}
There are situations in which \code{from module import *} is just fine:
\begin{itemize}
\item The interactive prompt. For example, \code{from math import *} makes
Python an amazing scientific calculator.
\item When extending a module in C with a module in Python.
\item When the module advertises itself as \code{from import *} safe.
\end{itemize}
\subsection{Unadorned \keyword{exec}, \function{execfile} and friends}
The word ``unadorned'' refers to the use without an explicit dictionary,
in which case those constructs evaluate code in the {\em current} environment.
This is dangerous for the same reasons \code{from import *} is dangerous ---
it might step over variables you are counting on and mess up things for
the rest of your code. Simply do not do that.
Bad examples:
\begin{verbatim}
>>> for name in sys.argv[1:]:
>>>     exec "%s=1" % name
>>> def func(s, **kw):
>>>     for var, val in kw.items():
>>>         exec "s.%s=val" % var  # invalid!
>>> execfile("handler.py")
>>> handle()
\end{verbatim}
Good examples:
\begin{verbatim}
>>> d = {}
>>> for name in sys.argv[1:]:
>>>     d[name] = 1
>>> def func(s, **kw):
>>>     for var, val in kw.items():
>>>         setattr(s, var, val)
>>> d={}
>>> execfile("handler.py", d, d)
>>> handle = d['handle']
>>> handle()
\end{verbatim}
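The effect of the explicit dictionary can be checked with a short
experiment. This sketch uses the Python 3 spelling, where
\function{exec()} is a function that takes the namespace as its second
argument; the snippet text and the variable names are invented for
the example.

```python
# Code text obtained elsewhere that happens to reuse one of our names.
snippet = "total = 0\nhandle = lambda: 'handled'"

total = 42                     # a variable the caller is counting on

namespace = {}
exec(snippet, namespace)       # the snippet only sees and changes namespace

handle = namespace["handle"]   # pull out only what we asked for
result = handle()
```

The caller's \code{total} is still 42 afterwards; the snippet's own
\code{total} lives in \code{namespace}. Note that this isolates names
only; it is not a security sandbox.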
\subsection{from module import name1, name2}
This is a ``don't'' which is much weaker than the previous ``don't''s
but is still something you should not do if you don't have good reasons
to do that. The reason it is usually a bad idea is because you suddenly
have an object which lives in two separate namespaces. When the binding
in one namespace changes, the binding in the other will not, so there
will be a discrepancy between them. This happens when, for example,
one module is reloaded, or changes the definition of a function at runtime.
Bad example:
\begin{verbatim}
# foo.py
a = 1

# bar.py
from foo import a
if something():
    a = 2 # danger: foo.a != a
\end{verbatim}
Good example:
\begin{verbatim}
# foo.py
a = 1

# bar.py
import foo
if something():
    foo.a = 2
\end{verbatim}
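The discrepancy between the two bindings can be reproduced in a single
file. The sketch below, for a current Python 3 interpreter, builds a
throwaway module in memory; the module name \code{demo_foo} is
hypothetical and stands in for \code{foo.py}.

```python
import sys
import types

# Build a throwaway module in memory; "demo_foo" is a made-up name.
demo_foo = types.ModuleType("demo_foo")
demo_foo.a = 1
sys.modules["demo_foo"] = demo_foo

from demo_foo import a       # copies the current binding into this namespace

demo_foo.a = 2               # rebinding inside the module...
stale = a                    # ...does not touch our copy
fresh = demo_foo.a           # attribute access always sees the current value
```

Here \code{stale} is still 1 while \code{fresh} is 2, which is the
divergence that \code{import foo} followed by \code{foo.a} avoids.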
\subsection{except:}
Python has the \code{except:} clause, which catches all exceptions.
Since {\em every} error in Python raises an exception, this makes many
programming errors look like runtime problems, and hinders
the debugging process.
The following code shows a great example:
\begin{verbatim}
try:
    foo = opne("file") # misspelled "open"
except:
    sys.exit("could not open file!")
\end{verbatim}
The second line triggers a \exception{NameError} which is caught by the
except clause. The program will exit, and you will have no idea that
this has nothing to do with the readability of \code{"file"}.
The example above is better written as:
\begin{verbatim}
try:
    foo = opne("file") # will be changed to "open" as soon as we run it
except IOError:
    sys.exit("could not open file")
\end{verbatim}
There are some situations in which the \code{except:} clause is useful:
for example, in a framework when running callbacks, it is good not to
let any callback disturb the framework.
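A framework-style callback runner might be sketched as follows, for a
current Python 3 interpreter. The function name
\function{run_callbacks()} is invented. Note one deliberate
substitution: the sketch catches \exception{Exception} rather than
using a bare \code{except:}, on the assumption that the framework
still wants \exception{KeyboardInterrupt} and \exception{SystemExit}
to get through.

```python
import logging

def run_callbacks(callbacks):
    """Call each callback; log failures instead of letting them propagate."""
    failures = []
    for cb in callbacks:
        try:
            cb()
        except Exception:
            # Exception (rather than a bare except:) lets KeyboardInterrupt
            # and SystemExit through.
            logging.exception("callback %r failed", cb)
            failures.append(cb)
    return failures
```

Logging the traceback keeps the debugging information that a silent
catch-all would throw away.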
\section{Exceptions}
Exceptions are a useful feature of Python. You should learn to raise
them whenever something unexpected occurs, and catch them only where
you can do something about them.
The following is a very popular anti-idiom:
\begin{verbatim}
def get_status(file):
    if not os.path.exists(file):
        print "file not found"
        sys.exit(1)
    return open(file).readline()
\end{verbatim}
Consider the case where the file gets deleted between the time the call to
\function{os.path.exists} is made and the time \function{open} is called.
That means the last line will throw an \exception{IOError}. The same would
happen if \var{file} exists but has no read permission. Since testing this
on a normal machine on existing and non-existing files makes it seem bugless,
the test results will seem fine, and the code will get
shipped. Then an unhandled \exception{IOError} escapes to the user, who
has to watch the ugly traceback.
Here is a better way to do it.
\begin{verbatim}
def get_status(file):
    try:
        return open(file).readline()
    except (IOError, OSError):
        print "file not found"
        sys.exit(1)
\end{verbatim}
In this version, {\em either} the file gets opened and the line is read
(so it works even on flaky NFS or SMB connections), or the message
is printed and the application aborted.
Still, \function{get_status} makes too many assumptions --- that it
will only be used in a short running script, and not, say, in a long
running server. Sure, the caller could do something like
\begin{verbatim}
try:
    status = get_status(log)
except SystemExit:
    status = None
\end{verbatim}
So, try to use as few \code{except} clauses as possible in your code --- those will
usually be a catch-all in the \function{main}, or inside calls which
should always succeed.
So, the best version is probably
\begin{verbatim}
def get_status(file):
    return open(file).readline()
\end{verbatim}
The caller can deal with the exception if it wants (for example, if it
tries several files in a loop), or just let the exception filter upwards
to {\em its} caller.
The last version is not very good either --- due to implementation details,
the file would not be closed when an exception is raised until the handler
finishes, and perhaps not at all in non-C implementations (e.g., Jython).
The following version closes the file explicitly:
\begin{verbatim}
def get_status(file):
    fp = open(file)
    try:
        return fp.readline()
    finally:
        fp.close()
\end{verbatim}
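On interpreters that provide the \keyword{with} statement (Python 2.5
with \code{from __future__ import with_statement}, and all later
versions by default), the same guarantee can be written more
compactly. This is a sketch of that alternative, not a replacement
the HOWTO itself prescribes.

```python
def get_status(file):
    # The with statement closes the file on normal exit and on exceptions,
    # just like the explicit try/finally form.
    with open(file) as fp:
        return fp.readline()
```

As before, any \exception{IOError} from \function{open()} is left for
the caller to handle.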
\section{Using the Batteries}
Every so often, people seem to be writing stuff in the Python library
again, usually poorly. While the occasional module has a poor interface,
it is usually much better to use the rich standard library and data
types that come with Python than to invent your own.
A useful module very few people know about is \module{os.path}. It
always has the correct path arithmetic for your operating system, and
will usually be much better than whatever you come up with yourself.
Compare:
\begin{verbatim}
# ugh!
return dir+"/"+file
# better
return os.path.join(dir, file)
\end{verbatim}
More useful functions in \module{os.path}: \function{basename},
\function{dirname} and \function{splitext}.
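A short illustration of those three functions, written for a current
Python 3 interpreter; the path components are made-up sample values.

```python
import os.path

# Build a path with the separator that is right for this platform.
path = os.path.join("logs", "2007", "server.log.gz")

name = os.path.basename(path)          # last component of the path
folder = os.path.dirname(path)         # everything before the last component
root, ext = os.path.splitext(name)     # splits off the final extension only
```

Note that \function{splitext} only removes the last extension, so
\code{root} here still ends in \code{.log}.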
There are also many useful builtin functions people seem not to be
aware of for some reason: \function{min()} and \function{max()} can
find the minimum/maximum of any sequence with comparable semantics,
for example, yet many people write their own
\function{max()}/\function{min()}. Another highly useful function is
\function{reduce()}. A classical use of \function{reduce()}
is something like
\begin{verbatim}
import sys, operator
nums = map(float, sys.argv[1:])
print reduce(operator.add, nums)/len(nums)
\end{verbatim}
This cute little script prints the average of all numbers given on the
command line. The \function{reduce()} adds up all the numbers, and
the rest is just some pre- and postprocessing.
Similarly, note that \function{float()}, \function{int()} and
\function{long()} all accept arguments of type string, and so are
suited to parsing --- assuming you are ready to deal with the
\exception{ValueError} they raise.
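Dealing with the \exception{ValueError} can be as simple as the sketch
below, written for a current Python 3 interpreter. The helper names
\function{parse_numbers()} and \function{average()} are invented, and
\function{sum()} is used as the builtin shortcut for
\code{reduce(operator.add, ...)}.

```python
def parse_numbers(args):
    """Return (numbers, rejects) for a list of strings."""
    numbers, rejects = [], []
    for text in args:
        try:
            numbers.append(float(text))
        except ValueError:
            # float() refuses anything that is not a number; keep it aside.
            rejects.append(text)
    return numbers, rejects

def average(numbers):
    # sum() adds up the sequence, like reduce(operator.add, numbers).
    return sum(numbers) / len(numbers)
```

A script could call \code{parse_numbers(sys.argv[1:])} and report the
rejected arguments instead of dying with a traceback.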
\section{Using Backslash to Continue Statements}
Since Python treats a newline as a statement terminator,
and since statements are often longer than is comfortable to put
in one line, many people do:
\begin{verbatim}
if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
        calculate_number(10, 20) != forbulate(500, 360):
    pass
\end{verbatim}
You should realize that this is dangerous: a stray space after the
\code{\e} would make this line wrong, and stray spaces are notoriously
hard to see in editors. In this case, at least it would be a syntax
error, but if the code was:
\begin{verbatim}
value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
        + calculate_number(10, 20)*forbulate(500, 360)
\end{verbatim}
then it would just be subtly wrong.
It is usually much better to use the implicit continuation inside parentheses.
This version is bulletproof:
\begin{verbatim}
value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
         + calculate_number(10, 20)*forbulate(500, 360))
\end{verbatim}
\end{document}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -1,465 +0,0 @@
\documentclass{howto}
\title{Socket Programming HOWTO}
\release{0.00}
\author{Gordon McMillan}
\authoraddress{\email{gmcm@hypernet.com}}
\begin{document}
\maketitle
\begin{abstract}
\noindent
Sockets are used nearly everywhere, but are one of the most severely
misunderstood technologies around. This is a 10,000 foot overview of
sockets. It's not really a tutorial - you'll still have work to do in
getting things operational. It doesn't cover the fine points (and there
are a lot of them), but I hope it will give you enough background to
begin using them decently.
This document is available from the Python HOWTO page at
\url{http://www.python.org/doc/howto}.
\end{abstract}
\tableofcontents
\section{Sockets}
Sockets are used nearly everywhere, but are one of the most severely
misunderstood technologies around. This is a 10,000 foot overview of
sockets. It's not really a tutorial - you'll still have work to do in
getting things working. It doesn't cover the fine points (and there
are a lot of them), but I hope it will give you enough background to
begin using them decently.
I'm only going to talk about INET sockets, but they account for at
least 99\% of the sockets in use. And I'll only talk about STREAM
sockets - unless you really know what you're doing (in which case this
HOWTO isn't for you!), you'll get better behavior and performance from
a STREAM socket than anything else. I will try to clear up the mystery
of what a socket is, as well as some hints on how to work with
blocking and non-blocking sockets. But I'll start by talking about
blocking sockets. You'll need to know how they work before dealing
with non-blocking sockets.
Part of the trouble with understanding these things is that "socket"
can mean a number of subtly different things, depending on context. So
first, let's make a distinction between a "client" socket - an
endpoint of a conversation, and a "server" socket, which is more like
a switchboard operator. The client application (your browser, for
example) uses "client" sockets exclusively; the web server it's
talking to uses both "server" sockets and "client" sockets.
\subsection{History}
Of the various forms of IPC (\emph{Inter Process Communication}),
sockets are by far the most popular. On any given platform, there are
likely to be other forms of IPC that are faster, but for
cross-platform communication, sockets are about the only game in town.
They were invented in Berkeley as part of the BSD flavor of Unix. They
spread like wildfire with the Internet. With good reason --- the
combination of sockets with INET makes talking to arbitrary machines
around the world unbelievably easy (at least compared to other
schemes).
\section{Creating a Socket}
Roughly speaking, when you clicked on the link that brought you to
this page, your browser did something like the following:
\begin{verbatim}
#create an INET, STREAMing socket
s = socket.socket(
    socket.AF_INET, socket.SOCK_STREAM)
#now connect to the web server on port 80
# - the normal http port
s.connect(("www.mcmillan-inc.com", 80))
\end{verbatim}
When the \code{connect} completes, the socket \code{s} can
now be used to send in a request for the text of this page. The same
socket will read the reply, and then be destroyed. That's right -
destroyed. Client sockets are normally only used for one exchange (or
a small set of sequential exchanges).
What happens in the web server is a bit more complex. First, the web
server creates a "server socket".
\begin{verbatim}
#create an INET, STREAMing socket
serversocket = socket.socket(
    socket.AF_INET, socket.SOCK_STREAM)
#bind the socket to a public host,
# and a well-known port
serversocket.bind((socket.gethostname(), 80))
#become a server socket
serversocket.listen(5)
\end{verbatim}
A couple of things to notice: we used \code{socket.gethostname()}
so that the socket would be visible to the outside world. If we had
used \code{s.bind(('localhost', 80))} or \code{s.bind(('127.0.0.1', 80))}
we would still have a "server" socket, but one that was only visible
within the same machine. \code{s.bind(('', 80))} specifies that the
socket is reachable by any address the machine happens to have.
A second thing to note: low number ports are usually reserved for
"well known" services (HTTP, SNMP etc). If you're playing around, use
a nice high number (4 digits).
Finally, the argument to \code{listen} tells the socket library that
we want it to queue up as many as 5 connect requests (the normal max)
before refusing outside connections. If the rest of the code is
written properly, that should be plenty.
OK, now we have a "server" socket, listening on port 80. Now we enter
the mainloop of the web server:
\begin{verbatim}
while 1:
    #accept connections from outside
    (clientsocket, address) = serversocket.accept()
    #now do something with the clientsocket
    #in this case, we'll pretend this is a threaded server
    ct = client_thread(clientsocket)
    ct.run()
\end{verbatim}
There are actually 3 general ways in which this loop could work -
dispatching a thread to handle \code{clientsocket}, creating a new
process to handle \code{clientsocket}, or restructuring this app
to use non-blocking sockets, and multiplexing between our "server" socket
and any active \code{clientsocket}s using
\code{select}. More about that later. The important thing to
understand now is this: this is \emph{all} a "server" socket
does. It doesn't send any data. It doesn't receive any data. It just
produces "client" sockets. Each \code{clientsocket} is created
in response to some \emph{other} "client" socket doing a
\code{connect()} to the host and port we're bound to. As soon as
we've created that \code{clientsocket}, we go back to listening
for more connections. The two "clients" are free to chat it up - they
are using some dynamically allocated port which will be recycled when
the conversation ends.
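The thread-per-connection variant can be sketched end to end as
follows. This is written for a current Python 3 interpreter and binds
to the loopback address on an OS-chosen port so that it can run
without privileges. All function names (\function{handle_client()},
\function{serve()}, \function{echo_once()}) are invented for the
example, and the echo behavior is just a stand-in for real work. It
uses \method{Thread.start()}, which is what actually runs the handler
in a new thread.

```python
import socket
import threading

def handle_client(clientsocket):
    # Echo whatever arrives until the peer stops sending, then close.
    with clientsocket:
        while True:
            data = clientsocket.recv(4096)
            if not data:
                break
            clientsocket.sendall(data)

def serve(serversocket, max_clients):
    # The "server" socket only produces client sockets; one thread each.
    threads = []
    for _ in range(max_clients):
        clientsocket, address = serversocket.accept()
        t = threading.Thread(target=handle_client, args=(clientsocket,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

def echo_once(message):
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversocket.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    serversocket.listen(5)
    port = serversocket.getsockname()[1]
    server = threading.Thread(target=serve, args=(serversocket, 1))
    server.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(message)
    client.shutdown(socket.SHUT_WR)        # done sending, still listening
    reply = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:                      # 0 bytes: the server closed
            break
        reply += chunk
    client.close()
    server.join()
    serversocket.close()
    return reply
```

Calling \code{echo\_once(b"hello")} should hand back the same bytes
after one round trip through the accepted client socket.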
\subsection{IPC}
If you need fast IPC between two processes
on one machine, you should look into whatever form of shared memory
the platform offers. A simple protocol based around shared memory and
locks or semaphores is by far the fastest technique.
If you do decide to use sockets, bind the "server" socket to
\code{'localhost'}. On most platforms, this will take a shortcut
around a couple of layers of network code and be quite a bit faster.
\section{Using a Socket}
The first thing to note is that the web browser's "client" socket and
the web server's "client" socket are identical beasts. That is, this
is a "peer to peer" conversation. Or to put it another way, \emph{as the
designer, you will have to decide what the rules of etiquette are for
a conversation}. Normally, the \code{connect}ing socket
starts the conversation, by sending in a request, or perhaps a
signon. But that's a design decision - it's not a rule of sockets.
Now there are two sets of verbs to use for communication. You can use
\code{send} and \code{recv}, or you can transform your
client socket into a file-like beast and use \code{read} and
\code{write}. The latter is the way Java presents their
sockets. I'm not going to talk about it here, except to warn you that
you need to use \code{flush} on sockets. These are buffered
"files", and a common mistake is to \code{write} something, and
then \code{read} for a reply. Without a \code{flush} in
there, you may wait forever for the reply, because the request may
still be in your output buffer.
Now we come to the major stumbling block of sockets - \code{send}
and \code{recv} operate on the network buffers. They do not
necessarily handle all the bytes you hand them (or expect from them),
because their major focus is handling the network buffers. In general,
they return when the associated network buffers have been filled
(\code{send}) or emptied (\code{recv}). They then tell you
how many bytes they handled. It is \emph{your} responsibility to call
them again until your message has been completely dealt with.
When a \code{recv} returns 0 bytes, it means the other side has
closed (or is in the process of closing) the connection. You will not
receive any more data on this connection. Ever. You may be able to
send data successfully; I'll talk about that some on the next page.
A protocol like HTTP uses a socket for only one transfer. The client
sends a request, then reads a reply. That's it. The socket is
discarded. This means that a client can detect the end of the reply by
receiving 0 bytes.
But if you plan to reuse your socket for further transfers, you need
to realize that \emph{there is no "EOT" (End of Transfer) on a
socket.} I repeat: if a socket \code{send} or
\code{recv} returns after handling 0 bytes, the connection has
been broken. If the connection has \emph{not} been broken, you may
wait on a \code{recv} forever, because the socket will
\emph{not} tell you that there's nothing more to read (for now). Now
if you think about that a bit, you'll come to realize a fundamental
truth of sockets: \emph{messages must either be fixed length} (yuck),
\emph{or be delimited} (shrug), \emph{or indicate how long they are}
(much better), \emph{or end by shutting down the connection}. The
choice is entirely yours, (but some ways are righter than others).
Assuming you don't want to end the connection, the simplest solution
is a fixed length message:
\begin{verbatim}
import socket

# MSGLEN is the fixed message length agreed on by both ends

class mysocket:
    '''demonstration class only
      - coded for clarity, not efficiency
    '''

    def __init__(self, sock=None):
        if sock is None:
            self.sock = socket.socket(
                socket.AF_INET, socket.SOCK_STREAM)
        else:
            self.sock = sock

    def connect(self, host, port):
        self.sock.connect((host, port))

    def mysend(self, msg):
        totalsent = 0
        while totalsent < MSGLEN:
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent = totalsent + sent

    def myreceive(self):
        msg = ''
        while len(msg) < MSGLEN:
            chunk = self.sock.recv(MSGLEN-len(msg))
            if chunk == '':
                raise RuntimeError("socket connection broken")
            msg = msg + chunk
        return msg
\end{verbatim}
The sending code here is usable for almost any messaging scheme - in
Python you send strings, and you can use \code{len()} to
determine its length (even if it has embedded \code{\e 0}
characters). It's mostly the receiving code that gets more
complex. (And in C, it's not much worse, except you can't use
\code{strlen} if the message has embedded \code{\e 0}s.)
The easiest enhancement is to make the first character of the message
an indicator of message type, and have the type determine the
length. Now you have two \code{recv}s - the first to get (at
least) that first character so you can look up the length, and the
second in a loop to get the rest. If you decide to go the delimited
route, you'll be receiving in some arbitrary chunk size, (4096 or 8192
is frequently a good match for network buffer sizes), and scanning
what you've received for a delimiter.
One complication to be aware of: if your conversational protocol
allows multiple messages to be sent back to back (without some kind of
reply), and you pass \code{recv} an arbitrary chunk size, you
may end up reading the start of a following message. You'll need to
put that aside and hold onto it, until it's needed.
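The "put that aside" bookkeeping for delimited messages might be
sketched like this, for a current Python 3 interpreter (where sockets
carry \code{bytes}). The class name \class{LineReader}, the newline
delimiter and the 4096 chunk size are assumptions chosen for the
example.

```python
import socket

class LineReader:
    """Read delimiter-terminated messages from a socket (demonstration only)."""

    def __init__(self, sock, delimiter=b"\n"):
        self.sock = sock
        self.delimiter = delimiter
        self.pending = b""          # bytes received but not yet returned

    def read_message(self):
        # Keep receiving until the buffer holds at least one delimiter.
        while self.delimiter not in self.pending:
            chunk = self.sock.recv(4096)
            if chunk == b"":
                raise RuntimeError("socket connection broken")
            self.pending += chunk
        # Return one message; anything after the delimiter is the start of
        # a following message and stays in self.pending until it is needed.
        message, _, self.pending = self.pending.partition(self.delimiter)
        return message
```

Each call returns exactly one message, no matter how the bytes were
split or merged on the way.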
Prefixing the message with its length (say, as 5 numeric characters)
gets more complex, because (believe it or not), you may not get all 5
characters in one \code{recv}. In playing around, you'll get
away with it; but in high network loads, your code will very quickly
break unless you use two \code{recv} loops - the first to
determine the length, the second to get the data part of the
message. Nasty. This is also when you'll discover that
\code{send} does not always manage to get rid of everything in
one pass. And despite having read this, you will eventually get bit by
it!
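One possible shape for the length-prefixed scheme is sketched here for
a current Python 3 interpreter. It uses the 5 numeric characters
mentioned above; the function names are invented, and
\method{sendall()} is used as the library's own "call \code{send}
until done" loop. Timeouts and other error handling are left out.

```python
import socket

HEADER_LEN = 5   # five ASCII digits, so a message is at most 99999 bytes

def recv_exact(sock, count):
    """Loop on recv() until exactly count bytes have arrived."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        data += chunk
    return data

def send_message(sock, payload):
    if len(payload) >= 10 ** HEADER_LEN:
        raise ValueError("payload too long for a 5-digit length prefix")
    header = ("%05d" % len(payload)).encode("ascii")
    sock.sendall(header + payload)     # sendall() repeats send() until done

def recv_message(sock):
    # First loop: the length. Second loop: the data part of the message.
    length = int(recv_exact(sock, HEADER_LEN).decode("ascii"))
    return recv_exact(sock, length)
```

Because the receiver never asks for more than the announced length,
back-to-back messages do not bleed into each other.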
In the interests of space, building your character, (and preserving my
competitive position), these enhancements are left as an exercise for
the reader. Let's move on to cleaning up.
\subsection{Binary Data}
It is perfectly possible to send binary data over a socket. The major
problem is that not all machines use the same formats for binary
data. For example, a Motorola chip will represent a 16 bit integer
with the value 1 as the two hex bytes 00 01. Intel and DEC, however,
are byte-reversed - that same 1 is 01 00. Socket libraries have calls
for converting 16 and 32 bit integers - \code{ntohl, htonl, ntohs,
htons} where "n" means \emph{network} and "h" means \emph{host},
"s" means \emph{short} and "l" means \emph{long}. Where network order
is host order, these do nothing, but where the machine is
byte-reversed, these swap the bytes around appropriately.
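The two byte layouts can be inspected from Python. The sketch below,
for a current Python 3 interpreter, leans on the standard
\module{struct} module, which this HOWTO does not otherwise cover;
the variable names are only labels.

```python
import socket
import struct

# "!" asks struct for network (big-endian) byte order, "<" for little-endian.
network_bytes = struct.pack("!H", 1)   # 16-bit value 1 as the bytes 00 01
little_bytes = struct.pack("<H", 1)    # the byte-reversed layout, 01 00

# htons and ntohs are inverses, whatever the host's byte order is.
round_trip = socket.ntohs(socket.htons(1))

# Unpacking with "!" recovers the integer on any host.
(value,) = struct.unpack("!H", network_bytes)
```

Packing with an explicit byte order gives the same bytes on every
machine, which is what a wire format needs.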
In these days of 32 bit machines, the ASCII representation of binary
data is frequently smaller than the binary representation. That's
because a surprising amount of the time, all those longs have the
value 0, or maybe 1. The string "0" would be two bytes, while binary
is four. Of course, this doesn't fit well with fixed-length
messages. Decisions, decisions.
\section{Disconnecting}
Strictly speaking, you're supposed to use \code{shutdown} on a
socket before you \code{close} it. The \code{shutdown} is
an advisory to the socket at the other end. Depending on the argument
you pass it, it can mean "I'm not going to send anymore, but I'll
still listen", or "I'm not listening, good riddance!". Most socket
libraries, however, are so used to programmers neglecting to use this
piece of etiquette that normally a \code{close} is the same as
\code{shutdown(); close()}. So in most situations, an explicit
\code{shutdown} is not needed.
One way to use \code{shutdown} effectively is in an HTTP-like
exchange. The client sends a request and then does a
\code{shutdown(1)}. This tells the server "This client is done
sending, but can still receive." The server can detect "EOF" by a
receive of 0 bytes. It can assume it has the complete request. The
server sends a reply. If the \code{send} completes successfully
then, indeed, the client was still receiving.
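That exchange can be acted out with a connected pair of sockets. The
sketch is for a current Python 3 interpreter and uses
\function{socket.socketpair()} as a stand-in for a real TCP
connection; the helper name \function{read_until_eof()} and the
message contents are invented.

```python
import socket

def read_until_eof(sock):
    """Collect data until recv() returns 0 bytes (the peer finished sending)."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if chunk == b"":
            break
        chunks.append(chunk)
    return b"".join(chunks)

client, server = socket.socketpair()   # a connected pair, standing in for TCP

client.sendall(b"GET status")
client.shutdown(socket.SHUT_WR)        # same as shutdown(1): done sending

request = read_until_eof(server)       # server sees EOF: the request is whole
server.sendall(b"reply to " + request)
server.close()

reply = read_until_eof(client)         # the client can still receive
client.close()
```

The half-close is what lets the server find the end of the request
without any length prefix or delimiter.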
Python takes the automatic shutdown a step further, and says that when a socket is garbage collected, it will automatically do a \code{close} if it's needed. But relying on this is a very bad habit. If your socket just disappears without doing a \code{close}, the socket at the other end may hang indefinitely, thinking you're just being slow. \emph{Please} \code{close} your sockets when you're done.
\subsection{When Sockets Die}
Probably the worst thing about using blocking sockets is what happens
when the other side comes down hard (without doing a
\code{close}). Your socket is likely to hang. \code{SOCK_STREAM} is a
reliable protocol, and it will wait a long, long time before giving up
on a connection. If you're using threads, the entire thread is
essentially dead. There's not much you can do about it. As long as you
aren't doing something dumb, like holding a lock while doing a
blocking read, the thread isn't really consuming much in the way of
resources. Do \emph{not} try to kill the thread - part of the reason
that threads are more efficient than processes is that they avoid the
overhead associated with the automatic recycling of resources. In
other words, if you do manage to kill the thread, your whole process
is likely to be screwed up.
\section{Non-blocking Sockets}
If you've understood the preceding, you already know most of what you
need to know about the mechanics of using sockets. You'll still use
the same calls, in much the same ways. It's just that, if you do it
right, your app will be almost inside-out.
In Python, you use \code{socket.setblocking(0)} to make a socket
non-blocking. In C, it's more complex, (for one thing, you'll need to
choose between the POSIX flavor \code{O_NONBLOCK} and the almost
indistinguishable BSD flavor \code{O_NDELAY}, which is
completely different from \code{TCP_NODELAY}), but it's the
exact same idea. You do this after creating the socket, but before
using it. (Actually, if you're nuts, you can switch back and forth.)
The major mechanical difference is that \code{send},
\code{recv}, \code{connect} and \code{accept} can
return without having done anything. You have (of course) a number of
choices. You can check return code and error codes and generally drive
yourself crazy. If you don't believe me, try it sometime. Your app
will grow large, buggy and suck CPU. So let's skip the brain-dead
solutions and do it right.
Use \code{select}.
In C, coding \code{select} is fairly complex. In Python, it's a
piece of cake, but it's close enough to the C version that if you
understand \code{select} in Python, you'll have little trouble
with it in C.
\begin{verbatim}
ready_to_read, ready_to_write, in_error = \
               select.select(
                  potential_readers,
                  potential_writers,
                  potential_errs,
                  timeout)
\end{verbatim}
You pass \code{select} three lists: the first contains all
sockets that you might want to try reading; the second all the sockets
you might want to try writing to, and the last (normally left empty)
those that you want to check for errors. You should note that a
socket can go into more than one list. The \code{select} call is
blocking, but you can give it a timeout. This is generally a sensible
thing to do - give it a nice long timeout (say a minute) unless you
have good reason to do otherwise.
In return, you will get three lists. They have the sockets that are
actually readable, writable and in error. Each of these lists is a
subset (possibly empty) of the corresponding list you passed in. A
socket that you put in more than one input list can also show up in
more than one output list.
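Here is a small sketch of the call and its three return lists, assuming Python 3. The connected pair from \code{socket.socketpair()} and the one-second timeout are choices made for the example, not part of the recipe above.

```python
import select
import socket

# Two sockets already connected to each other, so no server is needed.
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

left.sendall(b"ping")

# Ask whether `right` can be read and `left` can be written,
# waiting at most one second for an answer.
readable, writable, in_error = select.select([right], [left], [], 1.0)

# Only call recv on a socket that select reported as readable.
data = right.recv(4096) if right in readable else b""

left.close()
right.close()
```

Each returned list holds only sockets taken from the matching input list, so membership tests like \code{right in readable} are all you need.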
If a socket is in the output readable list, you can be
as-close-to-certain-as-we-ever-get-in-this-business that a
\code{recv} on that socket will return \emph{something}. Same
idea for the writable list. You'll be able to send
\emph{something}. Maybe not all you want to, but \emph{something} is
better than nothing. (Actually, any reasonably healthy socket will
return as writable - it just means outbound network buffer space is
available.)
If you have a ``server'' socket, put it in the \code{potential_readers}
list. If it comes out in the readable list, your \code{accept}
will (almost certainly) work. If you have created a new socket to
\code{connect} to someone else, put it in the \code{potential_writers}
list. If it shows up in the writable list, you have a decent chance
that it has connected.
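The server half of this can be sketched as follows, assuming Python 3 and a loopback connection. Binding to port 0 (so the operating system picks a free port) and the five-second timeout are assumptions made to keep the example self-contained.

```python
import select
import socket

# A listening ("server") socket on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
server.listen(1)
server.setblocking(False)

# A client that connects to whatever address the server ended up with.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())

# The server socket shows up as readable once a connection is waiting.
readable, _, _ = select.select([server], [], [], 5.0)

accepted = False
if server in readable:
    conn, addr = server.accept()
    accepted = True
    conn.close()

client.close()
server.close()
```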
One very nasty problem with \code{select}: if somewhere in those
input lists of sockets is one which has died a nasty death, the
\code{select} will fail. You then need to loop through every
single damn socket in all those lists and do a
\code{select([sock],[],[],0)} until you find the bad one. That
timeout of 0 means it won't take long, but it's ugly.
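That hunt for the bad socket might look like the sketch below, assuming Python 3. The helper name \code{find_bad_sockets} is made up for this example, and a locally closed socket stands in for one that has ``died''; a real failure may surface as a different error.

```python
import select
import socket

def find_bad_sockets(socks):
    """Return the sockets that select() refuses to accept."""
    bad = []
    for sock in socks:
        try:
            select.select([sock], [], [], 0)  # timeout 0: poll, never wait
        except (OSError, ValueError):
            bad.append(sock)
    return bad

good, dead = socket.socketpair()
dead.close()  # a closed socket no longer has a valid file descriptor
bad = find_bad_sockets([good, dead])
good.close()
```

Once you know which sockets are bad, drop them from your lists and call \code{select} again with the rest.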
Actually, \code{select} can be handy even with blocking sockets.
It's one way of determining whether you will block - the socket
returns as readable when there's something in the buffers. However,
this still doesn't help with the problem of determining whether the
other end is done, or just busy with something else.
\textbf{Portability alert}: On Unix, \code{select} works both with
the sockets and files. Don't try this on Windows. On Windows,
\code{select} works with sockets only. Also note that in C, many
of the more advanced socket options are done differently on
Windows. In fact, on Windows I usually use threads (which work very,
very well) with my sockets. Face it, if you want any kind of
performance, your code will look very different on Windows than on
Unix. (I haven't the foggiest how you do this stuff on a Mac.)
\subsection{Performance}
There's no question that the fastest sockets code uses non-blocking
sockets and \code{select} to multiplex them. You can put together something
that will saturate a LAN connection without putting any strain on the
CPU. The trouble is that an app written this way can't do much of
anything else - it needs to be ready to shuffle bytes around at all
times.
Assuming that your app is actually supposed to do something more than
that, threading is the optimal solution (and using non-blocking
sockets will be faster than using blocking sockets). Unfortunately,
threading support in Unixes varies both in API and quality. So the
normal Unix solution is to fork a subprocess to deal with each
connection. The overhead for this is significant (and don't do this on
Windows - the overhead of process creation is enormous there). It also
means that unless each subprocess is completely independent, you'll
need to use another form of IPC, say a pipe, or shared memory and
semaphores, to communicate between the parent and child processes.
Finally, remember that even though blocking sockets are somewhat
slower than non-blocking, in many cases they are the ``right''
solution. After all, if your app is driven by the data it receives
over a socket, there's not much sense in complicating the logic just
so your app can wait on \code{select} instead of
\code{recv}.
\end{document}

@ -1,766 +0,0 @@
Unicode HOWTO
================
**Version 1.02**
This HOWTO discusses Python's support for Unicode, and explains various
problems that people commonly encounter when trying to work with Unicode.
Introduction to Unicode
------------------------------
History of Character Codes
''''''''''''''''''''''''''''''
In 1968, the American Standard Code for Information Interchange,
better known by its acronym ASCII, was standardized. ASCII defined
numeric codes for various characters, with the numeric values running from 0 to
127. For example, the lowercase letter 'a' is assigned 97 as its code
value.
ASCII was an American-developed standard, so it only defined
unaccented characters. There was an 'e', but no 'é' or 'Í'. This
meant that languages which required accented characters couldn't be
faithfully represented in ASCII. (Actually the missing accents matter
for English, too, which contains words such as 'naïve' and 'café', and some
publications have house styles which require spellings such as
'coöperate'.)
For a while people just wrote programs that didn't display accents. I
remember looking at Apple ][ BASIC programs, published in French-language
publications in the mid-1980s, that had lines like these::
    PRINT "FICHIER EST COMPLETE."
    PRINT "CARACTERE NON ACCEPTE."
Those messages should contain accents, and they just look wrong to
someone who can read French.
In the 1980s, almost all personal computers were 8-bit, meaning that
bytes could hold values ranging from 0 to 255. ASCII codes only went
up to 127, so some machines assigned values between 128 and 255 to
accented characters. Different machines had different codes, however,
which led to problems exchanging files. Eventually various commonly
used sets of values for the 128-255 range emerged. Some were true
standards, defined by the International Organization for Standardization, and
some were **de facto** conventions that were invented by one company
or another and managed to catch on.
255 characters aren't very many. For example, you can't fit
both the accented characters used in Western Europe and the Cyrillic
alphabet used for Russian into the 128-255 range because there are more than
128 such characters.
You could write files using different codes (all your Russian
files in a coding system called KOI8, all your French files in
a different coding system called Latin1), but what if you wanted
to write a French document that quotes some Russian text? In the
1980s people began to want to solve this problem, and the Unicode
standardization effort began.
Unicode started out using 16-bit characters instead of 8-bit characters. 16
bits means you have 2^16 = 65,536 distinct values available, making it
possible to represent many different characters from many different
alphabets; an initial goal was to have Unicode contain the alphabets for
every single human language. It turns out that even 16 bits isn't enough to
meet that goal, and the modern Unicode specification uses a wider range of
codes, 0-1,114,111 (0x10ffff in base-16).
There's a related ISO standard, ISO 10646. Unicode and ISO 10646 were
originally separate efforts, but the specifications were merged with
the 1.1 revision of Unicode.
(This discussion of Unicode's history is highly simplified. I don't
think the average Python programmer needs to worry about the
historical details; consult the Unicode consortium site listed in the
References for more information.)
Definitions
''''''''''''''''''''''''
A **character** is the smallest possible component of a text. 'A',
'B', 'C', etc., are all different characters. So are 'È' and
'Í'. Characters are abstractions, and vary depending on the
language or context you're talking about. For example, the symbol for
ohms (Ω) is usually drawn much like the capital letter
omega (Ω) in the Greek alphabet (they may even be the same in
some fonts), but these are two different characters that have
different meanings.
The Unicode standard describes how characters are represented by
**code points**. A code point is an integer value, usually denoted in
base 16. In the standard, a code point is written using the notation
U+12ca to mean the character with value 0x12ca (4810 decimal). The
Unicode standard contains a lot of tables listing characters and their
corresponding code points::
0061 'a'; LATIN SMALL LETTER A
0062 'b'; LATIN SMALL LETTER B
0063 'c'; LATIN SMALL LETTER C
...
007B '{'; LEFT CURLY BRACKET
Strictly, these definitions imply that it's meaningless to say 'this is
character U+12ca'. U+12ca is a code point, which represents some particular
character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'.
In informal contexts, this distinction between code points and characters will
sometimes be forgotten.
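The relationship between a code point, its ``U+`` notation and its character name can be checked interactively. This is a small sketch assuming Python 3, where the built-in ``chr()`` and ``str`` play the roles that ``unichr()`` and ``unicode`` have in the Python version described in this HOWTO.

```python
import unicodedata

ch = chr(0x12ca)                  # the character represented by code point U+12ca
code_point = ord(ch)              # back to the integer value
label = "U+%04x" % code_point     # the notation used in the standard
name = unicodedata.name(ch)       # the character's name from the Unicode database
```

Here ``code_point`` is the decimal value 4810 quoted above, and ``name`` is the character name from the standard's tables.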
A character is represented on a screen or on paper by a set of graphical
elements that's called a **glyph**. The glyph for an uppercase A, for
example, is two diagonal strokes and a horizontal stroke, though the exact
details will depend on the font being used. Most Python code doesn't need
to worry about glyphs; figuring out the correct glyph to display is
generally the job of a GUI toolkit or a terminal's font renderer.
Encodings
'''''''''
To summarize the previous section:
a Unicode string is a sequence of code points, which are
numbers from 0 to 0x10ffff. This sequence needs to be represented as
a sequence of bytes (meaning, values from 0-255) in memory. The rules for
translating a Unicode string into a sequence of bytes are called an
**encoding**.
The first encoding you might think of is an array of 32-bit integers.
In this representation, the string "Python" would look like this::
       P           y           t           h           o           n
    0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
This representation is straightforward but using
it presents a number of problems.
1. It's not portable; different processors order the bytes
differently.
2. It's very wasteful of space. In most texts, the majority of the code
points are less than 127, or less than 255, so a lot of space is occupied
by zero bytes. The above string takes 24 bytes compared to the 6
bytes needed for an ASCII representation. Increased RAM usage doesn't
matter too much (desktop computers have megabytes of RAM, and strings
aren't usually that large), but expanding our usage of disk and
network bandwidth by a factor of 4 is intolerable.
3. It's not compatible with existing C functions such as ``strlen()``,
so a new family of wide string functions would need to be used.
4. Many Internet standards are defined in terms of textual data, and
can't handle content with embedded zero bytes.
Generally people don't use this encoding, instead choosing other encodings
that are more efficient and convenient.
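The space cost is easy to measure. The sketch below assumes Python 3 and uses the ``'utf-32-le'`` codec as a stand-in for the array of 32-bit integers shown above (little-endian, no byte-order mark).

```python
text = "Python"

wide = text.encode("utf-32-le")    # four bytes per code point
narrow = text.encode("ascii")      # one byte per code point

zero_bytes = wide.count(b"\x00")   # how much of the wide form is padding
```

For this string the wide form is four times the size of the ASCII form, and three quarters of it is zero bytes.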
Encodings don't have to handle every possible Unicode character, and
most encodings don't. For example, Python's default encoding is the
'ascii' encoding. The rules for converting a Unicode string into the
ASCII encoding are simple; for each code point:
1. If the code point is <128, each byte is the same as the value of the
code point.
2. If the code point is 128 or greater, the Unicode string can't
be represented in this encoding. (Python raises a
``UnicodeEncodeError`` exception in this case.)
Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode
code points 0-255 are identical to the Latin-1 values, so converting
to this encoding simply requires converting code points to byte
values; if a code point larger than 255 is encountered, the string
can't be encoded into Latin-1.
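Both sets of rules can be tried out directly. This is a sketch assuming Python 3, where ``str.encode()`` returns a ``bytes`` object; the flag variables are only there to record what happened.

```python
e_acute = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, code point 233
euro = "\u20ac"      # EURO SIGN, code point 8364

# Code point 233 fits in Latin-1 and maps to the byte with the same value.
latin1_bytes = e_acute.encode("latin-1")

# It is 128 or greater, so the ASCII encoding rejects it.
try:
    e_acute.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Code point 8364 is larger than 255, so Latin-1 rejects it too.
try:
    euro.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```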
Encodings don't have to be simple one-to-one mappings like Latin-1.
Consider IBM's EBCDIC, which was used on IBM mainframes. Letter
values weren't in one block: 'a' through 'i' had values from 129 to
137, but 'j' through 'r' were 145 through 153. If you wanted to use
EBCDIC as an encoding, you'd probably use some sort of lookup table to
perform the conversion, but this is largely an internal detail.
UTF-8 is one of the most commonly used encodings. UTF stands for
"Unicode Transformation Format", and the '8' means that 8-bit numbers
are used in the encoding. (There's also a UTF-16 encoding, but it's
less frequently used than UTF-8.) UTF-8 uses the following rules:
1. If the code point is <128, it's represented by the corresponding byte value.
2. If the code point is between 128 and 0x7ff, it's turned into two byte values
between 128 and 255.
3. Code points >0x7ff are turned into three- or four-byte sequences, where
each byte of the sequence is between 128 and 255.
UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
2. A Unicode string is turned into a string of bytes containing no embedded zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can be processed by C functions such as ``strcpy()`` and sent through protocols that can't handle zero bytes.
3. A string of ASCII text is also valid UTF-8 text.
4. UTF-8 is fairly compact; the majority of commonly used characters are turned into one or two bytes, and values less than 128 occupy only a single byte.
5. If bytes are corrupted or lost, it's possible to determine the start of the next UTF-8-encoded code point and resynchronize. It's also unlikely that random 8-bit data will look like valid UTF-8.
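The length rules and a couple of the properties above can be confirmed with a few lines of code. This sketch assumes Python 3; the four sample code points are arbitrary picks, one from each length class.

```python
# One code point from each range: <128, 128-0x7ff, 0x800-0xffff, >0xffff.
samples = [0x41, 0xe9, 0x20ac, 0x10348]

# Number of bytes each code point occupies in UTF-8.
lengths = [len(chr(cp).encode("utf-8")) for cp in samples]

# No zero bytes appear unless the text itself contains U+0000.
encoded = "".join(chr(cp) for cp in samples).encode("utf-8")
has_zero = b"\x00" in encoded

# Pure ASCII text encodes to exactly the same bytes in UTF-8.
ascii_same = "Python".encode("utf-8") == "Python".encode("ascii")
```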
References
''''''''''''''
The Unicode Consortium site at <http://www.unicode.org> has character
charts, a glossary, and PDF versions of the Unicode specification. Be
prepared for some difficult reading.
<http://www.unicode.org/history/> is a chronology of the origin and
development of Unicode.
To help understand the standard, Jukka Korpela has written an
introductory guide to reading the Unicode character tables,
available at <http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
Roman Czyborra wrote another explanation of Unicode's basic principles;
it's at <http://czyborra.com/unicode/characters.html>.
Czyborra has written a number of other Unicode-related documents,
available from <http://www.czyborra.com>.
Two other good introductory articles were written by Joel Spolsky
<http://www.joelonsoftware.com/articles/Unicode.html> and Jason
Orendorff <http://www.jorendorff.com/articles/unicode/>. If this
introduction didn't make things clear to you, you should try reading
one of these alternate articles before continuing.
Wikipedia entries are often helpful; see the entries for "character
encoding" <http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
<http://en.wikipedia.org/wiki/UTF-8>, for example.
Python's Unicode Support
------------------------
Now that you've learned the rudiments of Unicode, we can look at
Python's Unicode features.
The Unicode Type
'''''''''''''''''''
Unicode strings are expressed as instances of the ``unicode`` type,
one of Python's repertoire of built-in types. It derives from an
abstract type called ``basestring``, which is also an ancestor of the
``str`` type; you can therefore check if a value is a string type with
``isinstance(value, basestring)``. Under the hood, Python represents
Unicode strings as either 16- or 32-bit integers, depending on how the
Python interpreter was compiled.
The ``unicode()`` constructor has the signature ``unicode(string[, encoding, errors])``.
All of its arguments should be 8-bit strings. The first argument is converted
to Unicode using the specified encoding; if you leave off the ``encoding`` argument,
the ASCII encoding is used for the conversion, so characters greater than 127 will
be treated as errors::
>>> unicode('abcdef')
u'abcdef'
>>> s = unicode('abcdef')
>>> type(s)
<type 'unicode'>
>>> unicode('abcdef' + chr(255))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
ordinal not in range(128)
The ``errors`` argument specifies the response when the input string can't be converted according to the encoding's rules. Legal values for this argument
are 'strict' (raise a ``UnicodeDecodeError`` exception),
'replace' (add U+FFFD, 'REPLACEMENT CHARACTER'),
or 'ignore' (just leave the character out of the Unicode result).
The following examples show the differences::
>>> unicode('\x80abc', errors='strict')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
ordinal not in range(128)
>>> unicode('\x80abc', errors='replace')
u'\ufffdabc'
>>> unicode('\x80abc', errors='ignore')
u'abc'
Encodings are specified as strings containing the encoding's name.
Python 2.4 comes with roughly 100 different encodings; see the Python
Library Reference at
<http://docs.python.org/lib/standard-encodings.html> for a list. Some
encodings have multiple names; for example, 'latin-1', 'iso_8859_1'
and '8859' are all synonyms for the same encoding.
One-character Unicode strings can also be created with the
``unichr()`` built-in function, which takes integers and returns a
Unicode string of length 1 that contains the corresponding code point.
The reverse operation is the built-in ``ord()`` function that takes a
one-character Unicode string and returns the code point value::
>>> unichr(40960)
u'\ua000'
>>> ord(u'\ua000')
40960
Instances of the ``unicode`` type have many of the same methods as
the 8-bit string type for operations such as searching and formatting::
>>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
>>> s.count('e')
5
>>> s.find('feather')
9
>>> s.find('bird')
-1
>>> s.replace('feather', 'sand')
u'Was ever sand so lightly blown to and fro as this multitude?'
>>> s.upper()
u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
Note that the arguments to these methods can be Unicode strings or 8-bit strings.
8-bit strings will be converted to Unicode before carrying out the operation;
Python's default ASCII encoding will be used, so characters greater than 127 will cause an exception::
>>> s.find('Was\x9f')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
>>> s.find(u'Was\x9f')
-1
Much Python code that operates on strings will therefore work with
Unicode strings without requiring any changes to the code. (Input and
output code needs more updating for Unicode; more on this later.)
Another important method is ``.encode([encoding], [errors='strict'])``,
which returns an 8-bit string version of the
Unicode string, encoded in the requested encoding. The ``errors``
parameter is the same as the parameter of the ``unicode()``
constructor, with one additional possibility; as well as 'strict',
'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which
uses XML's character references. The following example shows the
different results::
>>> u = unichr(40960) + u'abcd' + unichr(1972)
>>> u.encode('utf-8')
'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
'abcd'
>>> u.encode('ascii', 'replace')
'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
'&#40960;abcd&#1972;'
Python's 8-bit strings have a ``.decode([encoding], [errors])`` method
that interprets the string using the given encoding::
>>> u = unichr(40960) + u'abcd' + unichr(1972) # Assemble a string
>>> utf8_version = u.encode('utf-8') # Encode as UTF-8
>>> type(utf8_version), utf8_version
(<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
>>> u2 = utf8_version.decode('utf-8') # Decode using UTF-8
>>> u == u2 # The two strings match
True
The low-level routines for registering and accessing the available
encodings are found in the ``codecs`` module. However, the encoding
and decoding functions returned by this module are usually more
low-level than is comfortable, so I'm not going to describe the
``codecs`` module here. If you need to implement a completely new
encoding, you'll need to learn about the ``codecs`` module interfaces,
but implementing encodings is a specialized task that also won't be
covered here. Consult the Python documentation to learn more about
this module.
The most commonly used part of the ``codecs`` module is the
``codecs.open()`` function which will be discussed in the section
on input and output.
Unicode Literals in Python Source Code
''''''''''''''''''''''''''''''''''''''''''
In Python source code, Unicode literals are written as strings
prefixed with the 'u' or 'U' character: ``u'abcdefghijk'``. Specific
code points can be written using the ``\u`` escape sequence, which is
followed by four hex digits giving the code point. The ``\U`` escape
sequence is similar, but expects 8 hex digits, not 4.
Unicode literals can also use the same escape sequences as 8-bit
strings, including ``\x``, but ``\x`` only takes two hex digits so it
can't express an arbitrary code point. Octal escapes can go up to
U+01ff, which is octal 777.
::
    >>> s = u"a\xac\u1234\u20ac\U00008000"
               ^^^^ two-digit hex escape
                   ^^^^^^ four-digit Unicode escape
                               ^^^^^^^^^^ eight-digit Unicode escape
    >>> for c in s:  print ord(c),
    ...
    97 172 4660 8364 32768
Using escape sequences for code points greater than 127 is fine in
small doses, but becomes an annoyance if you're using many accented
characters, as you would in a program with messages in French or some
other accent-using language. You can also assemble strings using the
``unichr()`` built-in function, but this is even more tedious.
Ideally, you'd want to be able to write literals in your language's
natural encoding. You could then edit Python source code with your
favorite editor which would display the accented characters naturally,
and have the right characters used at runtime.
Python supports writing Unicode literals in any encoding, but you have
to declare the encoding being used. This is done by including a
special comment as either the first or second line of the source
file::
#!/usr/bin/env python
# -*- coding: latin-1 -*-
u = u'abcdé'
print ord(u[-1])
The syntax is inspired by Emacs's notation for specifying variables local to a file.
Emacs supports many different variables, but Python only supports 'coding'.
The ``-*-`` symbols indicate that the comment is special; within them,
you must supply the name ``coding`` and the name of your chosen encoding,
separated by ``':'``.
If you don't include such a comment, the default encoding used will be
ASCII. Versions of Python before 2.4 were Euro-centric and assumed
Latin-1 as a default encoding for string literals; in Python 2.4,
characters greater than 127 still work but result in a warning. For
example, the following program has no encoding declaration::
#!/usr/bin/env python
u = u'abcdé'
print ord(u[-1])
When you run it with Python 2.4, it will output the following warning::
amk:~$ python p263.py
sys:1: DeprecationWarning: Non-ASCII character '\xe9'
in file p263.py on line 2, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
Unicode Properties
'''''''''''''''''''
The Unicode specification includes a database of information about
code points. For each code point that's defined, the information
includes the character's name, its category, the numeric value if
applicable (Unicode has characters representing the Roman numerals and
fractions such as one-third and four-fifths). There are also
properties related to the code point's use in bidirectional text and
other display-related properties.
The following program displays some information about several
characters, and prints the numeric value of one particular character::
    import unicodedata
    u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
    for i, c in enumerate(u):
        print i, '%04x' % ord(c), unicodedata.category(c),
        print unicodedata.name(c)
    # Get numeric value of second character
    print unicodedata.numeric(u[1])
When run, this prints::
0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
1 0bf2 No TAMIL NUMBER ONE THOUSAND
2 0f84 Mn TIBETAN MARK HALANTA
3 1770 Lo TAGBANWA LETTER SA
4 33af So SQUARE RAD OVER S SQUARED
1000.0
The category codes are abbreviations describing the nature of the
character. These are grouped into categories such as "Letter",
"Number", "Punctuation", or "Symbol", which in turn are broken up into
subcategories. To take the codes from the above output, ``'Ll'``
means 'Letter, lowercase', ``'No'`` means "Number, other", ``'Mn'`` is
"Mark, nonspacing", and ``'So'`` is "Symbol, other". See
<http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values>
for a list of category codes.
References
''''''''''''''
The Unicode and 8-bit string types are described in the Python library
reference at <http://docs.python.org/lib/typesseq.html>.
The documentation for the ``unicodedata`` module is at
<http://docs.python.org/lib/module-unicodedata.html>.
The documentation for the ``codecs`` module is at
<http://docs.python.org/lib/module-codecs.html>.
Marc-André Lemburg gave a presentation at EuroPython 2002
titled "Python and Unicode". A PDF version of his slides
is available at <http://www.egenix.com/files/python/Unicode-EPC2002-Talk.pdf>,
and is an excellent overview of the design of Python's Unicode features.
Reading and Writing Unicode Data
----------------------------------------
Once you've written some code that works with Unicode data, the next
problem is input/output. How do you get Unicode strings into your
program, and how do you convert Unicode into a form suitable for
storage or transmission?
It's possible that you may not need to do anything depending on your
input sources and output destinations; you should check whether the
libraries used in your application support Unicode natively. XML
parsers often return Unicode data, for example. Many relational
databases also support Unicode-valued columns and can return Unicode
values from an SQL query.
Unicode data is usually converted to a particular encoding before it
gets written to disk or sent over a socket. It's possible to do all
the work yourself: open a file, read an 8-bit string from it, and
convert the string with ``unicode(str, encoding)``. However, the
manual approach is not recommended.
One problem is the multi-byte nature of encodings; one Unicode
character can be represented by several bytes. If you want to read
the file in arbitrary-sized chunks (say, 1K or 4K), you need to write
error-handling code to catch the case where only part of the bytes
encoding a single Unicode character are read at the end of a chunk.
One solution would be to read the entire file into memory and then
perform the decoding, but that prevents you from working with files
that are extremely large; if you need to read a 2 GB file, you need 2 GB
of RAM. (More, really, since for at least a moment you'd need to have
both the encoded string and its Unicode version in memory.)
The solution would be to use the low-level decoding interface to catch
the case of partial coding sequences. The work of implementing this
has already been done for you: the ``codecs`` module includes a
version of the ``open()`` function that returns a file-like object
that assumes the file's contents are in a specified encoding and
accepts Unicode parameters for methods such as ``.read()`` and
``.write()``.
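To see what that low-level interface has to cope with, here is a sketch of the partial-sequence problem. It assumes Python 3 and ``codecs.getincrementaldecoder()``, which is available in Pythons newer than the 2.4 release this HOWTO describes; the two-chunk split is contrived for the example.

```python
import codecs

encoded = "caf\u00e9".encode("utf-8")   # the final character takes two bytes

# Pretend the file was read in chunks and a chunk boundary fell
# in the middle of the two-byte sequence.
chunks = [encoded[:4], encoded[4:]]

# An incremental decoder remembers the incomplete bytes between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))   # flush; fails if bytes are left over

text = "".join(pieces)
```

Decoding each chunk on its own with ``.decode('utf-8')`` would raise an error on the first chunk; the wrapper returned by ``codecs.open()`` handles this bookkeeping for you.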
The function's parameters are
``open(filename, mode='rb', encoding=None, errors='strict', buffering=1)``. ``mode`` can be
``'r'``, ``'w'``, or ``'a'``, just like the corresponding parameter to the
regular built-in ``open()`` function; add a ``'+'`` to
update the file. ``buffering`` is similarly
parallel to the standard function's parameter.
``encoding`` is a string giving
the encoding to use; if it's left as ``None``, a regular Python file
object that accepts 8-bit strings is returned. Otherwise, a wrapper
object is returned, and data written to or read from the wrapper
object will be converted as needed. ``errors`` specifies the action
for encoding errors and can be one of the usual values of 'strict',
'ignore', and 'replace'.
Reading Unicode from a file is therefore simple::
    import codecs
    f = codecs.open('unicode.rst', encoding='utf-8')
    for line in f:
        print repr(line)
It's also possible to open files in update mode,
allowing both reading and writing::
f = codecs.open('test', encoding='utf-8', mode='w+')
f.write(u'\u4500 blah blah blah\n')
f.seek(0)
print repr(f.readline()[:1])
f.close()
Unicode character U+FEFF is used as a byte-order mark (BOM),
and is often written as the first character of a file in order
to assist with autodetection of the file's byte ordering.
Some encodings, such as UTF-16, expect a BOM to be present at
the start of a file; when such an encoding is used,
the BOM will be automatically written as the first character
and will be silently dropped when the file is read. There are
variants of these encodings, such as 'utf-16-le' and 'utf-16-be'
for little-endian and big-endian encodings, that specify
one particular byte ordering and don't
skip the BOM.
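The BOM behaviour can be observed on in-memory strings as well as files. This sketch assumes Python 3 and uses the ``BOM_UTF16_LE`` and ``BOM_UTF16_BE`` constants from the ``codecs`` module; which of the two the plain ``'utf-16'`` codec writes depends on the machine.

```python
import codecs

# The generic codec writes a BOM in the platform's native byte order...
with_bom = "a".encode("utf-16")
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# ...and silently drops it again when decoding.
dropped = with_bom.decode("utf-16")

# The fixed-order variant writes no BOM...
no_bom = "a".encode("utf-16-le")

# ...and treats a BOM in its input as an ordinary U+FEFF character.
kept = (codecs.BOM_UTF16_LE + no_bom).decode("utf-16-le")
```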
Unicode filenames
'''''''''''''''''''''''''
Most of the operating systems in common use today support filenames
that contain arbitrary Unicode characters. Usually this is
implemented by converting the Unicode string into some encoding that
varies depending on the system. For example, MacOS X uses UTF-8 while
Windows uses a configurable encoding; on Windows, Python uses the name
"mbcs" to refer to whatever the currently configured encoding is. On
Unix systems, there will only be a filesystem encoding if you've set
the ``LANG`` or ``LC_CTYPE`` environment variables; if you haven't,
the default encoding is ASCII.
The ``sys.getfilesystemencoding()`` function returns the encoding to
use on your current system, in case you want to do the encoding
manually, but there's not much reason to bother. When opening a file
for reading or writing, you can usually just provide the Unicode
string as the filename, and it will be automatically converted to the
right encoding for you::
filename = u'filename\u4500abc'
f = open(filename, 'w')
f.write('blah\n')
f.close()
Functions in the ``os`` module such as ``os.stat()`` will also accept
Unicode filenames.
``os.listdir()``, which returns filenames, raises an issue: should it
return the Unicode version of filenames, or should it return 8-bit
strings containing the encoded versions? ``os.listdir()`` will do
both, depending on whether you provided the directory path as an 8-bit
string or a Unicode string. If you pass a Unicode string as the path,
filenames will be decoded using the filesystem's encoding and a list
of Unicode strings will be returned, while passing an 8-bit path will
return the 8-bit versions of the filenames. For example, assuming the
default filesystem encoding is UTF-8, running the following program::
fn = u'filename\u4500abc'
f = open(fn, 'w')
f.close()
import os
print os.listdir('.')
print os.listdir(u'.')
will produce the following output::
amk:~$ python t.py
['.svn', 'filename\xe4\x94\x80abc', ...]
[u'.svn', u'filename\u4500abc', ...]
The first list contains UTF-8-encoded filenames, and the second list
contains the Unicode versions.
Tips for Writing Unicode-aware Programs
''''''''''''''''''''''''''''''''''''''''''''
This section provides some suggestions on writing software that
deals with Unicode.
The most important tip is:
Software should only work with Unicode strings internally,
converting to a particular encoding on output.
If you attempt to write processing functions that accept both
Unicode and 8-bit strings, you will find your program vulnerable to
bugs wherever you combine the two different kinds of strings. Python's
default encoding is ASCII, so whenever a byte with a value >127
is in the input data, you'll get a ``UnicodeDecodeError``
because that byte can't be handled by the ASCII encoding.
It's easy to miss such problems if you only test your software
with data that doesn't contain any
accents; everything will seem to work, but there's actually a bug in your
program waiting for the first user who attempts to use characters >127.
A second tip, therefore, is:
Include characters >127 and, even better, characters >255 in your
test data.
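To see why such test data matters, here is a small sketch of a check that passes for plain ASCII input and fails as soon as an accented or non-Latin character appears. The helper name ``decodes_as_ascii`` is made up for this illustration.

```python
def decodes_as_ascii(data):
    """Return True if the byte string can be decoded with the ASCII codec."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True

samples = [
    b'cafe',                       # plain ASCII: every byte is below 128
    u'caf\xe9'.encode('utf-8'),    # a character above 127
    u'\u4500'.encode('utf-8'),     # a character above 255
]
results = [decodes_as_ascii(s) for s in samples]
```

Only the first sample decodes cleanly, so a test suite built from ASCII-only data would never exercise the failing path.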
When using data coming from a web browser or some other untrusted source,
a common technique is to check for illegal characters in a string
before using the string in a generated command line or storing it in a
database. If you're doing this, be careful to check
the string once it's in the form that will be used or stored; it's
possible for encodings to be used to disguise characters. This is especially
true if the input data also specifies the encoding;
many encodings leave the commonly checked-for characters alone,
but Python includes some encodings such as ``'base64'``
that modify every single character.
For example, let's say you have a content management system that takes a
Unicode filename, and you want to disallow paths with a '/' character.
You might write this code::
    def read_file (filename, encoding):
        if '/' in filename:
            raise ValueError("'/' not allowed in filenames")
        unicode_name = filename.decode(encoding)
        f = open(unicode_name, 'r')
        # ... return contents of file ...
However, if an attacker could specify the ``'base64'`` encoding,
they could pass ``'L2V0Yy9wYXNzd2Q='``, which is the base-64
encoded form of the string ``'/etc/passwd'``, to read a
system file. The above code looks for ``'/'`` characters
in the encoded form and misses the dangerous character
in the resulting decoded form.
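One way to avoid the problem is to decode first and check afterwards. The sketch below is only an illustration of that ordering, not a complete defence: the helper name ``decode_and_check`` is hypothetical, and it uses ``codecs.decode()`` so that the same code handles both text encodings and byte-to-byte codecs such as ``'base64'``.

```python
import codecs

def decode_and_check(raw_name, encoding):
    """Decode raw_name, then validate the form that will actually be used."""
    decoded = codecs.decode(raw_name, encoding)
    # Codecs such as 'base64' return a byte string; normalise it to text.
    # Assumption for this sketch: the underlying bytes are UTF-8.
    if isinstance(decoded, bytes):
        decoded = decoded.decode('utf-8')
    # The check runs on the decoded name, not on the encoded input.
    if u'/' in decoded:
        raise ValueError("'/' not allowed in filenames")
    return decoded
```

With this ordering, the base-64 form of ``'/etc/passwd'`` is rejected because the check sees the decoded path. In practice it is usually safer still to refuse encodings that the application does not expect.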
References
''''''''''''''
The PDF slides for Marc-André Lemburg's presentation "Writing
Unicode-aware Applications in Python" are available at
<http://www.egenix.com/files/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>
and discuss questions of character encodings as well as how to
internationalize and localize an application.
Revision History and Acknowledgements
------------------------------------------
Thanks to the following people who have noted errors or offered
suggestions on this article: Nicholas Bastin,
Marius Gedminas, Kent Johnson, Ken Krugler,
Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
Version 1.0: posted August 5 2005.
Version 1.01: posted August 7 2005. Corrects factual and markup
errors; adds several links.
Version 1.02: posted August 16 2005. Corrects factual errors.
.. comment Additional topic: building Python w/ UCS2 or UCS4 support
.. comment Describe obscure -U switch somewhere?
.. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
.. comment
Original outline:
- [ ] Unicode introduction
- [ ] ASCII
- [ ] Terms
- [ ] Character
- [ ] Code point
- [ ] Encodings
- [ ] Common encodings: ASCII, Latin-1, UTF-8
- [ ] Unicode Python type
- [ ] Writing unicode literals
- [ ] Obscurity: -U switch
- [ ] Built-ins
- [ ] unichr()
- [ ] ord()
- [ ] unicode() constructor
- [ ] Unicode type
- [ ] encode(), decode() methods
- [ ] Unicodedata module for character properties
- [ ] I/O
- [ ] Reading/writing Unicode data into files
- [ ] Byte-order marks
- [ ] Unicode filenames
- [ ] Writing Unicode programs
- [ ] Do everything in Unicode
- [ ] Declaring source code encodings (PEP 263)
- [ ] Other issues
- [ ] Building Python (UCS2, UCS4)


@ -1,603 +0,0 @@
==============================================
HOWTO Fetch Internet Resources Using urllib2
==============================================
----------------------------
Fetching URLs With Python
----------------------------
.. note::
There is a French translation of an earlier revision of this
HOWTO, available at `urllib2 - Le Manuel manquant
<http://www.voidspace.org.uk/python/articles/urllib2_francais.shtml>`_.
.. contents:: urllib2 Tutorial
Introduction
============
.. sidebar:: Related Articles
You may also find the following article on fetching web
resources with Python useful:
* `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_
A tutorial on *Basic Authentication*, with examples in Python.
This HOWTO is written by `Michael Foord
<http://www.voidspace.org.uk/python/index.shtml>`_.
**urllib2** is a `Python <http://www.python.org>`_ module for fetching URLs
(Uniform Resource Locators). It offers a very simple interface, in the form of
the *urlopen* function. This is capable of fetching URLs using a variety
of different protocols. It also offers a slightly more complex
interface for handling common situations - like basic authentication,
cookies, proxies and so on. These are provided by objects called
handlers and openers.
urllib2 supports fetching URLs for many "URL schemes" (identified by the string
before the ":" in the URL - for example "ftp" is the URL scheme of
"ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.
For straightforward situations *urlopen* is very easy to use. But as
soon as you encounter errors or non-trivial cases when opening HTTP
URLs, you will need some understanding of the HyperText Transfer
Protocol. The most comprehensive and authoritative reference to HTTP
is :RFC:`2616`. This is a technical document and not intended to be
easy to read. This HOWTO aims to illustrate using *urllib2*, with
enough detail about HTTP to help you through. It is not intended to
replace the `urllib2 docs <http://docs.python.org/lib/module-urllib2.html>`_ ,
but is supplementary to them.
Fetching URLs
=============
The simplest way to use urllib2 is as follows : ::
import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()
Many uses of urllib2 will be that simple (note that instead of an
'http:' URL we could have used a URL starting with 'ftp:', 'file:',
etc.). However, it's the purpose of this tutorial to explain the more
complicated cases, concentrating on HTTP.
HTTP is based on requests and responses - the client makes requests
and servers send responses. urllib2 mirrors this with a ``Request``
object which represents the HTTP request you are making. In its
simplest form you create a Request object that specifies the URL you
want to fetch. Calling ``urlopen`` with this Request object returns a
response object for the URL requested. This response is a file-like
object, which means you can for example call .read() on the response :
::
import urllib2
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()
Note that urllib2 makes use of the same Request interface to handle
all URL schemes. For example, you can make an FTP request like so: ::
req = urllib2.Request('ftp://example.com/')
In the case of HTTP, there are two extra things that Request objects
allow you to do: First, you can pass data to be sent to the server.
Second, you can pass extra information ("metadata") *about* the data
or about the request itself, to the server - this information is sent
as HTTP "headers". Let's look at each of these in turn.
Data
----
Sometimes you want to send data to a URL (often the URL will refer to
a CGI (Common Gateway Interface) script [#]_ or other web
application). With HTTP, this is often done using what's known as a
**POST** request. This is often what your browser does when you submit
an HTML form that you filled in on the web. Not all POSTs have to come
from forms: you can use a POST to transmit arbitrary data to your own
application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as
the ``data`` argument. The encoding is done using a function from the
``urllib`` library, *not* from ``urllib2``. ::
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
Note that other encodings are sometimes required (e.g. for file upload
from HTML forms - see
`HTML Specification, Form Submission <http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_
for more details).
If you do not pass the ``data`` argument, urllib2 uses a **GET**
request. One way in which GET and POST requests differ is that POST
requests often have "side-effects": they change the state of the
system in some way (for example by placing an order with the website
for a hundredweight of tinned spam to be delivered to your door).
Though the HTTP standard makes it clear that POSTs are intended to
*always* cause side-effects, and GET requests *never* to cause
side-effects, nothing prevents a GET request from having side-effects,
nor a POST request from having no side-effects. Data can also be
passed in an HTTP GET request by encoding it in the URL itself.
This is done as follows::
>>> import urllib2
>>> import urllib
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.urlencode(data)
>>> print url_values
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> data = urllib2.urlopen(full_url)
Notice that the full URL is created by adding a ``?`` to the URL, followed by
the encoded values.
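The difference between the two request types can be checked without contacting a server. The following sketch builds one request of each kind and asks the ``Request`` object which HTTP method it would use. The imports fall back between the Python 3 module names and the ``urllib``/``urllib2`` names used in this HOWTO; the URL is a placeholder.

```python
try:
    from urllib.parse import urlencode      # Python 3 location
    from urllib.request import Request
except ImportError:
    from urllib import urlencode            # Python 2 location
    from urllib2 import Request

# A list of pairs keeps the parameter order predictable.
params = [('name', 'Somebody Here'), ('language', 'Python')]
query = urlencode(params)

# GET: the encoded values travel in the URL itself.
get_req = Request('http://www.example.com/example.cgi?' + query)

# POST: the encoded values are passed as the data argument (a byte string).
post_req = Request('http://www.example.com/example.cgi', query.encode('ascii'))

get_method = get_req.get_method()
post_method = post_req.get_method()
```

Nothing is sent until one of these objects is passed to ``urlopen``.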
Headers
-------
We'll discuss here one particular HTTP header, to illustrate how to
add headers to your HTTP request.
Some websites [#]_ dislike being browsed by programs, or send
different versions to different browsers [#]_ . By default urllib2
identifies itself as ``Python-urllib/x.y`` (where ``x`` and ``y`` are
the major and minor version numbers of the Python release,
e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
not work. The way a browser identifies itself is through the
``User-Agent`` header [#]_. When you create a Request object you can
pass a dictionary of headers in. The following example makes the same
request as above, but identifies itself as a version of Internet
Explorer [#]_. ::
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
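Headers can be inspected on the ``Request`` object before anything is sent, which is a handy way to confirm what a script will claim to be. This is a minimal sketch; the imports fall back between the Python 3 module name and ``urllib2``, and the ``Accept-Language`` value is an arbitrary example.

```python
try:
    from urllib.request import Request      # Python 3 location
except ImportError:
    from urllib2 import Request             # Python 2 location

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
req = Request('http://www.someserver.com/cgi-bin/register.cgi',
              headers={'User-Agent': user_agent})

# Header names are stored in capitalised form, e.g. 'User-agent'.
has_agent = req.has_header('User-agent')
stored_agent = req.get_header('User-agent')

# Headers can also be added after the Request has been created.
req.add_header('Accept-Language', 'en')
stored_language = req.get_header('Accept-language')
```

Note the capitalisation used for lookups: only the first letter of the header name is upper case.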
The response also has two useful methods. See the section on `info and
geturl`_ which comes after we have a look at what happens when things
go wrong.
Handling Exceptions
===================
*urlopen* raises ``URLError`` when it cannot handle a response (though
as usual with Python APIs, builtin exceptions such as ValueError,
TypeError etc. may also be raised).
``HTTPError`` is the subclass of ``URLError`` raised in the specific
case of HTTP URLs.
URLError
--------
Often, URLError is raised because there is no network connection (no
route to the specified server), or the specified server doesn't exist.
In this case, the exception raised will have a 'reason' attribute,
which is a tuple containing an error code and a text error message.
e.g. ::
    >>> req = urllib2.Request('http://www.pretend_server.org')
    >>> try:
    ...     urllib2.urlopen(req)
    ... except urllib2.URLError, e:
    ...     print e.reason
    ...
    (4, 'getaddrinfo failed')
HTTPError
---------
Every HTTP response from the server contains a numeric "status
code". Sometimes the status code indicates that the server is unable
to fulfil the request. The default handlers will handle some of these
responses for you (for example, if the response is a "redirection"
that requests the client fetch the document from a different URL,
urllib2 will handle that for you). For those it can't handle, urlopen
will raise an ``HTTPError``. Typical errors include '404' (page not
found), '403' (request forbidden), and '401' (authentication
required).
See section 10 of RFC 2616 for a reference on all the HTTP error
codes.
The ``HTTPError`` instance raised will have an integer 'code'
attribute, which corresponds to the error sent by the server.
Error Codes
~~~~~~~~~~~
Because the default handlers handle redirects (codes in the 300
range), and codes in the 100-299 range indicate success, you will
usually only see error codes in the 400-599 range.
``BaseHTTPServer.BaseHTTPRequestHandler.responses`` is a useful
dictionary of response codes that shows all the response codes used
by RFC 2616. The dictionary is reproduced here for convenience ::
# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
100: ('Continue', 'Request received, please continue'),
101: ('Switching Protocols',
'Switching to new protocol; obey Upgrade header'),
200: ('OK', 'Request fulfilled, document follows'),
201: ('Created', 'Document created, URL follows'),
202: ('Accepted',
'Request accepted, processing continues off-line'),
203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
204: ('No Content', 'Request fulfilled, nothing follows'),
205: ('Reset Content', 'Clear input form for further input.'),
206: ('Partial Content', 'Partial content follows.'),
300: ('Multiple Choices',
'Object has several resources -- see URI list'),
301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
302: ('Found', 'Object moved temporarily -- see URI list'),
303: ('See Other', 'Object moved -- see Method and URL list'),
304: ('Not Modified',
'Document has not changed since given time'),
305: ('Use Proxy',
'You must use proxy specified in Location to access this '
'resource.'),
307: ('Temporary Redirect',
'Object moved temporarily -- see URI list'),
400: ('Bad Request',
'Bad request syntax or unsupported method'),
401: ('Unauthorized',
'No permission -- see authorization schemes'),
402: ('Payment Required',
'No payment -- see charging schemes'),
403: ('Forbidden',
'Request forbidden -- authorization will not help'),
404: ('Not Found', 'Nothing matches the given URI'),
405: ('Method Not Allowed',
'Specified method is invalid for this server.'),
406: ('Not Acceptable', 'URI not available in preferred format.'),
407: ('Proxy Authentication Required', 'You must authenticate with '
'this proxy before proceeding.'),
408: ('Request Timeout', 'Request timed out; try again later.'),
409: ('Conflict', 'Request conflict.'),
410: ('Gone',
'URI no longer exists and has been permanently removed.'),
411: ('Length Required', 'Client must specify Content-Length.'),
412: ('Precondition Failed', 'Precondition in headers is false.'),
413: ('Request Entity Too Large', 'Entity is too large.'),
414: ('Request-URI Too Long', 'URI is too long.'),
415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
416: ('Requested Range Not Satisfiable',
'Cannot satisfy request range.'),
417: ('Expectation Failed',
'Expect condition could not be satisfied.'),
500: ('Internal Server Error', 'Server got itself in trouble'),
501: ('Not Implemented',
'Server does not support this operation'),
502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
503: ('Service Unavailable',
'The server cannot process the request due to a high load'),
504: ('Gateway Timeout',
'The gateway server did not receive a timely response'),
505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
}
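The table can also be queried directly. The sketch below combines the code ranges described above with a lookup of the short message; the helper name ``describe_status`` is made up for this example, and the import falls back between the Python 3 module name and ``BaseHTTPServer``.

```python
try:
    from http.server import BaseHTTPRequestHandler     # Python 3 location
except ImportError:
    from BaseHTTPServer import BaseHTTPRequestHandler  # Python 2 location

def describe_status(code):
    """Return (category, short message) for an HTTP status code."""
    if 100 <= code <= 299:
        category = 'success'
    elif 300 <= code <= 399:
        category = 'redirect'
    elif 400 <= code <= 599:
        category = 'error'
    else:
        category = 'unknown'
    # Each entry in the table is a (shortmessage, longmessage) pair.
    entry = BaseHTTPRequestHandler.responses.get(code, ('Unknown', ''))
    return category, entry[0]

not_found = describe_status(404)
ok = describe_status(200)
```

Only the short message is used here, since the long descriptions may differ between Python releases.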
When the server reports an error, it returns an HTTP error
code *and* an error page. The ``HTTPError`` instance also acts as a
response object for the page returned. This means that as well as the
``code`` attribute, it also has ``read``, ``geturl``, and ``info`` methods. ::
    >>> req = urllib2.Request('http://www.python.org/fish.html')
    >>> try:
    ...     urllib2.urlopen(req)
    ... except urllib2.HTTPError, e:
    ...     print e.code
    ...     print e.read()
    ...
    404
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
    <?xml-stylesheet href="./css/ht2html.css"
    type="text/css"?>
    <html><head><title>Error 404: File Not Found</title>
    ...... etc...
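The same behaviour can be explored offline by constructing an ``HTTPError`` by hand, which is useful in tests. This is a sketch under stated assumptions: the URL and page body are invented, the imports fall back between the Python 3 module name and ``urllib2``, and the ``except ... as`` spelling needs Python 2.6 or later.

```python
import io
try:
    from urllib.error import HTTPError, URLError   # Python 3 location
except ImportError:
    from urllib2 import HTTPError, URLError        # Python 2 location

page = b'<html><head><title>Error 404</title></head></html>'

# Arguments: url, code, message, headers, file-like object holding the body.
err = HTTPError('http://www.example.com/fish.html', 404, 'Not Found',
                {}, io.BytesIO(page))

try:
    raise err
except URLError as e:        # HTTPError is a subclass of URLError
    caught_code = e.code
    caught_url = e.geturl()
    caught_body = e.read()

err.close()
```

The exception carries the status code, the URL and the error page, just as a response fetched from a real server would.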
Wrapping it Up
--------------
So if you want to be prepared for ``HTTPError`` *or* ``URLError``
there are two basic approaches. I prefer the second approach.
Number 1
~~~~~~~~
::
    from urllib2 import Request, urlopen, URLError, HTTPError
    req = Request(someurl)
    try:
        response = urlopen(req)
    except HTTPError, e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    except URLError, e:
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        # everything is fine
        pass
.. note::
The ``except HTTPError`` *must* come first, otherwise ``except URLError``
will *also* catch an ``HTTPError``.
Number 2
~~~~~~~~
::
    from urllib2 import Request, urlopen, URLError
    req = Request(someurl)
    try:
        response = urlopen(req)
    except URLError, e:
        if hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
        elif hasattr(e, 'code'):
            print 'The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code
    else:
        # everything is fine
        pass
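A third variation, offered here only as a sketch, is to test the exception type with ``isinstance`` rather than probing for attributes. This does not depend on which attributes each class happens to carry; later Python releases give ``HTTPError`` a ``reason`` attribute as well, so an attribute test that looks for ``reason`` first can misclassify it there. The helper name ``describe_failure`` and the sample errors are invented for the example.

```python
import io
try:
    from urllib.error import HTTPError, URLError   # Python 3 location
except ImportError:
    from urllib2 import HTTPError, URLError        # Python 2 location

def describe_failure(exc):
    """Summarise a URLError, treating HTTPError as the more specific case."""
    # Test the subclass first, for the same reason that
    # ``except HTTPError`` has to precede ``except URLError``.
    if isinstance(exc, HTTPError):
        return 'server error %d' % exc.code
    if isinstance(exc, URLError):
        return 'unreachable: %s' % (exc.reason,)
    raise TypeError('not a urllib error: %r' % (exc,))

http_failure = HTTPError('http://www.example.com/', 403, 'Forbidden',
                         {}, io.BytesIO(b''))
net_failure = URLError('no route to host')
summaries = [describe_failure(http_failure), describe_failure(net_failure)]
```

The function would be called from a single ``except URLError`` clause, in the same position as the ``hasattr`` tests in the second approach.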
info and geturl
===============
The response returned by urlopen (or the ``HTTPError`` instance) has
two useful methods ``info`` and ``geturl``.
**geturl** - this returns the real URL of the page fetched. This is
useful because ``urlopen`` (or the opener object used) may have
followed a redirect. The URL of the page fetched may not be the same
as the URL requested.
**info** - this returns a dictionary-like object that describes the
page fetched, particularly the headers sent by the server. It is
currently an ``httplib.HTTPMessage`` instance.
Typical headers include 'Content-length', 'Content-type', and so
on. See the
`Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_
for a useful listing of HTTP headers with brief explanations of their meaning
and use.
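Both methods can be tried without a network connection by fetching a ``file:`` URL. The sketch below writes a temporary file, opens it through ``urlopen`` and reads the URL and one header back. It assumes a writable temporary directory; the imports fall back between the Python 3 module names and ``urllib``/``urllib2``.

```python
import os
import tempfile
try:
    from urllib.request import urlopen, pathname2url   # Python 3 location
except ImportError:
    from urllib2 import urlopen                        # Python 2 location
    from urllib import pathname2url

# Create a small local file to fetch.
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'hello')
os.close(fd)

try:
    response = urlopen('file:' + pathname2url(path))
    fetched_url = response.geturl()                     # URL actually fetched
    content_length = response.info()['Content-length']  # header lookup
    body = response.read()
    response.close()
finally:
    os.remove(path)
```

For local files the header values are synthesised by the library, but the calls are the same ones you would make on an HTTP response.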
Openers and Handlers
====================
When you fetch a URL you use an opener (an instance of the perhaps
confusingly-named ``urllib2.OpenerDirector``). Normally we have been using
the default opener - via ``urlopen`` - but you can create custom
openers. Openers use handlers. All the "heavy lifting" is done by the
handlers. Each handler knows how to open URLs for a particular URL
scheme (http, ftp, etc.), or how to handle an aspect of URL opening,
for example HTTP redirections or HTTP cookies.
You will want to create openers if you want to fetch URLs with
specific handlers installed, for example to get an opener that handles
cookies, or to get an opener that does not handle redirections.
To create an opener, instantiate an OpenerDirector, and then call
.add_handler(some_handler_instance) repeatedly.
Alternatively, you can use ``build_opener``, which is a convenience
function for creating opener objects with a single function call.
``build_opener`` adds several handlers by default, but provides a
quick way to add more and/or override the default handlers.
Other sorts of handlers you might want can handle proxies,
authentication, and other common but slightly specialised
situations.
``install_opener`` can be used to make an ``opener`` object the
(global) default opener. This means that calls to ``urlopen`` will use
the opener you have installed.
Opener objects have an ``open`` method, which can be called directly
to fetch urls in the same way as the ``urlopen`` function: there's no
need to call ``install_opener``, except as a convenience.
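As a rough illustration of how ``build_opener`` combines your handlers with the defaults, the sketch below adds a cookie handler and then lists what the resulting opener contains. It relies on the opener's ``handlers`` attribute, which is an implementation detail rather than a documented interface, so treat it as a debugging aid only. The imports fall back between the Python 3 module name and ``urllib2``.

```python
try:
    from urllib.request import build_opener, HTTPCookieProcessor  # Python 3
except ImportError:
    from urllib2 import build_opener, HTTPCookieProcessor         # Python 2

# Ask for one extra handler; build_opener supplies the usual defaults.
opener = build_opener(HTTPCookieProcessor())

# Assumption: ``handlers`` is an internal list of the installed handlers.
handler_names = [h.__class__.__name__ for h in opener.handlers]
```

The opener can now be used through ``opener.open(url)``, or made the global default with ``install_opener``.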
Basic Authentication
====================
To illustrate creating and installing a handler we will use the
``HTTPBasicAuthHandler``. For a more detailed discussion of this
subject - including an explanation of how Basic Authentication works -
see the `Basic Authentication Tutorial <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.
When authentication is required, the server sends a header (as well as
the 401 error code) requesting authentication. This specifies the
authentication scheme and a 'realm'. The header looks like :
``Www-authenticate: SCHEME realm="REALM"``.
e.g. ::
Www-authenticate: Basic realm="cPanel Users"
The client should then retry the request with the appropriate name and
password for the realm included as a header in the request. This is
'basic authentication'. In order to simplify this process we can
create an instance of ``HTTPBasicAuthHandler`` and an opener to use
this handler.
The ``HTTPBasicAuthHandler`` uses an object called a password manager
to handle the mapping of URLs and realms to passwords and
usernames. If you know what the realm is (from the authentication
header sent by the server), then you can use a
``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In
that case, it is convenient to use
``HTTPPasswordMgrWithDefaultRealm``. This allows you to specify a
default username and password for a URL. This will be supplied in the
absence of you providing an alternative combination for a specific
realm. We indicate this by providing ``None`` as the realm argument to
the ``add_password`` method.
The top-level URL is the first URL that requires authentication. URLs
"deeper" than the URL you pass to .add_password() will also match. ::
# create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If we knew the realm, we could use it instead of ``None``.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)
# use the opener to fetch a URL
opener.open(a_url)
# Install the opener.
# Now all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)
.. note::
In the above example we only supplied our ``HTTPBasicAuthHandler``
to ``build_opener``. By default openers have the handlers for
normal situations - ``ProxyHandler``, ``UnknownHandler``,
``HTTPHandler``, ``HTTPDefaultErrorHandler``,
``HTTPRedirectHandler``, ``FTPHandler``, ``FileHandler``,
``HTTPErrorProcessor``.
top_level_url is in fact *either* a full URL (including the 'http:'
scheme component and the hostname and optionally the port number)
e.g. "http://example.com/" *or* an "authority" (i.e. the hostname,
optionally including the port number) e.g. "example.com" or
"example.com:8080" (the latter example includes a port number). The
authority, if present, must NOT contain the "userinfo" component - for
example "joe:password@example.com" is not correct.
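The matching rules for the password manager can be checked offline with ``find_user_password``, the method the auth handler uses to look credentials up. This is a small sketch; the username, password and realm string are placeholders, and the import falls back between the Python 3 module name and ``urllib2``.

```python
try:
    from urllib.request import HTTPPasswordMgrWithDefaultRealm  # Python 3
except ImportError:
    from urllib2 import HTTPPasswordMgrWithDefaultRealm         # Python 2

password_mgr = HTTPPasswordMgrWithDefaultRealm()

# None as the realm means "use these credentials for any realm".
password_mgr.add_password(None, 'http://example.com/foo/', 'joe', 'secret')

# A URL "deeper" than the registered one matches, whatever the realm.
deeper = password_mgr.find_user_password('Some Realm',
                                         'http://example.com/foo/bar.html')

# A URL outside the registered path does not match.
elsewhere = password_mgr.find_user_password('Some Realm',
                                            'http://example.com/other/')
```

When nothing matches, the manager returns a pair of ``None`` values rather than raising an exception.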
Proxies
=======
**urllib2** will auto-detect your proxy settings and use those. This
is through the ``ProxyHandler`` which is part of the normal handler
chain. Normally that's a good thing, but there are occasions when it
may not be helpful [#]_. One way to bypass it is to set up our own
``ProxyHandler``, with no proxies defined. This is done using similar
steps to setting up a `Basic Authentication`_ handler : ::
>>> proxy_support = urllib2.ProxyHandler({})
>>> opener = urllib2.build_opener(proxy_support)
>>> urllib2.install_opener(opener)
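To confirm that an opener built this way really has no proxy support, you can look at the handlers it ends up with. The sketch below compares an empty ``ProxyHandler`` with one given an explicit proxy. It inspects the opener's ``handlers`` attribute, which is an implementation detail, and the proxy address is a made-up placeholder. The imports fall back between the Python 3 module name and ``urllib2``.

```python
try:
    from urllib.request import build_opener, ProxyHandler   # Python 3
except ImportError:
    from urllib2 import build_opener, ProxyHandler          # Python 2

def has_proxy_handler(opener):
    """Report whether any installed handler is a ProxyHandler."""
    # Assumption: ``handlers`` is an internal list of installed handlers.
    return any(isinstance(h, ProxyHandler) for h in opener.handlers)

# No proxies defined: replaces the auto-detecting default handler.
no_proxy_opener = build_opener(ProxyHandler({}))

# One proxy defined explicitly (hypothetical address).
explicit_opener = build_opener(
    ProxyHandler({'http': 'http://proxy.example.com:3128/'}))

without_proxy = has_proxy_handler(no_proxy_opener)
with_proxy = has_proxy_handler(explicit_opener)
```

Passing your own ``ProxyHandler`` replaces the default one, so the environment's proxy settings are no longer consulted for that opener.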
.. note::
Currently ``urllib2`` *does not* support fetching of ``https``
locations through a proxy. However, this can be enabled by extending
urllib2 as shown in the recipe [#]_.
Sockets and Layers
==================
The Python support for fetching resources from the web is
layered. urllib2 uses the httplib library, which in turn uses the
socket library.
As of Python 2.3 you can specify how long a socket should wait for a
response before timing out. This can be useful in applications which
have to fetch web pages. By default the socket module has *no timeout*
and can hang. Currently, the socket timeout is not exposed at the
httplib or urllib2 levels. However, you can set the default timeout
globally for all sockets using : ::
import socket
import urllib2
# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
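Because the default timeout is global, it is worth seeing how it propagates and how to put the previous value back. The sketch below sets a default, shows that a newly created socket inherits it, and then restores the earlier setting. No connection is made.

```python
import socket

previous = socket.getdefaulttimeout()    # None means "no timeout"

socket.setdefaulttimeout(10)             # timeout in seconds
try:
    # Sockets created from now on pick up the default timeout.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    inherited = sock.gettimeout()
    sock.close()
finally:
    # Restore the earlier setting so other code is unaffected.
    socket.setdefaulttimeout(previous)

restored = socket.getdefaulttimeout()
```

Sockets that already existed before ``setdefaulttimeout`` was called keep whatever timeout they had.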
-------
Footnotes
=========
This document was reviewed and revised by John Lee.
.. [#] For an introduction to the CGI protocol see
`Writing Web Applications in Python <http://www.pyzine.com/Issue008/Section_Articles/article_CGIOne.html>`_.
.. [#] Like Google for example. The *proper* way to use Google from a program
is to use `PyGoogle <http://pygoogle.sourceforge.net>`_ of course. See
`Voidspace Google <http://www.voidspace.org.uk/python/recipebook.shtml#google>`_
for some examples of using the Google API.
.. [#] Browser sniffing is a very bad practice for website design - building
sites using web standards is much more sensible. Unfortunately a lot of
sites still send different versions to different browsers.
.. [#] For details of more HTTP request headers, see
`Quick Reference to HTTP Headers`_.
.. [#] The user agent for MSIE 6 is
*'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'*
.. [#] In my case I have to use a proxy to access the internet at work. If you
attempt to fetch *localhost* URLs through this proxy it blocks them. IE
is set to use the proxy, which urllib2 picks up on. In order to test
scripts with a localhost server, I have to prevent urllib2 from using
the proxy.
.. [#] urllib2 opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
<http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/456195>`_.


@ -1,24 +0,0 @@
<p> This document was generated using the <a
href="http://saftsack.fs.uni-bayreuth.de/;SPMtilde;latex2ht/">
<strong>LaTeX</strong>2<tt>HTML</tt></a> translator.
</p>
<p> <a
href="http://saftsack.fs.uni-bayreuth.de/;SPMtilde;latex2ht/">
<strong>LaTeX</strong>2<tt>HTML</tt></a> is Copyright &copy;
1993, 1994, 1995, 1996, 1997, <a
href="http://cbl.leeds.ac.uk/nikos/personal.html">Nikos
Drakos</a>, Computer Based Learning Unit, University of
Leeds, and Copyright &copy; 1997, 1998, <a
href="http://www.maths.mq.edu.au/;SPMtilde;ross/">Ross
Moore</a>, Mathematics Department, Macquarie University,
Sydney.
</p>
<p> The application of <a
href="http://saftsack.fs.uni-bayreuth.de/;SPMtilde;latex2ht/">
<strong>LaTeX</strong>2<tt>HTML</tt></a> to the Python
documentation has been heavily tailored by Fred L. Drake,
Jr. Original navigation icons were contributed by Christopher
Petrilli.
</p>


@ -1,84 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>About the Python Documentation</title>
<meta name="description"
content="Overview information about the Python documentation">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="contents" href="index.html" title="Python Documentation Index">
<link rel="index" href="modindex.html" title="Global Module Index">
<link rel="start" href="index.html" title="Python Documentation Index">
<link rel="up" href="index.html" title="Python Documentation Index">
<link rel="SHORTCUT ICON" href="icons/pyfav.png" type="image/png">
<link rel="STYLESHEET" href="lib/lib.css">
</head>
<body>
<div class="navigation">
<table width="100%" cellpadding="0" cellspacing="2">
<tr>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td><a href="index.html"
title="Python Documentation Index"><img width="32" height="32"
align="bottom" border="0" alt="up"
src="icons/up.png"></a></td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td align="center" width="100%">About the Python Documentation</td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
</tr>
</table>
<b class="navlabel">Up:</b>
<span class="sectref">
<a href="index.html" title="Python Documentation Index">
Python Documentation Index</A></span>
<br>
</div>
<hr>
<h2>About the Python Documentation</h2>
<p>The Python documentation was originally written by Guido van
Rossum, but has increasingly become a community effort over the
past several years. This growing collection of documents is
available in several formats, including typeset versions in PDF
and PostScript for printing, from the <a
href="http://www.python.org/">Python Web site</a>.
<p>A <a href="acks.html">list of contributors</a> is available.
<h2>Comments and Questions</h2>
<p> General comments and questions regarding this document should
be sent by email to <a href="mailto:docs@python.org"
>docs@python.org</a>. If you find specific errors in
this document, please report the bug at the <a
href="http://sourceforge.net/bugs/?group_id=5470">Python Bug
Tracker</a> at <a href="http://sourceforge.net/">SourceForge</a>.
If you are able to provide suggested text, either to replace
existing incorrect or unclear material, or additional text to
supplement what's already available, we'd appreciate the
contribution. There's no need to worry about text markup; our
documentation team will gladly take care of that.
</p>
<p> Questions regarding how to use the information in this
document should be sent to the Python news group, <a
href="news:comp.lang.python">comp.lang.python</a>, or the <a
href="http://www.python.org/mailman/listinfo/python-list"
>Python mailing list</a> (which is gated to the newsgroup and
carries the same content).
</p>
<p> For any of these channels, please be sure not to send HTML email.
Thanks.
</p>
<hr>
</body>
</html>

Binary files not shown (16 files; sizes: 1.9 KiB, 1.0 KiB, 438 B, 649 B, 289 B, 529 B, 385 B, 598 B, 253 B, 511 B, 252 B, 511 B, 125 B, 240 B, 316 B, 577 B).


@ -1,140 +0,0 @@
<html>
<head>
<title>Python @RELEASE@ Documentation - @DATE@</title>
<meta name="aesop" content="links">
<meta name="description"
content="Top-level index to the standard documentation for
Python @RELEASE@.">
<link rel="SHORTCUT ICON" href="icons/pyfav.png" type="image/png">
<link rel="STYLESHEET" href="lib/lib.css" type="text/css">
<link rel="author" href="acks.html" title="Acknowledgements">
<link rel="help" href="about.html" title="About the Python Documentation">
<link rel="index" href="modindex.html" title="Global Module Index">
<style type="text/css">
a.title { font-weight: bold; font-size: 110%; }
ul { margin-left: 1em; padding: 0pt; border: 0pt; }
ul li { margin-top: 0.2em; }
td.left-column { padding-right: 1em; }
td.right-column { padding-left: 1em; }
</style>
</head>
<body>
<div class="navigation">
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td align="center" width="100%">
<b class="title">Python Documentation</b></td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
<td><a href="modindex.html"><img width="32" height="32"
align="bottom" border="0" alt="Module Index"
src="icons/modules.png"></a></td>
<td><img width="32" height="32" align="bottom" border="0" alt=""
src="icons/blank.png"></td>
</tr>
</table>
<hr>
</div>
<div align="center" class="titlepage">
<h1>Python Documentation</h1>
<p>
<strong>Release @RELEASE@</strong>
<br>
<strong>@DATE@</strong>
</p>
</div>
<table align="center">
<tbody>
<tr>
<td class="left-column">
<ul>
<li> <a href="tut/tut.html" class="title">Tutorial</a>
<br>(start here)
</ul>
</td>
<td class="right-column">
<ul>
<li> <a href="whatsnew/@WHATSNEW@.html" class="title"
>What's New in Python</a>
<br>(changes since the last major release)
</ul>
</td>
</tr>
<tr>
<td valign="baseline" class="left-column">
&nbsp;
<ul>
<li> <a href="modindex.html" class="title">Global Module Index</a>
<br>(for quick access to all documentation)
<li> <a href="lib/lib.html" class="title">Library Reference</a>
<br>(keep this under your pillow)
<li> <a href="mac/mac.html" class="title">Macintosh Module
Reference</a>
<br>(this too, if you use a Macintosh)
<li> <a href="inst/inst.html" class="title">Installing
Python Modules</a>
<br>(for administrators)
<li> <a href="dist/dist.html" class="title">Distributing
Python Modules</a>
<br>(for developers and packagers)
</ul>
</td>
<td valign="baseline" class="right-column">
&nbsp;
<ul>
<li> <a href="ref/ref.html" class="title">Language Reference</a>
<br>(for language lawyers)
<li> <a href="ext/ext.html" class="title">Extending and
Embedding</a>
<br>(tutorial for C/C++ programmers)
<li> <a href="api/api.html" class="title">Python/C API</a>
<br>(reference for C/C++ programmers)
<li> <a href="doc/doc.html" class="title">Documenting Python</a>
<br>(information for documentation authors)
</ul>
</td>
</tr>
<tr>
<td valign="baseline" class="left-column">
&nbsp;
<ul>
<li> <a href="http://www.python.org/doc/" class="title"
>Documentation Central</a>
<br>(for everyone)
</ul>
</td>
<td valign="baseline" class="right-column">
&nbsp;
<ul>
<li> <a href="http://www.python.org/doc/howto/" class="title"
>Python How-To Guides</a>
<br>(special topics)
</ul>
</td>
</tr>
</tbody>
</table>
<p>
<address>
<hr>
See <i><a href="about.html">About the Python Documentation</a></i>
for information on suggesting changes.
</address>
</body>
</html>

View File

@ -1,54 +0,0 @@
<p> This document was generated using the <a
href="http://saftsack.fs.uni-bayreuth.de/;SPMtilde;latex2ht/">
<strong>LaTeX</strong>2<tt>HTML</tt></a> translator.
</p>
<p> <a
href="http://saftsack.fs.uni-bayreuth.de/;SPMtilde;latex2ht/">
<strong>LaTeX</strong>2<tt>HTML</tt></a> is Copyright &copy;
1993, 1994, 1995, 1996, 1997, <a
href="http://cbl.leeds.ac.uk/nikos/personal.html">Nikos
Drakos</a>, Computer Based Learning Unit, University of
Leeds, and Copyright &copy; 1997, 1998, <a
href="http://www.maths.mq.edu.au/;SPMtilde;ross/">Ross
Moore</a>, Mathematics Department, Macquarie University,
Sydney.
</p>
<p> The application of <a
href="http://saftsack.fs.uni-bayreuth.de/;SPMtilde;latex2ht/">
<strong>LaTeX</strong>2<tt>HTML</tt></a> to the Python
documentation has been heavily tailored by Fred L. Drake,
Jr. Original navigation icons were contributed by Christopher
Petrilli.
</p>
<hr>
<h2>Comments and Questions</h2>
<p> General comments and questions regarding this document should
be sent by email to <a href="mailto:docs@python.org"
>docs@python.org</a>. If you find specific errors in
this document, either in the content or the presentation, please
report the bug at the <a
href="http://sourceforge.net/bugs/?group_id=5470">Python Bug
Tracker</a> at <a href="http://sourceforge.net/">SourceForge</a>.
If you are able to provide suggested text, either to replace
existing incorrect or unclear material, or additional text to
supplement what's already available, we'd appreciate the
contribution. There's no need to worry about text markup; our
documentation team will gladly take care of that.
</p>
<p> Questions regarding how to use the information in this
document should be sent to the Python news group, <a
href="news:comp.lang.python">comp.lang.python</a>, or the <a
href="http://www.python.org/mailman/listinfo/python-list"
>Python mailing list</a> (which is gated to the newsgroup and
carries the same content).
</p>
<p> For any of these channels, please be sure not to send HTML email.
Thanks.
</p>

View File

@ -1,243 +0,0 @@
/*
* The first part of this is the standard CSS generated by LaTeX2HTML,
* with the "empty" declarations removed.
*/
/* Century Schoolbook font is very similar to Computer Modern Math: cmmi */
.math { font-family: "Century Schoolbook", serif; }
.math i { font-family: "Century Schoolbook", serif;
font-weight: bold }
.boldmath { font-family: "Century Schoolbook", serif;
font-weight: bold }
/*
* Implement both fixed-size and relative sizes.
*
* I think these can be safely removed, as it doesn't appear that
* LaTeX2HTML ever generates these, even though these are carried
* over from the LaTeX2HTML stylesheet.
*/
small.xtiny { font-size : xx-small; }
small.tiny { font-size : x-small; }
small.scriptsize { font-size : smaller; }
small.footnotesize { font-size : small; }
big.xlarge { font-size : large; }
big.xxlarge { font-size : x-large; }
big.huge { font-size : larger; }
big.xhuge { font-size : xx-large; }
/*
* Document-specific styles come next;
* these are added for the Python documentation.
*
* Note that the size specifications for the H* elements are because
* Netscape on Solaris otherwise doesn't get it right; they all end up
* the normal text size.
*/
body { color: #000000;
background-color: #ffffff; }
a:link:active { color: #ff0000; }
a:link:hover { background-color: #bbeeff; }
a:visited:hover { background-color: #bbeeff; }
a:visited { color: #551a8b; }
a:link { color: #0000bb; }
h1, h2, h3, h4, h5, h6 { font-family: avantgarde, sans-serif;
font-weight: bold; }
h1 { font-size: 180%; }
h2 { font-size: 150%; }
h3, h4 { font-size: 120%; }
/* These are section titles used in navigation links, so make sure we
* match the section header font here, even if not the weight.
*/
.sectref { font-family: avantgarde, sans-serif; }
/* And the label before the titles in navigation: */
.navlabel { font-size: 85%; }
/* LaTeX2HTML insists on inserting <br> elements into headers which
* are marked with \label. This little bit of CSS magic ensures that
* these elements don't cause spurious whitespace to be added.
*/
h1>br, h2>br, h3>br,
h4>br, h5>br, h6>br { display: none; }
code, tt { font-family: "lucida typewriter", lucidatypewriter,
monospace; }
var { font-family: times, serif;
font-style: italic;
font-weight: normal; }
.Unix { font-variant: small-caps; }
.typelabel { font-family: lucida, sans-serif; }
.navigation td { background-color: #99ccff;
font-weight: bold;
font-family: avantgarde, sans-serif;
font-size: 110%; }
div.warning { background-color: #fffaf0;
border: thin solid black;
padding: 1em;
margin-left: 2em;
margin-right: 2em; }
div.warning .label { font-family: sans-serif;
font-size: 110%;
margin-right: 0.5em; }
div.note { background-color: #fffaf0;
border: thin solid black;
padding: 1em;
margin-left: 2em;
margin-right: 2em; }
div.note .label { margin-right: 0.5em;
font-family: sans-serif; }
address { font-size: 80%; }
.release-info { font-style: italic;
font-size: 80%; }
.titlegraphic { vertical-align: top; }
.verbatim pre { color: #00008b;
font-family: "lucida typewriter", lucidatypewriter,
monospace;
font-size: 90%; }
.verbatim { margin-left: 2em; }
.verbatim .footer { padding: 0.05in;
font-size: 85%;
background-color: #99ccff;
margin-right: 0.5in; }
.grammar { background-color: #99ccff;
margin-right: 0.5in;
padding: 0.05in; }
.grammar-footer { padding: 0.05in;
font-size: 85%; }
.grammartoken { font-family: "lucida typewriter", lucidatypewriter,
monospace; }
.productions { background-color: #bbeeff; }
.productions a:active { color: #ff0000; }
.productions a:link:hover { background-color: #99ccff; }
.productions a:visited:hover { background-color: #99ccff; }
.productions a:visited { color: #551a8b; }
.productions a:link { color: #0000bb; }
.productions table { vertical-align: baseline;
empty-cells: show; }
.productions > table td,
.productions > table th { padding: 2px; }
.productions > table td:first-child,
.productions > table td:last-child {
font-family: "lucida typewriter",
lucidatypewriter,
monospace;
}
/* same as the second selector above, but expressed differently for Opera */
.productions > table td:first-child + td + td {
font-family: "lucida typewriter",
lucidatypewriter,
monospace;
vertical-align: baseline;
}
.productions > table td:first-child + td {
padding-left: 1em;
padding-right: 1em;
}
.productions > table tr { vertical-align: baseline; }
.email { font-family: avantgarde, sans-serif; }
.mailheader { font-family: avantgarde, sans-serif; }
.mimetype { font-family: avantgarde, sans-serif; }
.newsgroup { font-family: avantgarde, sans-serif; }
.url { font-family: avantgarde, sans-serif; }
.file { font-family: avantgarde, sans-serif; }
.guilabel { font-family: avantgarde, sans-serif; }
.realtable { border-collapse: collapse;
border-color: black;
border-style: solid;
border-width: 0px 0px 2px 0px;
empty-cells: show;
margin-left: auto;
margin-right: auto;
padding-left: 0.4em;
padding-right: 0.4em;
}
.realtable tbody { vertical-align: baseline; }
.realtable tfoot { display: table-footer-group; }
.realtable thead { background-color: #99ccff;
border-width: 0px 0px 2px 1px;
display: table-header-group;
font-family: avantgarde, sans-serif;
font-weight: bold;
vertical-align: baseline;
}
.realtable thead :first-child {
border-width: 0px 0px 2px 0px;
}
.realtable thead th { border-width: 0px 0px 2px 1px }
.realtable td,
.realtable th { border-color: black;
border-style: solid;
border-width: 0px 0px 1px 1px;
padding-left: 0.4em;
padding-right: 0.4em;
}
.realtable td:first-child,
.realtable th:first-child {
border-left-width: 0px;
vertical-align: baseline;
}
.center { text-align: center; }
.left { text-align: left; }
.right { text-align: right; }
.refcount-info { font-style: italic; }
.refcount-info .value { font-weight: bold;
color: #006600; }
/*
* Some decoration for the "See also:" blocks, in part inspired by some of
* the styling on Lars Marius Garshol's XSA pages.
* (The blue in the navigation bars is #99CCFF.)
*/
.seealso { background-color: #fffaf0;
border: thin solid black;
padding: 0pt 1em 4pt 1em; }
.seealso > .heading { font-size: 110%;
font-weight: bold; }
/*
* Class 'availability' is used for module availability statements at
* the top of modules.
*/
.availability .platform { font-weight: bold; }
/*
* Additional styles for the distutils package.
*/
.du-command { font-family: monospace; }
.du-option { font-family: avantgarde, sans-serif; }
.du-filevar { font-family: avantgarde, sans-serif;
font-style: italic; }
.du-xxx:before { content: "** ";
font-weight: bold; }
.du-xxx:after { content: " **";
font-weight: bold; }
/*
* Some specialization for printed output.
*/
@media print {
.online-navigation { display: none; }
}

View File

@ -1,82 +0,0 @@
# Generate the Python "info" documentation.
TOPDIR=..
TOOLSDIR=$(TOPDIR)/tools
HTMLDIR=$(TOPDIR)/html
# The emacs binary used to build the info docs. GNU Emacs 21 or 22 is required.
EMACS=emacs
MKINFO=$(TOOLSDIR)/mkinfo
SCRIPTS=$(TOOLSDIR)/checkargs.pm $(TOOLSDIR)/mkinfo $(TOOLSDIR)/py2texi.el
# set VERSION to code the VERSION number into the info file name
# allowing installation of more than one set of python info docs
# into the same directory
VERSION=
all: check-emacs-version \
api dist ext mac ref tut whatsnew \
lib
# doc inst
api: python$(VERSION)-api.info
dist: python$(VERSION)-dist.info
doc: python$(VERSION)-doc.info
ext: python$(VERSION)-ext.info
inst: python$(VERSION)-inst.info
lib: python$(VERSION)-lib.info
mac: python$(VERSION)-mac.info
ref: python$(VERSION)-ref.info
tut: python$(VERSION)-tut.info
whatsnew: $(WHATSNEW)
$(WHATSNEW): python$(VERSION)-$(WHATSNEW).info
check-emacs-version:
	@v="`$(EMACS) --version 2>&1 | egrep '^(GNU |X)Emacs [12]*'`"; \
	if `echo "$$v" | grep '^GNU Emacs 2[12]' >/dev/null 2>&1`; then \
	  echo "Using $(EMACS) to build the info docs"; \
	else \
	  echo "GNU Emacs 21 or 22 is required to build the info docs"; \
	  echo "Found $$v"; \
	  false; \
	fi
python$(VERSION)-api.info: ../api/api.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
python$(VERSION)-ext.info: ../ext/ext.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
python$(VERSION)-lib.info: ../lib/lib.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
python$(VERSION)-mac.info: ../mac/mac.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
python$(VERSION)-ref.info: ../ref/ref.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
python$(VERSION)-tut.info: ../tut/tut.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
# Not built by default; the conversion doesn't handle \p and \op
python$(VERSION)-doc.info: ../doc/doc.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
python$(VERSION)-dist.info: ../dist/dist.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
# Not built by default; the conversion chokes on \installscheme
python$(VERSION)-inst.info: ../inst/inst.tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
# "whatsnew20" doesn't currently work
python$(VERSION)-$(WHATSNEW).info: ../whatsnew/$(WHATSNEW).tex $(SCRIPTS)
	EMACS=$(EMACS) $(MKINFO) $< $*.texi $@
clean:
	rm -f *.texi~ *.texi
clobber: clean
	rm -f *.texi python*-*.info python*-*.info-[0-9]*

View File

@ -1,21 +0,0 @@
This archive contains the standard Python documentation in GNU info
format. Six manuals are included:
python-ref.info* Python Reference Manual
python-mac.info* Python Macintosh Modules
python-lib.info* Python Library Reference
python-ext.info* Extending and Embedding the Python Interpreter
python-api.info* Python/C API Reference
python-tut.info* Python Tutorial
The file python.dir is a fragment of a "dir" file that can be used to
incorporate these documents into an existing GNU info installation:
insert the contents of this file into the "dir" or "localdir" file at
an appropriate point and copy the python-*.info* files to the same
directory.
Thanks go to Milan Zamazal <pdm@zamazal.org> for providing this
conversion to the info format.
Questions and comments on these documents should be directed to
docs@python.org.

View File

@ -1,11 +0,0 @@
Python Standard Documentation
* What's New: (python-whatsnew25). What's New in Python 2.5?
* Python Library: (python-lib). Python Library Reference
* Python Mac Modules: (python-mac). Python Macintosh Modules
* Python Reference: (python-ref). Python Reference Manual
* Python API: (python-api). Python/C API Reference Manual
* Python Extending: (python-ext). Extending & Embedding Python
* Python Tutorial: (python-tut). Python Tutorial
* Distributing Modules: (python-dist). Distributing Python Modules

File diff suppressed because it is too large.

View File

@ -1,8 +0,0 @@
\chapter{Data Compression and Archiving}
\label{archiving}
The modules described in this chapter support data compression
with the zlib, gzip, and bzip2 algorithms, and
the creation of ZIP- and tar-format archives.
\localmoduletable

View File

@ -1,283 +0,0 @@
\begin{longtableiii}{lll}{class}{Node type}{Attribute}{Value}
\lineiii{Add}{\member{left}}{left operand}
\lineiii{}{\member{right}}{right operand}
\hline
\lineiii{And}{\member{nodes}}{list of operands}
\hline
\lineiii{AssAttr}{}{\emph{attribute as target of assignment}}
\lineiii{}{\member{expr}}{expression on the left-hand side of the dot}
\lineiii{}{\member{attrname}}{the attribute name, a string}
\lineiii{}{\member{flags}}{XXX}
\hline
\lineiii{AssList}{\member{nodes}}{list of list elements being assigned to}
\hline
\lineiii{AssName}{\member{name}}{name being assigned to}
\lineiii{}{\member{flags}}{XXX}
\hline
\lineiii{AssTuple}{\member{nodes}}{list of tuple elements being assigned to}
\hline
\lineiii{Assert}{\member{test}}{the expression to be tested}
\lineiii{}{\member{fail}}{the value of the \exception{AssertionError}}
\hline
\lineiii{Assign}{\member{nodes}}{a list of assignment targets, one per equal sign}
\lineiii{}{\member{expr}}{the value being assigned}
\hline
\lineiii{AugAssign}{\member{node}}{}
\lineiii{}{\member{op}}{}
\lineiii{}{\member{expr}}{}
\hline
\lineiii{Backquote}{\member{expr}}{}
\hline
\lineiii{Bitand}{\member{nodes}}{}
\hline
\lineiii{Bitor}{\member{nodes}}{}
\hline
\lineiii{Bitxor}{\member{nodes}}{}
\hline
\lineiii{Break}{}{}
\hline
\lineiii{CallFunc}{\member{node}}{expression for the callee}
\lineiii{}{\member{args}}{a list of arguments}
\lineiii{}{\member{star_args}}{the extended *-arg value}
\lineiii{}{\member{dstar_args}}{the extended **-arg value}
\hline
\lineiii{Class}{\member{name}}{the name of the class, a string}
\lineiii{}{\member{bases}}{a list of base classes}
\lineiii{}{\member{doc}}{doc string, a string or \code{None}}
\lineiii{}{\member{code}}{the body of the class statement}
\hline
\lineiii{Compare}{\member{expr}}{}
\lineiii{}{\member{ops}}{}
\hline
\lineiii{Const}{\member{value}}{}
\hline
\lineiii{Continue}{}{}
\hline
\lineiii{Decorators}{\member{nodes}}{List of function decorator expressions}
\hline
\lineiii{Dict}{\member{items}}{}
\hline
\lineiii{Discard}{\member{expr}}{}
\hline
\lineiii{Div}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{Ellipsis}{}{}
\hline
\lineiii{Expression}{\member{node}}{}
\hline
\lineiii{Exec}{\member{expr}}{}
\lineiii{}{\member{locals}}{}
\lineiii{}{\member{globals}}{}
\hline
\lineiii{FloorDiv}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{For}{\member{assign}}{}
\lineiii{}{\member{list}}{}
\lineiii{}{\member{body}}{}
\lineiii{}{\member{else_}}{}
\hline
\lineiii{From}{\member{modname}}{}
\lineiii{}{\member{names}}{}
\hline
\lineiii{Function}{\member{decorators}}{\class{Decorators} or \code{None}}
\lineiii{}{\member{name}}{name used in def, a string}
\lineiii{}{\member{argnames}}{list of argument names, as strings}
\lineiii{}{\member{defaults}}{list of default values}
\lineiii{}{\member{flags}}{xxx}
\lineiii{}{\member{doc}}{doc string, a string or \code{None}}
\lineiii{}{\member{code}}{the body of the function}
\hline
\lineiii{GenExpr}{\member{code}}{}
\hline
\lineiii{GenExprFor}{\member{assign}}{}
\lineiii{}{\member{iter}}{}
\lineiii{}{\member{ifs}}{}
\hline
\lineiii{GenExprIf}{\member{test}}{}
\hline
\lineiii{GenExprInner}{\member{expr}}{}
\lineiii{}{\member{quals}}{}
\hline
\lineiii{Getattr}{\member{expr}}{}
\lineiii{}{\member{attrname}}{}
\hline
\lineiii{Global}{\member{names}}{}
\hline
\lineiii{If}{\member{tests}}{}
\lineiii{}{\member{else_}}{}
\hline
\lineiii{Import}{\member{names}}{}
\hline
\lineiii{Invert}{\member{expr}}{}
\hline
\lineiii{Keyword}{\member{name}}{}
\lineiii{}{\member{expr}}{}
\hline
\lineiii{Lambda}{\member{argnames}}{}
\lineiii{}{\member{defaults}}{}
\lineiii{}{\member{flags}}{}
\lineiii{}{\member{code}}{}
\hline
\lineiii{LeftShift}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{List}{\member{nodes}}{}
\hline
\lineiii{ListComp}{\member{expr}}{}
\lineiii{}{\member{quals}}{}
\hline
\lineiii{ListCompFor}{\member{assign}}{}
\lineiii{}{\member{list}}{}
\lineiii{}{\member{ifs}}{}
\hline
\lineiii{ListCompIf}{\member{test}}{}
\hline
\lineiii{Mod}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{Module}{\member{doc}}{doc string, a string or \code{None}}
\lineiii{}{\member{node}}{body of the module, a \class{Stmt}}
\hline
\lineiii{Mul}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{Name}{\member{name}}{}
\hline
\lineiii{Not}{\member{expr}}{}
\hline
\lineiii{Or}{\member{nodes}}{}
\hline
\lineiii{Pass}{}{}
\hline
\lineiii{Power}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{Print}{\member{nodes}}{}
\lineiii{}{\member{dest}}{}
\hline
\lineiii{Printnl}{\member{nodes}}{}
\lineiii{}{\member{dest}}{}
\hline
\lineiii{Raise}{\member{expr1}}{}
\lineiii{}{\member{expr2}}{}
\lineiii{}{\member{expr3}}{}
\hline
\lineiii{Return}{\member{value}}{}
\hline
\lineiii{RightShift}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{Slice}{\member{expr}}{}
\lineiii{}{\member{flags}}{}
\lineiii{}{\member{lower}}{}
\lineiii{}{\member{upper}}{}
\hline
\lineiii{Sliceobj}{\member{nodes}}{}
\hline
\lineiii{Stmt}{\member{nodes}}{list of statements}
\hline
\lineiii{Sub}{\member{left}}{}
\lineiii{}{\member{right}}{}
\hline
\lineiii{Subscript}{\member{expr}}{}
\lineiii{}{\member{flags}}{}
\lineiii{}{\member{subs}}{}
\hline
\lineiii{TryExcept}{\member{body}}{}
\lineiii{}{\member{handlers}}{}
\lineiii{}{\member{else_}}{}
\hline
\lineiii{TryFinally}{\member{body}}{}
\lineiii{}{\member{final}}{}
\hline
\lineiii{Tuple}{\member{nodes}}{}
\hline
\lineiii{UnaryAdd}{\member{expr}}{}
\hline
\lineiii{UnarySub}{\member{expr}}{}
\hline
\lineiii{While}{\member{test}}{}
\lineiii{}{\member{body}}{}
\lineiii{}{\member{else_}}{}
\hline
\lineiii{With}{\member{expr}}{}
\lineiii{}{\member{vars}}{}
\lineiii{}{\member{body}}{}
\hline
\lineiii{Yield}{\member{value}}{}
\hline
\end{longtableiii}

View File

@ -1,60 +0,0 @@
from optparse import Option, OptionParser, _match_abbrev

# This case-insensitive option parser relies on having a
# case-insensitive dictionary type available. Here's one
# for Python 2.2. Note that a *real* case-insensitive
# dictionary type would also have to implement __new__(),
# update(), and setdefault() -- but that's not the point
# of this exercise.

class caseless_dict (dict):
    def __setitem__ (self, key, value):
        dict.__setitem__(self, key.lower(), value)

    def __getitem__ (self, key):
        return dict.__getitem__(self, key.lower())

    def get (self, key, default=None):
        # Pass the default through so that it is honored for missing keys.
        return dict.get(self, key.lower(), default)

    def has_key (self, key):
        return dict.has_key(self, key.lower())


class CaselessOptionParser (OptionParser):

    def _create_option_list (self):
        self.option_list = []
        self._short_opt = caseless_dict()
        self._long_opt = caseless_dict()
        self._long_opts = []
        self.defaults = {}

    def _match_long_opt (self, opt):
        return _match_abbrev(opt.lower(), self._long_opt.keys())


if __name__ == "__main__":
    # Import from optparse, the module the rest of this file uses.
    from optparse import OptionConflictError

    # test 1: no options to start with
    parser = CaselessOptionParser()
    try:
        parser.add_option("-H", dest="blah")
    except OptionConflictError:
        print "ok: got OptionConflictError for -H"
    else:
        print "not ok: no conflict between -h and -H"

    parser.add_option("-f", "--file", dest="file")
    #print repr(parser.get_option("-f"))
    #print repr(parser.get_option("-F"))
    #print repr(parser.get_option("--file"))
    #print repr(parser.get_option("--fIlE"))
    (options, args) = parser.parse_args(["--FiLe", "foo"])
    assert options.file == "foo", options.file
    print "ok: case insensitive long options work"
    (options, args) = parser.parse_args(["-F", "bar"])
    assert options.file == "bar", options.file
    print "ok: case insensitive short options work"

View File

@ -1,353 +0,0 @@
\chapter{Python compiler package \label{compiler}}
\sectionauthor{Jeremy Hylton}{jeremy@zope.com}
The Python compiler package is a tool for analyzing Python source code
and generating Python bytecode. The compiler contains libraries to
generate an abstract syntax tree from Python source code and to
generate Python bytecode from the tree.
The \refmodule{compiler} package is a Python source to bytecode
translator written in Python. It uses the built-in parser and
standard \refmodule{parser} module to generate a concrete syntax
tree. This tree is used to generate an abstract syntax tree (AST) and
then Python bytecode.
The full functionality of the package duplicates the builtin compiler
provided with the Python interpreter. It is intended to match its
behavior almost exactly. Why implement another compiler that does the
same thing? The package is useful for a variety of purposes. It can
be modified more easily than the builtin compiler. The AST it
generates is useful for analyzing Python source code.
This chapter explains how the various components of the
\refmodule{compiler} package work. It blends reference material with
a tutorial.
The following modules are part of the \refmodule{compiler} package:
\localmoduletable
\section{The basic interface}
\declaremodule{}{compiler}
The top level of the package defines five functions. If you import
\module{compiler}, you will get these functions and a collection of
modules contained in the package.
\begin{funcdesc}{parse}{buf}
Returns an abstract syntax tree for the Python source code in \var{buf}.
The function raises \exception{SyntaxError} if there is an error in the
source code. The return value is a \class{compiler.ast.Module} instance
that contains the tree.
\end{funcdesc}
\begin{funcdesc}{parseFile}{path}
Return an abstract syntax tree for the Python source code in the file
specified by \var{path}. It is equivalent to
\code{parse(open(\var{path}).read())}.
\end{funcdesc}
\begin{funcdesc}{walk}{ast, visitor\optional{, verbose}}
Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
appropriate method on the \var{visitor} instance for each node
encountered.
\end{funcdesc}
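The pre-order traversal and method dispatch can be pictured with a
small, self-contained sketch. It does not use the \module{compiler}
package: the node classes, the \code{preorder_walk} function, and the
\code{visit}-prefixed method names below are assumptions made for
illustration, not the package's implementation.

```python
# Toy stand-ins for AST nodes. These classes are hypothetical and are
# not the ones defined in compiler.ast.
class Node(object):
    def __init__(self, *children):
        self.children = list(children)

class Module(Node):
    pass

class Function(Node):
    pass

class Return(Node):
    pass


def preorder_walk(node, visitor):
    """Visit node first, then each of its children from left to right."""
    # Look up a method named after the node's class, e.g. visitFunction.
    method = getattr(visitor, "visit" + node.__class__.__name__, None)
    if method is not None:
        method(node)
    for child in node.children:
        preorder_walk(child, visitor)


class NameRecorder(object):
    """Record the class name of every Function and Return node seen."""
    def __init__(self):
        self.seen = []

    def visitFunction(self, node):
        self.seen.append("Function")

    def visitReturn(self, node):
        self.seen.append("Return")


tree = Module(Function(Return()), Function())
recorder = NameRecorder()
preorder_walk(tree, recorder)
```

After the walk, \code{recorder.seen} is
\code{['Function', 'Return', 'Function']}: each parent is visited before
its children, and node types without a matching method are skipped.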
\begin{funcdesc}{compile}{source, filename, mode, flags=None,
dont_inherit=None}
Compile the string \var{source}, a Python module, statement or
expression, into a code object that can be executed by the exec
statement or \function{eval()}. This function is a replacement for the
built-in \function{compile()} function.
The \var{filename} will be used for run-time error messages.
The \var{mode} must be 'exec' to compile a module, 'single' to compile a
single (interactive) statement, or 'eval' to compile an expression.
The \var{flags} and \var{dont_inherit} arguments affect future-related
statements, but are not supported yet.
\end{funcdesc}
\begin{funcdesc}{compileFile}{source}
Compiles the file \var{source} and generates a .pyc file.
\end{funcdesc}
The \module{compiler} package contains the following modules:
\refmodule[compiler.ast]{ast}, \module{consts}, \module{future},
\module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols},
\module{transformer}, and \refmodule[compiler.visitor]{visitor}.
\section{Limitations}
There are some problems with the error checking of the compiler
package. The interpreter detects syntax errors in two distinct
phases. One set of errors is detected by the interpreter's parser,
the other set by the compiler. The compiler package relies on the
interpreter's parser, so it gets the first phase of error checking for
free. It implements the second phase itself, and that implementation is
incomplete. For example, the compiler package does not raise an error
if a name appears more than once in an argument list:
\code{def f(x, x): ...}
A future version of the compiler should fix these problems.
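The missing check is easy to state. The following is a minimal sketch,
assuming the argument names are available as a list of strings (as in
the \member{argnames} attribute of a \class{Function} node). The helper
name \code{find_duplicate_args} is hypothetical and is not part of the
package.

```python
def find_duplicate_args(argnames):
    """Return the names that occur more than once, each reported once."""
    seen = set()
    duplicates = []
    for name in argnames:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


# "def f(x, x): ..." has the argument names ["x", "x"].
duplicates = find_duplicate_args(["x", "x"])
```

A second-phase checker could raise \exception{SyntaxError} whenever the
returned list is non-empty.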
\section{Python Abstract Syntax}
The \module{compiler.ast} module defines an abstract syntax for
Python. In the abstract syntax tree, each node represents a syntactic
construct. The root of the tree is a \class{Module} object.
The abstract syntax offers a higher level interface to parsed Python
source code. The \refmodule{parser}
module and the compiler written in C for the Python interpreter use a
concrete syntax tree. The concrete syntax is tied closely to the
grammar description used for the Python parser. Instead of a single
node for a construct, there are often several levels of nested nodes
that are introduced by Python's precedence rules.
The abstract syntax tree is created by the
\module{compiler.transformer} module. The transformer relies on the
builtin Python parser to generate a concrete syntax tree. It
generates an abstract syntax tree from the concrete tree.
The \module{transformer} module was created by Greg
Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an
experimental Python-to-C compiler. The current version contains a
number of modifications and improvements, but the basic form of the
abstract syntax and of the transformer are due to Stein and Tutt.
\subsection{AST Nodes}
\declaremodule{}{compiler.ast}
The \module{compiler.ast} module is generated from a text file that
describes each node type and its elements. Each node type is
represented as a class that inherits from the abstract base class
\class{compiler.ast.Node} and defines a set of named attributes for
child nodes.
\begin{classdesc}{Node}{}
The \class{Node} instances are created automatically by the parser
generator. The recommended interface for specific \class{Node}
instances is to use the public attributes to access child nodes. A
public attribute may be bound to a single node or to a sequence of
nodes, depending on the \class{Node} type. For example, the
\member{bases} attribute of the \class{Class} node is bound to a
list of base class nodes, and the \member{doc} attribute is bound to
a single node.
Each \class{Node} instance has a \member{lineno} attribute which may
be \code{None}. XXX Not sure what the rules are for which nodes
will have a useful lineno.
\end{classdesc}
All \class{Node} objects offer the following methods:
\begin{methoddesc}{getChildren}{}
Returns a flattened list of the child nodes and objects in the
order they occur. Specifically, the order of the nodes is the
order in which they appear in the Python grammar. Not all of the
children are \class{Node} instances. The names of functions and
classes, for example, are plain strings.
\end{methoddesc}
\begin{methoddesc}{getChildNodes}{}
Returns a flattened list of the child nodes in the order they
occur. This method is like \method{getChildren()}, except that it
only returns those children that are \class{Node} instances.
\end{methoddesc}
Two examples illustrate the general structure of \class{Node}
classes. The \keyword{while} statement is defined by the following
grammar production:
\begin{verbatim}
while_stmt: "while" expression ":" suite
["else" ":" suite]
\end{verbatim}
The \class{While} node has three attributes: \member{test},
\member{body}, and \member{else_}. (If the natural name for an
attribute is also a Python reserved word, it can't be used as an
attribute name. An underscore is appended to the word to make it a
legal identifier, hence \member{else_} instead of \keyword{else}.)
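The naming rule can be sketched with the standard \module{keyword}
module. The function name \code{attribute_name} is an assumption made
for this illustration; the package does not necessarily derive its
attribute names this way.

```python
import keyword


def attribute_name(natural_name):
    """Append an underscore when the natural name is a reserved word."""
    if keyword.iskeyword(natural_name):
        return natural_name + "_"
    return natural_name


else_attr = attribute_name("else")   # reserved word, gets an underscore
body_attr = attribute_name("body")   # ordinary identifier, unchanged
```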
The \keyword{if} statement is more complicated because it can include
several tests.
\begin{verbatim}
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
\end{verbatim}
The \class{If} node only defines two attributes: \member{tests} and
\member{else_}. The \member{tests} attribute is a sequence of
(test expression, consequent body) pairs. There is one pair for each
\keyword{if}/\keyword{elif} clause. The first element of the pair is
the test expression. The second element is a \class{Stmt} node that
contains the code to execute if the test is true.
The \method{getChildren()} method of \class{If} returns a flat list of
child nodes. If there are three \keyword{if}/\keyword{elif} clauses
and no \keyword{else} clause, then \method{getChildren()} will return
a list of six elements: the first test expression, the first
\class{Stmt}, the second test expression, etc.
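The flattening can be sketched as follows. This mirrors the description
above rather than the module's actual code: the function name
\code{flatten_if_children} is hypothetical, and plain strings stand in
for the test expressions and \class{Stmt} nodes.

```python
def flatten_if_children(tests, else_=None):
    """Flatten (test, body) pairs into one list, then else_ if present."""
    children = []
    for test, body in tests:
        children.append(test)
        children.append(body)
    if else_ is not None:
        children.append(else_)
    return children


# Three if/elif clauses and no else clause.
tests = [("test1", "body1"), ("test2", "body2"), ("test3", "body3")]
children = flatten_if_children(tests)
```

The result has six elements, alternating between a test and the body
that belongs to it.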
The following table lists each of the \class{Node} subclasses defined
in \module{compiler.ast} and each of the public attributes available
on their instances. The values of most of the attributes are
themselves \class{Node} instances or sequences of instances. When the
value is something other than an instance, the type is noted in the
comment. The attributes are listed in the order in which they are
returned by \method{getChildren()} and \method{getChildNodes()}.
\input{asttable}
\subsection{Assignment nodes}
There is a collection of nodes used to represent assignments. Each
assignment statement in the source code becomes a single
\class{Assign} node in the AST. The \member{nodes} attribute is a
list that contains a node for each assignment target. This is
necessary because assignment can be chained, e.g. \code{a = b = 2}.
Each \class{Node} in the list will be one of the following classes:
\class{AssAttr}, \class{AssList}, \class{AssName}, or
\class{AssTuple}.
Each target assignment node describes the kind of object being
assigned to: \class{AssName} for a simple name, e.g. \code{a = 1};
\class{AssAttr} for an attribute assignment, e.g. \code{a.x = 1}; and
\class{AssList} and \class{AssTuple} for list and tuple expansion
respectively, e.g. \code{a, b, c = a_tuple}.
The target assignment nodes also have a \member{flags} attribute that
indicates whether the node is being used for assignment or in a delete
statement. The \class{AssName} is also used to represent a delete
statement, e.g. \code{del x}.
When an expression contains several attribute references, an
assignment or delete statement will contain only one \class{AssAttr}
node -- for the final attribute reference. The other attribute
references will be represented as \class{Getattr} nodes in the
\member{expr} attribute of the \class{AssAttr} instance.
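The same shape can be observed with the standard \module{ast} module of
current Python 3 interpreters. This is a swapped-in illustration, not the
\module{compiler.ast} API described here: the node classes differ
(\class{Assign} with a \member{targets} list, and a \member{ctx} field
playing roughly the role of \member{flags}), so treat it as a sketch of the
idea under that assumption.

```python
import ast

# Chained assignment: a single Assign node with one target per name.
chained = ast.parse("a = b = 2").body[0]
target_names = [t.id for t in chained.targets]

# Attribute chain: only the final reference (".c") is a store target;
# the inner reference ("a.b") is an ordinary attribute load.
attr = ast.parse("a.b.c = 1").body[0].targets[0]
outer_ctx = type(attr.ctx).__name__
inner_ctx = type(attr.value.ctx).__name__

# Deletion reuses the same node class with a different context marker.
deleted = ast.parse("del x").body[0].targets[0]
del_ctx = type(deleted.ctx).__name__

print(target_names, outer_ctx, inner_ctx, del_ctx)
```

The context object on each target is the closest analogue of the
\member{flags} attribute: it distinguishes assignment from deletion while the
node class stays the same.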
\subsection{Examples}
This section shows several simple examples of ASTs for Python source
code. The examples demonstrate how to use the \function{parseFile()}
function, what the repr of an AST looks like, and how to access
attributes of an AST node.
The first module defines a single function. Assume it is stored in
\file{/tmp/doublelib.py}.
\begin{verbatim}
"""This is an example module.
This is the docstring.
"""
def double(x):
    "Return twice the argument"
    return x * 2
\end{verbatim}
In the interactive interpreter session below, I have reformatted the
long AST reprs for readability. The AST reprs use unqualified class
names. If you want to create an instance from a repr, you must import
the class names from the \module{compiler.ast} module.
\begin{verbatim}
>>> import compiler
>>> mod = compiler.parseFile("/tmp/doublelib.py")
>>> mod
Module('This is an example module.\n\nThis is the docstring.\n',
Stmt([Function(None, 'double', ['x'], [], 0,
'Return twice the argument',
Stmt([Return(Mul((Name('x'), Const(2))))]))]))
>>> from compiler.ast import *
>>> Module('This is an example module.\n\nThis is the docstring.\n',
... Stmt([Function(None, 'double', ['x'], [], 0,
... 'Return twice the argument',
... Stmt([Return(Mul((Name('x'), Const(2))))]))]))
Module('This is an example module.\n\nThis is the docstring.\n',
Stmt([Function(None, 'double', ['x'], [], 0,
'Return twice the argument',
Stmt([Return(Mul((Name('x'), Const(2))))]))]))
>>> mod.doc
'This is an example module.\n\nThis is the docstring.\n'
>>> for node in mod.node.nodes:
...     print node
...
Function(None, 'double', ['x'], [], 0, 'Return twice the argument',
Stmt([Return(Mul((Name('x'), Const(2))))]))
>>> func = mod.node.nodes[0]
>>> func.code
Stmt([Return(Mul((Name('x'), Const(2))))])
\end{verbatim}
\section{Using Visitors to Walk ASTs}
\declaremodule{}{compiler.visitor}
The visitor pattern is ... The \refmodule{compiler} package uses a
variant on the visitor pattern that takes advantage of Python's
introspection features to eliminate the need for much of the visitor's
infrastructure.
The classes being visited do not need to be programmed to accept
visitors. The visitor need only define visit methods for classes it
is specifically interested in; a default visit method can handle the
rest.
XXX The magic \method{visit()} method for visitors.
\begin{funcdesc}{walk}{tree, visitor\optional{, verbose}}
\end{funcdesc}
\begin{classdesc}{ASTVisitor}{}
The \class{ASTVisitor} is responsible for walking over the tree in the
correct order. A walk begins with a call to \method{preorder()}. For
each node, it checks the \var{visitor} argument to \method{preorder()}
for a method named `visitNodeType,' where NodeType is the name of the
node's class, e.g. for a \class{While} node a \method{visitWhile()}
would be called. If the method exists, it is called with the node as
its first argument.
The visitor method for a particular node type can control how child
nodes are visited during the walk. The \class{ASTVisitor} modifies
the visitor argument by adding a visit method to the visitor; this
method can be used to visit a particular child node. If no visitor is
found for a particular node type, the \method{default()} method is
called.
\end{classdesc}
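The name-based dispatch described above can be sketched in a few lines. The
classes below (\class{Node}, \class{While}, \class{Name},
\class{SketchWalker}, \class{LoopCounter}) are hypothetical stand-ins written
for this illustration; they are not the \module{compiler.visitor}
implementation, and only mirror the behaviour the text describes.

```python
class Node:
    """Toy tree node (hypothetical; stands in for compiler.ast.Node)."""
    def __init__(self, *children):
        self.children = children

class While(Node):
    pass

class Name(Node):
    pass

class SketchWalker:
    """Hypothetical walker that dispatches on the node's class name."""
    def preorder(self, tree, visitor):
        self.visitor = visitor
        # Give the visitor a visit() method so it can descend into children.
        visitor.visit = self.dispatch
        self.dispatch(tree)

    def dispatch(self, node, *args):
        # Look up visit<ClassName> on the visitor; fall back to default().
        name = "visit" + type(node).__name__
        method = getattr(self.visitor, name, self.default)
        return method(node, *args)

    def default(self, node, *args):
        for child in node.children:
            self.dispatch(child, *args)

class LoopCounter:
    """Visitor interested only in While nodes."""
    def __init__(self):
        self.loops = 0

    def visitWhile(self, node):
        self.loops += 1
        for child in node.children:
            self.visit(child)

tree = Node(While(Name(), While(Name())), Name())
counter = LoopCounter()
SketchWalker().preorder(tree, counter)
print(counter.loops)
```

The visitor defines a method only for the node class it cares about; every
other node falls through to the walker's default handling, which simply
continues into the children.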
\class{ASTVisitor} objects have the following methods:
XXX describe extra arguments
\begin{methoddesc}{default}{node\optional{, \moreargs}}
\end{methoddesc}
\begin{methoddesc}{dispatch}{node\optional{, \moreargs}}
\end{methoddesc}
\begin{methoddesc}{preorder}{tree, visitor}
\end{methoddesc}
\section{Bytecode Generation}
The code generator is a visitor that emits bytecodes. Each visit method
can call the \method{emit()} method to emit a new bytecode. The basic
code generator is specialized for modules, classes, and functions. An
assembler converts the emitted instructions to the low-level bytecode
format. It handles things like generation of constant lists of code
objects and calculation of jump offsets.

View File

@ -1,13 +0,0 @@
\chapter{Custom Python Interpreters}
\label{custominterp}
The modules described in this chapter allow writing interfaces similar
to Python's interactive interpreter. If you want a Python interpreter
that supports some special feature in addition to the Python language,
you should look at the \module{code} module. (The \module{codeop}
module is lower-level, used to support compiling a possibly-incomplete
chunk of Python code.)
The full list of modules described in this chapter is:
\localmoduletable

View File

@ -1,10 +0,0 @@
\chapter{Data Types}
\label{datatypes}
The modules described in this chapter provide a variety of specialized
data types such as dates and times, fixed-type arrays, heap queues,
synchronized queues, and sets.
The following modules are documented in this chapter:
\localmoduletable

View File

@ -1,13 +0,0 @@
\chapter{Development Tools}
\label{development}
The modules described in this chapter help you write software. For
example, the \module{pydoc} module takes a module and generates
documentation based on the module's contents. The \module{doctest}
and \module{unittest} modules contain frameworks for writing unit tests
that automatically exercise code and verify that the expected output
is produced.
The list of modules described in this chapter is:
\localmoduletable

View File

@ -1,38 +0,0 @@
\section{\module{distutils} ---
Building and installing Python modules}
\declaremodule{standard}{distutils}
\modulesynopsis{Support for building and installing Python modules
into an existing Python installation.}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
The \module{distutils} package provides support for building and
installing additional modules into a Python installation. The new
modules may be either 100\%{}-pure Python, or may be extension modules
written in C, or may be collections of Python packages which include
modules coded in both Python and C.
This package is discussed in two separate documents which are included
in the Python documentation package. To learn about distributing new
modules using the \module{distutils} facilities, read
\citetitle[../dist/dist.html]{Distributing Python Modules}; this
includes documentation needed to extend distutils. To learn
about installing Python modules, whether or not the author made use of
the \module{distutils} package, read
\citetitle[../inst/inst.html]{Installing Python Modules}.
\begin{seealso}
\seetitle[../dist/dist.html]{Distributing Python Modules}{The manual
for developers and packagers of Python modules. This
describes how to prepare \module{distutils}-based packages
so that they may be easily installed into an existing
Python installation.}
\seetitle[../inst/inst.html]{Installing Python Modules}{An
``administrators'' manual which includes information on
installing modules into an existing Python installation.
You do not need to be a Python programmer to read this
manual.}
\end{seealso}

View File

@ -1,115 +0,0 @@
#!/usr/bin/env python
"""Send the contents of a directory as a MIME message."""
import os
import sys
import smtplib
# For guessing MIME type based on file name extension
import mimetypes
from optparse import OptionParser
from email import encoders
from email.message import Message
from email.mime.audio import MIMEAudio
from email.mime.base import MIMEBase
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
COMMASPACE = ', '
def main():
    parser = OptionParser(usage="""\
Send the contents of a directory as a MIME message.
Usage: %prog [options]
Unless the -o option is given, the email is sent by forwarding to your local
SMTP server, which then does the normal delivery process. Your local machine
must be running an SMTP server.
""")
    parser.add_option('-d', '--directory',
                      type='string', action='store',
                      help="""Mail the contents of the specified directory,
                      otherwise use the current directory. Only the regular
                      files in the directory are sent, and we don't recurse to
                      subdirectories.""")
    parser.add_option('-o', '--output',
                      type='string', action='store', metavar='FILE',
                      help="""Print the composed message to FILE instead of
                      sending the message to the SMTP server.""")
    parser.add_option('-s', '--sender',
                      type='string', action='store', metavar='SENDER',
                      help='The value of the From: header (required)')
    parser.add_option('-r', '--recipient',
                      type='string', action='append', metavar='RECIPIENT',
                      default=[], dest='recipients',
                      help='A To: header value (at least one required)')
    opts, args = parser.parse_args()
    if not opts.sender or not opts.recipients:
        parser.print_help()
        sys.exit(1)
    directory = opts.directory
    if not directory:
        directory = '.'
    # Create the enclosing (outer) message
    outer = MIMEMultipart()
    outer['Subject'] = 'Contents of directory %s' % os.path.abspath(directory)
    outer['To'] = COMMASPACE.join(opts.recipients)
    outer['From'] = opts.sender
    outer.preamble = 'You will not see this in a MIME-aware mail reader.\n'
    for filename in os.listdir(directory):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue
        # Guess the content type based on the file's extension. Encoding
        # will be ignored, although we should check for simple things like
        # gzip'd or compressed files.
        ctype, encoding = mimetypes.guess_type(path)
        if ctype is None or encoding is not None:
            # No guess could be made, or the file is encoded (compressed), so
            # use a generic bag-of-bits type.
            ctype = 'application/octet-stream'
        maintype, subtype = ctype.split('/', 1)
        if maintype == 'text':
            fp = open(path)
            # Note: we should handle calculating the charset
            msg = MIMEText(fp.read(), _subtype=subtype)
            fp.close()
        elif maintype == 'image':
            fp = open(path, 'rb')
            msg = MIMEImage(fp.read(), _subtype=subtype)
            fp.close()
        elif maintype == 'audio':
            fp = open(path, 'rb')
            msg = MIMEAudio(fp.read(), _subtype=subtype)
            fp.close()
        else:
            fp = open(path, 'rb')
            msg = MIMEBase(maintype, subtype)
            msg.set_payload(fp.read())
            fp.close()
            # Encode the payload using Base64
            encoders.encode_base64(msg)
        # Set the filename parameter
        msg.add_header('Content-Disposition', 'attachment', filename=filename)
        outer.attach(msg)
    # Now send or store the message
    composed = outer.as_string()
    if opts.output:
        fp = open(opts.output, 'w')
        fp.write(composed)
        fp.close()
    else:
        s = smtplib.SMTP()
        s.connect()
        s.sendmail(opts.sender, opts.recipients, composed)
        s.close()
if __name__ == '__main__':
    main()

View File

@ -1,32 +0,0 @@
# Import smtplib for the actual sending function
import smtplib
# Here are the email package modules we'll need
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
COMMASPACE = ', '
# Create the container (outer) email message.
msg = MIMEMultipart()
msg['Subject'] = 'Our family reunion'
# me == the sender's email address
# family = the list of all recipients' email addresses
msg['From'] = me
msg['To'] = COMMASPACE.join(family)
msg.preamble = 'Our family reunion'
# Assume we know that the image files are all in PNG format
for file in pngfiles:
    # Open the files in binary mode. Let the MIMEImage class automatically
    # guess the specific image type.
    fp = open(file, 'rb')
    img = MIMEImage(fp.read())
    fp.close()
    msg.attach(img)
# Send the email via our own SMTP server.
s = smtplib.SMTP()
s.connect()
s.sendmail(me, family, msg.as_string())
s.close()

View File

@ -1,25 +0,0 @@
# Import smtplib for the actual sending function
import smtplib
# Import the email modules we'll need
from email.mime.text import MIMEText
# Open a plain text file for reading. For this example, assume that
# the text file contains only ASCII characters.
fp = open(textfile, 'rb')
# Create a text/plain message
msg = MIMEText(fp.read())
fp.close()
# me == the sender's email address
# you == the recipient's email address
msg['Subject'] = 'The contents of %s' % textfile
msg['From'] = me
msg['To'] = you
# Send the message via our own SMTP server, but don't include the
# envelope header.
s = smtplib.SMTP()
s.connect()
s.sendmail(me, [you], msg.as_string())
s.close()

View File

@ -1,68 +0,0 @@
#!/usr/bin/env python
"""Unpack a MIME message into a directory of files."""
import os
import sys
import email
import errno
import mimetypes
from optparse import OptionParser
def main():
    parser = OptionParser(usage="""\
Unpack a MIME message into a directory of files.
Usage: %prog [options] msgfile
""")
    parser.add_option('-d', '--directory',
                      type='string', action='store',
                      help="""Unpack the MIME message into the named
                      directory, which will be created if it doesn't already
                      exist.""")
    opts, args = parser.parse_args()
    if not opts.directory:
        parser.print_help()
        sys.exit(1)
    try:
        msgfile = args[0]
    except IndexError:
        parser.print_help()
        sys.exit(1)
    try:
        os.mkdir(opts.directory)
    except OSError, e:
        # Ignore directory exists error
        if e.errno != errno.EEXIST:
            raise
    fp = open(msgfile)
    msg = email.message_from_file(fp)
    fp.close()
    counter = 1
    for part in msg.walk():
        # multipart/* are just containers
        if part.get_content_maintype() == 'multipart':
            continue
        # Applications should really sanitize the given filename so that an
        # email message can't be used to overwrite important files
        filename = part.get_filename()
        if not filename:
            # get_type() was removed in email version 4; use
            # get_content_type() instead
            ext = mimetypes.guess_extension(part.get_content_type())
            if not ext:
                # Use a generic bag-of-bits extension
                ext = '.bin'
            filename = 'part-%03d%s' % (counter, ext)
        counter += 1
        fp = open(os.path.join(opts.directory, filename), 'wb')
        fp.write(part.get_payload(decode=True))
        fp.close()
if __name__ == '__main__':
    main()

View File

@ -1,402 +0,0 @@
% Copyright (C) 2001-2007 Python Software Foundation
% Author: barry@python.org (Barry Warsaw)
\section{\module{email} ---
An email and MIME handling package}
\declaremodule{standard}{email}
\modulesynopsis{Package supporting the parsing, manipulation, and
generation of email messages, including MIME documents.}
\moduleauthor{Barry A. Warsaw}{barry@python.org}
\sectionauthor{Barry A. Warsaw}{barry@python.org}
\versionadded{2.2}
The \module{email} package is a library for managing email messages,
including MIME and other \rfc{2822}-based message documents. It
subsumes most of the functionality in several older standard modules
such as \refmodule{rfc822}, \refmodule{mimetools},
\refmodule{multifile}, and other non-standard packages such as
\module{mimecntl}. It is specifically \emph{not} designed to do any
sending of email messages to SMTP (\rfc{2821}), NNTP, or other servers; those
are functions of modules such as \refmodule{smtplib} and \refmodule{nntplib}.
The \module{email} package attempts to be as RFC-compliant as possible,
supporting in addition to \rfc{2822}, such MIME-related RFCs as
\rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The primary distinguishing feature of the \module{email} package is
that it splits the parsing and generating of email messages from the
internal \emph{object model} representation of email. Applications
using the \module{email} package deal primarily with objects; you can
add sub-objects to messages, remove sub-objects from messages,
completely re-arrange the contents, etc. There is a separate parser
and a separate generator which handle the transformation from flat
text to the object model, and then back to flat text again. There
are also handy subclasses for some common MIME object types, and a few
miscellaneous utilities that help with such common tasks as extracting
and parsing message field values, creating RFC-compliant dates, etc.
The following sections describe the functionality of the
\module{email} package. The ordering follows a progression that
should be common in applications: an email message is read as flat
text from a file or other source, the text is parsed to produce the
object structure of the email message, this structure is manipulated,
and finally, the object tree is rendered back into flat text.
It is perfectly feasible to create the object structure out of whole
cloth --- i.e. completely from scratch. From there, a similar
progression can be taken as above.
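The read, parse, manipulate, and flatten progression described above can be
sketched as follows. This is a minimal illustration assuming a current
Python 3 interpreter; the addresses and header values are made up for the
example.

```python
import email

# Flat text, as it might be read from a file or other source.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)

# Parse the flat text into the object model.
msg = email.message_from_string(raw)

# Manipulate the object: replace one header.
del msg["Subject"]
msg["Subject"] = "hello again"

# Render the object tree back into flat text.
flat = msg.as_string()
print(flat)
```

Deleting the header before assigning it matters: assigning to an existing
header name appends a second header rather than replacing the first.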
Also included are detailed specifications of all the classes and
modules that the \module{email} package provides, the exception
classes you might encounter while using the \module{email} package,
some auxiliary utilities, and a few examples. For users of the older
\module{mimelib} package, or previous versions of the \module{email}
package, a section on differences and porting is provided.
\begin{seealso}
\seemodule{smtplib}{SMTP protocol client}
\seemodule{nntplib}{NNTP protocol client}
\end{seealso}
\subsection{Representing an email message}
\input{emailmessage}
\subsection{Parsing email messages}
\input{emailparser}
\subsection{Generating MIME documents}
\input{emailgenerator}
\subsection{Creating email and MIME objects from scratch}
\input{emailmimebase}
\subsection{Internationalized headers}
\input{emailheaders}
\subsection{Representing character sets}
\input{emailcharsets}
\subsection{Encoders}
\input{emailencoders}
\subsection{Exception and Defect classes}
\input{emailexc}
\subsection{Miscellaneous utilities}
\input{emailutil}
\subsection{Iterators}
\input{emailiter}
\subsection{Package History\label{email-pkg-history}}
This table describes the release history of the email package, corresponding
to the version of Python that the package was released with. For purposes of
this document, when you see a note about change or added versions, these refer
to the Python version the change was made in, \emph{not} the email package
version. This table also describes the Python compatibility of each version
of the package.
\begin{tableiii}{l|l|l}{constant}{email version}{distributed with}{compatible with}
\lineiii{1.x}{Python 2.2.0 to Python 2.2.1}{\emph{no longer supported}}
\lineiii{2.5}{Python 2.2.2+ and Python 2.3}{Python 2.1 to 2.5}
\lineiii{3.0}{Python 2.4}{Python 2.3 to 2.5}
\lineiii{4.0}{Python 2.5}{Python 2.3 to 2.5}
\end{tableiii}
Here are the major differences between \module{email} version 4 and version 3:
\begin{itemize}
\item All modules have been renamed according to \pep{8} standards. For
example, the version 3 module \module{email.Message} was renamed to
\module{email.message} in version 4.
\item A new subpackage \module{email.mime} was added and all the version 3
\module{email.MIME*} modules were renamed and situated into the
\module{email.mime} subpackage. For example, the version 3 module
\module{email.MIMEText} was renamed to \module{email.mime.text}.
\emph{Note that the version 3 names will continue to work until Python
2.6}.
\item The \module{email.mime.application} module was added, which contains the
\class{MIMEApplication} class.
\item Methods that were deprecated in version 3 have been removed. These
include \method{Generator.__call__()}, \method{Message.get_type()},
\method{Message.get_main_type()}, \method{Message.get_subtype()}.
\item Fixes have been added for \rfc{2231} support which can change some of
the return types for \function{Message.get_param()} and friends. Under
some circumstances, values which used to return a 3-tuple now return
simple strings (specifically, if all extended parameter segments were
unencoded, there is no language and charset designation expected, so the
return type is now a simple string). Also, \%-decoding used to be done
for both encoded and unencoded segments; this decoding is now done only
for encoded segments.
\end{itemize}
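The \rfc{2231} item in the list above can be illustrated with a small
sketch, assuming a current Python 3 interpreter. The header and the
parameter names (\code{title}, \code{note}) are invented for the example: an
encoded parameter comes back as a 3-tuple, an unencoded one as a simple
string.

```python
import email
from email.utils import collapse_rfc2231_value

raw = (
    "Content-Type: text/plain; "
    "title*=us-ascii'en'This%20is%20encoded; "
    'note="plain text"\n'
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)

# Encoded (extended) parameter: (charset, language, value) tuple.
encoded = msg.get_param("title")

# Unencoded parameter: a simple string.
plain = msg.get_param("note")

# Helper that turns either form into a single string.
title_text = collapse_rfc2231_value(encoded)
print(encoded, plain, title_text)
```

Code that has to handle both return types can pass the value through
\function{collapse_rfc2231_value()} instead of testing the type itself.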
Here are the major differences between \module{email} version 3 and version 2:
\begin{itemize}
\item The \class{FeedParser} class was introduced, and the \class{Parser}
class was implemented in terms of the \class{FeedParser}. All parsing
therefore is non-strict, and parsing will make a best effort never to
raise an exception. Problems found while parsing messages are stored in
the message's \var{defects} attribute.
\item All aspects of the API which raised \exception{DeprecationWarning}s in
version 2 have been removed. These include the \var{_encoder} argument
to the \class{MIMEText} constructor, the \method{Message.add_payload()}
method, the \function{Utils.dump_address_pair()} function, and the
functions \function{Utils.decode()} and \function{Utils.encode()}.
\item New \exception{DeprecationWarning}s have been added to:
\method{Generator.__call__()}, \method{Message.get_type()},
\method{Message.get_main_type()}, \method{Message.get_subtype()}, and
the \var{strict} argument to the \class{Parser} class. These are
expected to be removed in future versions.
\item Support for Pythons earlier than 2.3 has been removed.
\end{itemize}
Here are the differences between \module{email} version 2 and version 1:
\begin{itemize}
\item The \module{email.Header} and \module{email.Charset} modules
have been added.
\item The pickle format for \class{Message} instances has changed.
Since this was never (and still isn't) formally defined, this
isn't considered a backward incompatibility. However if your
application pickles and unpickles \class{Message} instances, be
aware that in \module{email} version 2, \class{Message}
instances now have private variables \var{_charset} and
\var{_default_type}.
\item Several methods in the \class{Message} class have been
deprecated, or their signatures changed. Also, many new methods
have been added. See the documentation for the \class{Message}
class for details. The changes should be completely backward
compatible.
\item The object structure has changed in the face of
\mimetype{message/rfc822} content types. In \module{email}
version 1, such a type would be represented by a scalar payload,
i.e. the container message's \method{is_multipart()} returned
false, \method{get_payload()} was not a list object, but a single
\class{Message} instance.
This structure was inconsistent with the rest of the package, so
the object representation for \mimetype{message/rfc822} content
types was changed. In \module{email} version 2, the container
\emph{does} return \code{True} from \method{is_multipart()}, and
\method{get_payload()} returns a list containing a single
\class{Message} item.
Note that this is one place that backward compatibility could
not be completely maintained. However, if you're already
testing the return type of \method{get_payload()}, you should be
fine. You just need to make sure your code doesn't do a
\method{set_payload()} with a \class{Message} instance on a
container with a content type of \mimetype{message/rfc822}.
\item The \class{Parser} constructor's \var{strict} argument was
added, and its \method{parse()} and \method{parsestr()} methods
grew a \var{headersonly} argument. The \var{strict} flag was
also added to functions \function{email.message_from_file()}
and \function{email.message_from_string()}.
\item \method{Generator.__call__()} is deprecated; use
\method{Generator.flatten()} instead. The \class{Generator}
class has also grown the \method{clone()} method.
\item The \class{DecodedGenerator} class in the
\module{email.Generator} module was added.
\item The intermediate base classes \class{MIMENonMultipart} and
\class{MIMEMultipart} have been added, and interposed in the
class hierarchy for most of the other MIME-related derived
classes.
\item The \var{_encoder} argument to the \class{MIMEText} constructor
has been deprecated. Encoding now happens implicitly based
on the \var{_charset} argument.
\item The following functions in the \module{email.Utils} module have
been deprecated: \function{dump_address_pair()},
\function{decode()}, and \function{encode()}. The following
functions have been added to the module:
\function{make_msgid()}, \function{decode_rfc2231()},
\function{encode_rfc2231()}, and \function{decode_params()}.
\item The non-public function \function{email.Iterators._structure()}
was added.
\end{itemize}
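The \mimetype{message/rfc822} change listed above can be checked with a
short sketch, assuming a current Python 3 interpreter where the lowercase
module names are available. The subject and body are placeholder values.

```python
from email.message import Message
from email.mime.message import MIMEMessage

# The message to be wrapped.
inner = Message()
inner["Subject"] = "wrapped"
inner.set_payload("inner body\n")

# The message/rfc822 container around it.
outer = MIMEMessage(inner)

is_container = outer.is_multipart()
payload = outer.get_payload()
print(outer.get_content_type(), is_container, len(payload))
```

Because the payload is a one-element list, code written for multipart
containers (iterating over \method{get_payload()}) also works for wrapped
messages.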
\subsection{Differences from \module{mimelib}}
The \module{email} package was originally prototyped as a separate
library called
\ulink{\texttt{mimelib}}{http://mimelib.sf.net/}.
Changes have been made so that
method names are more consistent, and some methods or modules have
either been added or removed. The semantics of some of the methods
have also changed. For the most part, any functionality available in
\module{mimelib} is still available in the \refmodule{email} package,
albeit often in a different way. Backward compatibility between
the \module{mimelib} package and the \module{email} package was not a
priority.
Here is a brief description of the differences between the
\module{mimelib} and the \refmodule{email} packages, along with hints on
how to port your applications.
Of course, the most visible difference between the two packages is
that the package name has been changed to \refmodule{email}. In
addition, the top-level package has the following differences:
\begin{itemize}
\item \function{messageFromString()} has been renamed to
\function{message_from_string()}.
\item \function{messageFromFile()} has been renamed to
\function{message_from_file()}.
\end{itemize}
The \class{Message} class has the following differences:
\begin{itemize}
\item The method \method{asString()} was renamed to \method{as_string()}.
\item The method \method{ismultipart()} was renamed to
\method{is_multipart()}.
\item The \method{get_payload()} method has grown a \var{decode}
optional argument.
\item The method \method{getall()} was renamed to \method{get_all()}.
\item The method \method{addheader()} was renamed to \method{add_header()}.
\item The method \method{gettype()} was renamed to \method{get_type()}.
\item The method \method{getmaintype()} was renamed to
\method{get_main_type()}.
\item The method \method{getsubtype()} was renamed to
\method{get_subtype()}.
\item The method \method{getparams()} was renamed to
\method{get_params()}.
Also, whereas \method{getparams()} returned a list of strings,
\method{get_params()} returns a list of 2-tuples, effectively
the key/value pairs of the parameters, split on the \character{=}
sign.
\item The method \method{getparam()} was renamed to \method{get_param()}.
\item The method \method{getcharsets()} was renamed to
\method{get_charsets()}.
\item The method \method{getfilename()} was renamed to
\method{get_filename()}.
\item The method \method{getboundary()} was renamed to
\method{get_boundary()}.
\item The method \method{setboundary()} was renamed to
\method{set_boundary()}.
\item The method \method{getdecodedpayload()} was removed. To get
similar functionality, pass the value 1 to the \var{decode} flag
of the \method{get_payload()} method.
\item The method \method{getpayloadastext()} was removed. Similar
functionality
is supported by the \class{DecodedGenerator} class in the
\refmodule{email.generator} module.
\item The method \method{getbodyastext()} was removed. You can get
similar functionality by creating an iterator with
\function{typed_subpart_iterator()} in the
\refmodule{email.iterators} module.
\end{itemize}
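As a sketch of the \var{decode} flag mentioned in the list above, the
following assumes a current Python 3 interpreter, where the decoded payload
is returned as bytes. The sample data is arbitrary.

```python
from email.mime.application import MIMEApplication

data = b"\x00\x01 raw bytes"

# MIMEApplication base64-encodes its payload by default.
part = MIMEApplication(data)

# Without decode, the payload is the transfer-encoded text.
encoded = part.get_payload()

# With decode, the Content-Transfer-Encoding is undone.
decoded = part.get_payload(decode=True)
print(part["Content-Transfer-Encoding"], decoded == data)
```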
The \class{Parser} class has no differences in its public interface.
It does have some additional smarts to recognize
\mimetype{message/delivery-status} type messages, which it represents as
a \class{Message} instance containing separate \class{Message}
subparts for each header block in the delivery status
notification\footnote{Delivery Status Notifications (DSN) are defined
in \rfc{1894}.}.
The \class{Generator} class has no differences in its public
interface. There is a new class in the \refmodule{email.generator}
module though, called \class{DecodedGenerator} which provides most of
the functionality previously available in the
\method{Message.getpayloadastext()} method.
The following modules and classes have been changed:
\begin{itemize}
\item The \class{MIMEBase} class constructor arguments \var{_major}
and \var{_minor} have changed to \var{_maintype} and
\var{_subtype} respectively.
\item The \code{Image} class/module has been renamed to
\code{MIMEImage}. The \var{_minor} argument has been renamed to
\var{_subtype}.
\item The \code{Text} class/module has been renamed to
\code{MIMEText}. The \var{_minor} argument has been renamed to
\var{_subtype}.
\item The \code{MessageRFC822} class/module has been renamed to
\code{MIMEMessage}. Note that an earlier version of
\module{mimelib} called this class/module \code{RFC822}, but
that clashed with the Python standard library module
\refmodule{rfc822} on some case-insensitive file systems.
Also, the \class{MIMEMessage} class now represents any kind of
MIME message with main type \mimetype{message}. It takes an
optional argument \var{_subtype} which is used to set the MIME
subtype. \var{_subtype} defaults to \mimetype{rfc822}.
\end{itemize}
\module{mimelib} provided some utility functions in its
\module{address} and \module{date} modules. All of these functions
have been moved to the \refmodule{email.utils} module.
The \code{MsgReader} class/module has been removed. Its functionality
is most closely supported in the \function{body_line_iterator()}
function in the \refmodule{email.iterators} module.
\subsection{Examples}
Here are a few examples of how to use the \module{email} package to
read, write, and send simple email messages, as well as more complex
MIME messages.
First, let's see how to create and send a simple text message:
\verbatiminput{email-simple.py}
Here's an example of how to send a MIME message containing a bunch of
family pictures that may be residing in a directory:
\verbatiminput{email-mime.py}
Here's an example of how to send the entire contents of a directory as
an email message:
\footnote{Thanks to Matthew Dixon Cowles for the original inspiration
and examples.}
\verbatiminput{email-dir.py}
And finally, here's an example of how to unpack a MIME message like
the one above, into a directory of files:
\verbatiminput{email-unpack.py}

View File

@ -1,244 +0,0 @@
\declaremodule{standard}{email.charset}
\modulesynopsis{Character Sets}
This module provides a class \class{Charset} for representing
character sets and character set conversions in email messages, as
well as a character set registry and several convenience methods for
manipulating this registry. Instances of \class{Charset} are used in
several other modules within the \module{email} package.
Import this class from the \module{email.charset} module.
\versionadded{2.2.2}
\begin{classdesc}{Charset}{\optional{input_charset}}
Map character sets to their email properties.
This class provides information about the requirements imposed on
email for a specific character set. It also provides convenience
routines for converting between character sets, given the availability
of the applicable codecs. Given a character set, it will do its best
to provide information on how to use that character set in an email
message in an RFC-compliant way.
Certain character sets must be encoded with quoted-printable or base64
when used in email headers or bodies. Certain character sets must be
converted outright, and are not allowed in email.
Optional \var{input_charset} is as described below; it is always
coerced to lower case. After being alias normalized it is also used
as a lookup into the registry of character sets to find out the header
encoding, body encoding, and output conversion codec to be used for
the character set. For example, if
\var{input_charset} is \code{iso-8859-1}, then headers and bodies will
be encoded using quoted-printable and no output conversion codec is
necessary. If \var{input_charset} is \code{euc-jp}, then headers will
be encoded with base64, bodies will not be encoded, but output text
will be converted from the \code{euc-jp} character set to the
\code{iso-2022-jp} character set.
\end{classdesc}
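The registry lookup described above can be checked interactively. The
following is a minimal sketch, assuming a Python 3 interpreter in which
the encoding constants live at module level in \module{email.charset};
the variable names are illustrative only.

```python
from email import charset

# Look up two registered character sets by name.
latin1 = charset.Charset('iso-8859-1')
eucjp = charset.Charset('euc-jp')

# iso-8859-1: quoted-printable for both headers and bodies.
latin1_props = (latin1.header_encoding, latin1.body_encoding)

# euc-jp: base64 headers, unencoded bodies, output converted to iso-2022-jp.
eucjp_props = (eucjp.header_encoding, eucjp.body_encoding,
               eucjp.output_charset)
```

Comparing against \code{charset.QP} and \code{charset.BASE64} rather
than against literal integers keeps the check independent of the
constants' actual values.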
\class{Charset} instances have the following data attributes:
\begin{datadesc}{input_charset}
The initial character set specified. Common aliases are converted to
their \emph{official} email names (e.g. \code{latin_1} is converted to
\code{iso-8859-1}). Defaults to 7-bit \code{us-ascii}.
\end{datadesc}
\begin{datadesc}{header_encoding}
If the character set must be encoded before it can be used in an
email header, this attribute will be set to \code{Charset.QP} (for
quoted-printable), \code{Charset.BASE64} (for base64 encoding), or
\code{Charset.SHORTEST} for the shortest of QP or BASE64 encoding.
Otherwise, it will be \code{None}.
\end{datadesc}
\begin{datadesc}{body_encoding}
Same as \var{header_encoding}, but describes the encoding for the
mail message's body, which may differ from the header
encoding. \code{Charset.SHORTEST} is not allowed for
\var{body_encoding}.
\end{datadesc}
\begin{datadesc}{output_charset}
Some character sets must be converted before they can be used in
email headers or bodies. If the \var{input_charset} is one of
them, this attribute will contain the name of the character set
output will be converted to. Otherwise, it will be \code{None}.
\end{datadesc}
\begin{datadesc}{input_codec}
The name of the Python codec used to convert the \var{input_charset} to
Unicode. If no conversion codec is necessary, this attribute will be
\code{None}.
\end{datadesc}
\begin{datadesc}{output_codec}
The name of the Python codec used to convert Unicode to the
\var{output_charset}. If no conversion codec is necessary, this
attribute will have the same value as the \var{input_codec}.
\end{datadesc}
\class{Charset} instances also have the following methods:
\begin{methoddesc}[Charset]{get_body_encoding}{}
Return the content transfer encoding used for body encoding.
This is either the string \samp{quoted-printable} or \samp{base64}
depending on the encoding used, or it is a function, in which case you
should call the function with a single argument, the Message object
being encoded. The function should then set the
\mailheader{Content-Transfer-Encoding} header itself to whatever is
appropriate.
Returns the string \samp{quoted-printable} if
\var{body_encoding} is \code{QP}, returns the string
\samp{base64} if \var{body_encoding} is \code{BASE64}, and returns the
string \samp{7bit} otherwise.
\end{methoddesc}
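As a small illustration of the two string results, here is a sketch
assuming a Python 3 interpreter and the default registry entries for
\code{iso-8859-1} and \code{utf-8}. The unencoded case is deliberately
left out, because its return value may be a function rather than a
string.

```python
from email.charset import Charset

# iso-8859-1 bodies are registered as quoted-printable.
latin1_cte = Charset('iso-8859-1').get_body_encoding()

# utf-8 bodies are registered as base64.
utf8_cte = Charset('utf-8').get_body_encoding()
```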
\begin{methoddesc}{convert}{s}
Convert the string \var{s} from the \var{input_codec} to the
\var{output_codec}.
\end{methoddesc}
\begin{methoddesc}{to_splittable}{s}
Convert a possibly multibyte string to a safely splittable format.
\var{s} is the string to split.
Uses the \var{input_codec} to try to convert the string to Unicode,
so it can be safely split on character boundaries (even for multibyte
characters).
Returns the string as-is if it isn't known how to convert \var{s} to
Unicode with the \var{input_charset}.
Characters that could not be converted to Unicode will be replaced
with the Unicode replacement character \character{U+FFFD}.
\end{methoddesc}
\begin{methoddesc}{from_splittable}{ustr\optional{, to_output}}
Convert a splittable string back into an encoded string. \var{ustr}
is a Unicode string to ``unsplit''.
This method uses the proper codec to try to convert the string from
Unicode back into an encoded format. Return the string as-is if it is
not Unicode, or if it could not be converted from Unicode.
Characters that could not be converted from Unicode will be replaced
with an appropriate character (usually \character{?}).
If \var{to_output} is \code{True} (the default), uses
\var{output_codec} to convert to an
encoded format. If \var{to_output} is \code{False}, it uses
\var{input_codec}.
\end{methoddesc}
\begin{methoddesc}{get_output_charset}{}
Return the output character set.
This is the \var{output_charset} attribute if that is not \code{None},
otherwise it is \var{input_charset}.
\end{methoddesc}
\begin{methoddesc}{encoded_header_len}{s}
Return the length of the string \var{s} after header encoding, properly
calculating for quoted-printable or base64 encoding.
\end{methoddesc}
\begin{methoddesc}{header_encode}{s\optional{, convert}}
Header-encode the string \var{s}.
If \var{convert} is \code{True}, the string will be converted from the
input charset to the output charset automatically. This is not useful
for multibyte character sets, which have line length issues (multibyte
characters must be split on a character, not a byte boundary); use the
higher-level \class{Header} class to deal with these issues (see
\refmodule{email.header}). \var{convert} defaults to \code{False}.
The type of encoding (base64 or quoted-printable) will be based on
the \var{header_encoding} attribute.
\end{methoddesc}
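A short sketch of header encoding follows. It assumes a Python 3
interpreter, where the input is a Unicode string, and it calls
\method{header_encode()} with the string only; the sample word is the
one used elsewhere in this documentation.

```python
from email.charset import Charset

# iso-8859-1 headers use quoted-printable ("q") encoding.
latin1 = Charset('iso-8859-1')

# Wrap a single non-ASCII word in an RFC 2047 encoded-word.
encoded = latin1.header_encode('p\xf6stal')
```

For anything longer than a single short word, the \class{Header} class
remains the safer choice, since it also takes care of line length and
character boundaries.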
\begin{methoddesc}{body_encode}{s\optional{, convert}}
Body-encode the string \var{s}.
If \var{convert} is \code{True} (the default), the string will be
converted from the input charset to output charset automatically.
Unlike \method{header_encode()}, there are no issues with byte
boundaries and multibyte charsets in email bodies, so this is usually
pretty safe.
The type of encoding (base64 or quoted-printable) will be based on
the \var{body_encoding} attribute.
\end{methoddesc}
The \class{Charset} class also provides a number of methods to support
standard operations and built-in functions.
\begin{methoddesc}[Charset]{__str__}{}
Returns \var{input_charset} as a string coerced to lower case.
\method{__repr__()} is an alias for \method{__str__()}.
\end{methoddesc}
\begin{methoddesc}[Charset]{__eq__}{other}
This method allows you to compare two \class{Charset} instances for equality.
\end{methoddesc}
\begin{methoddesc}[Charset]{__ne__}{other}
This method allows you to compare two \class{Charset} instances for inequality.
\end{methoddesc}
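The following sketch shows how alias normalization interacts with
these methods. It assumes a Python 3 interpreter and the default alias
table, in which \code{latin_1} maps to \code{iso-8859-1}.

```python
from email.charset import Charset

# The alias is lower-cased and normalized to its canonical email name.
alias = Charset('LATIN_1')
canonical = Charset('iso-8859-1')

name = str(alias)                    # canonical, lower-case name
same = (alias == canonical)          # equal after normalization
differ = (alias != Charset('utf-8'))
```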
The \module{email.charset} module also provides the following
functions for adding new entries to the global character set, alias,
and codec registries:
\begin{funcdesc}{add_charset}{charset\optional{, header_enc\optional{,
body_enc\optional{, output_charset}}}}
Add character properties to the global registry.
\var{charset} is the input character set, and must be the canonical
name of a character set.
Optional \var{header_enc} and \var{body_enc} are each either
\code{Charset.QP} for quoted-printable, \code{Charset.BASE64} for
base64 encoding, \code{Charset.SHORTEST} for the shortest of
quoted-printable or base64 encoding, or \code{None} for no encoding.
\code{SHORTEST} is only valid for \var{header_enc}. The default is
\code{None} for no encoding.
Optional \var{output_charset} is the character set that the output
should be in. Conversions will proceed from input charset, to
Unicode, to the output charset when the method
\method{Charset.convert()} is called. The default is to output in the
same character set as the input.
Both \var{charset} and \var{output_charset} must have Unicode
codec entries in the module's character set-to-codec mapping; use
\function{add_codec()} to add codecs the module does
not know about. See the \refmodule{codecs} module's documentation for
more information.
The global character set registry is kept in the module global
dictionary \code{CHARSETS}.
\end{funcdesc}
\begin{funcdesc}{add_alias}{alias, canonical}
Add a character set alias. \var{alias} is the alias name,
e.g. \code{latin-1}. \var{canonical} is the character set's canonical
name, e.g. \code{iso-8859-1}.
The global charset alias registry is kept in the module global
dictionary \code{ALIASES}.
\end{funcdesc}
\begin{funcdesc}{add_codec}{charset, codecname}
Add a codec that maps characters in the given character set to and from
Unicode.
\var{charset} is the canonical name of a character set.
\var{codecname} is the name of a Python codec, as appropriate for the
second argument to the \function{unicode()} built-in, or to the
\method{encode()} method of a Unicode string.
\end{funcdesc}
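The three registry functions can be combined as in the sketch below.
The names \code{x-demo} and \code{demo} are hypothetical and exist only
for this illustration. The example assumes a Python 3 interpreter, and
it modifies the module-global registries for the rest of the process.

```python
from email import charset as cs

# Register a made-up character set: QP headers, base64 bodies, utf-8 output.
cs.add_charset('x-demo', cs.QP, cs.BASE64, 'utf-8')

# Let the shorter name 'demo' resolve to the canonical name.
cs.add_alias('demo', 'x-demo')

# Tell the module which Python codec converts 'x-demo' to and from Unicode.
cs.add_codec('x-demo', 'utf-8')

# A Charset created from the alias picks up all registered properties.
demo = cs.Charset('demo')
```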


@ -1,47 +0,0 @@
\declaremodule{standard}{email.encoders}
\modulesynopsis{Encoders for email message payloads.}
When creating \class{Message} objects from scratch, you often need to
encode the payloads for transport through compliant mail servers.
This is especially true for \mimetype{image/*} and \mimetype{text/*}
type messages containing binary data.
The \module{email} package provides some convenient encodings in its
\module{encoders} module. These encoders are actually used by the
\class{MIMEAudio} and \class{MIMEImage} class constructors to provide default
encodings. All encoder functions take exactly one argument, the message
object to encode. They usually extract the payload, encode it, and reset the
payload to this newly encoded value. They should also set the
\mailheader{Content-Transfer-Encoding} header as appropriate.
Here are the encoding functions provided:
\begin{funcdesc}{encode_quopri}{msg}
Encodes the payload into quoted-printable form and sets the
\mailheader{Content-Transfer-Encoding} header to
\code{quoted-printable}\footnote{Note that encoding with
\method{encode_quopri()} also encodes all tabs and space characters in
the data.}.
This is a good encoding to use when most of your payload is normal
printable data, but contains a few unprintable characters.
\end{funcdesc}
\begin{funcdesc}{encode_base64}{msg}
Encodes the payload into base64 form and sets the
\mailheader{Content-Transfer-Encoding} header to
\code{base64}. This is a good encoding to use when most of your payload
is unprintable data since it is a more compact form than
quoted-printable. The drawback of base64 encoding is that it
renders the text non-human readable.
\end{funcdesc}
\begin{funcdesc}{encode_7or8bit}{msg}
This doesn't actually modify the message's payload, but it does set
the \mailheader{Content-Transfer-Encoding} header to either \code{7bit} or
\code{8bit} as appropriate, based on the payload data.
\end{funcdesc}
\begin{funcdesc}{encode_noop}{msg}
This does nothing; it doesn't even set the
\mailheader{Content-Transfer-Encoding} header.
\end{funcdesc}
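To make the effect of the encoders concrete, here is a minimal sketch,
assuming a Python 3 interpreter. The payloads are arbitrary sample
data.

```python
from email import encoders
from email.message import Message

# A payload with unprintable bytes: base64 is the natural choice.
raw = b'\x00\x01\x02\x03 binary payload'
binary_msg = Message()
binary_msg.set_payload(raw)
encoders.encode_base64(binary_msg)    # re-encodes payload, sets the header

# A plain ASCII payload: only the header is set, the payload is untouched.
text_msg = Message()
text_msg.set_payload('plain ASCII text\n')
encoders.encode_7or8bit(text_msg)
```

Calling \code{get_payload(decode=True)} on the base64 message undoes
the transfer encoding and gives back the original bytes.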


@ -1,87 +0,0 @@
\declaremodule{standard}{email.errors}
\modulesynopsis{The exception classes used by the email package.}
The following exception classes are defined in the
\module{email.errors} module:
\begin{excclassdesc}{MessageError}{}
This is the base class for all exceptions that the \module{email}
package can raise. It is derived from the standard
\exception{Exception} class and defines no additional methods.
\end{excclassdesc}
\begin{excclassdesc}{MessageParseError}{}
This is the base class for exceptions thrown by the \class{Parser}
class. It is derived from \exception{MessageError}.
\end{excclassdesc}
\begin{excclassdesc}{HeaderParseError}{}
Raised under some error conditions when parsing the \rfc{2822} headers of
a message, this class is derived from \exception{MessageParseError}.
It can be raised from the \method{Parser.parse()} or
\method{Parser.parsestr()} methods.
Situations where it can be raised include finding an envelope
header after the first \rfc{2822} header of the message, finding a
continuation line before the first \rfc{2822} header is found, or finding
a line in the headers which is neither a header nor a continuation
line.
\end{excclassdesc}
\begin{excclassdesc}{BoundaryError}{}
Raised under some error conditions when parsing the \rfc{2822} headers of
a message, this class is derived from \exception{MessageParseError}.
It can be raised from the \method{Parser.parse()} or
\method{Parser.parsestr()} methods.
Situations where it can be raised include not being able to find the
starting or terminating boundary in a \mimetype{multipart/*} message
when strict parsing is used.
\end{excclassdesc}
\begin{excclassdesc}{MultipartConversionError}{}
Raised when a payload is added to a \class{Message} object using
\method{add_payload()}, but the payload is already a scalar and the
message's \mailheader{Content-Type} main type is neither
\mimetype{multipart} nor missing. \exception{MultipartConversionError}
multiply inherits from \exception{MessageError} and the built-in
\exception{TypeError}.
Since \method{Message.add_payload()} is deprecated, this exception is
rarely raised in practice. However the exception may also be raised
if the \method{attach()} method is called on an instance of a class
derived from \class{MIMENonMultipart} (e.g. \class{MIMEImage}).
\end{excclassdesc}
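The inheritance relationships above, and the \method{attach()} case
just mentioned, can be verified with a short sketch. It assumes a
Python 3 interpreter and the \module{email.mime.text} module.

```python
from email import errors
from email.mime.text import MIMEText

# The parse errors form a chain rooted at MessageError.
hierarchy_ok = (
    issubclass(errors.MessageParseError, errors.MessageError)
    and issubclass(errors.HeaderParseError, errors.MessageParseError)
    and issubclass(errors.BoundaryError, errors.MessageParseError)
)

# MIMEText derives from MIMENonMultipart, so attach() must fail.
text_part = MIMEText('plain body')
try:
    text_part.attach(MIMEText('extra part'))
    raised = None
except errors.MultipartConversionError as exc:
    raised = exc
```

Because of the multiple inheritance, existing code that catches
\exception{TypeError} also catches this exception.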
Here's the list of the defects that the \class{FeedParser} can find while
parsing messages. Note that the defects are added to the message where the
problem was found, so for example, if a message nested inside a
\mimetype{multipart/alternative} had a malformed header, that nested message
object would have a defect, but the containing messages would not.
All defect classes are subclassed from \class{email.errors.MessageDefect}, but
this class is \emph{not} an exception!
\versionadded[All the defect classes were added]{2.4}
\begin{itemize}
\item \class{NoBoundaryInMultipartDefect} -- A message claimed to be a
multipart, but had no \mimetype{boundary} parameter.
\item \class{StartBoundaryNotFoundDefect} -- The start boundary claimed in the
\mailheader{Content-Type} header was never found.
\item \class{FirstHeaderLineIsContinuationDefect} -- The message had a
continuation line as its first header line.
\item \class{MisplacedEnvelopeHeaderDefect} -- A ``Unix From'' header was found
in the middle of a header block.
\item \class{MalformedHeaderDefect} -- A header was found that was missing a
colon, or was otherwise malformed.
\item \class{MultipartInvariantViolationDefect} -- A message claimed to be a
\mimetype{multipart}, but no subparts were found. Note that when a
message has this defect, its \method{is_multipart()} method may return
false even though its content type claims to be \mimetype{multipart}.
\end{itemize}
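As an illustration, the sketch below parses a message that claims to
be a multipart but declares no boundary. It assumes a Python 3
interpreter with the default (compatibility) parsing behavior, where
defects are collected in the message's \member{defects} list instead of
being raised.

```python
import email
from email import errors

# The Content-Type header promises a multipart but gives no boundary.
source = 'Content-Type: multipart/mixed\n\nThis body has no boundaries.\n'
msg = email.message_from_string(source)

# Collect the class names of every defect recorded on the message.
defect_names = [type(defect).__name__ for defect in msg.defects]
has_boundary_defect = any(
    isinstance(defect, errors.NoBoundaryInMultipartDefect)
    for defect in msg.defects
)
```

The parser may record more than one defect for the same message, so
the check looks for the expected class instead of comparing the whole
list.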


@ -1,133 +0,0 @@
\declaremodule{standard}{email.generator}
\modulesynopsis{Generate flat text email messages from a message structure.}
One of the most common tasks is to generate the flat text of the email
message represented by a message object structure. You will need to do
this if you want to send your message via the \refmodule{smtplib}
module or the \refmodule{nntplib} module, or print the message on the
console. Taking a message object structure and producing a flat text
document is the job of the \class{Generator} class.
Again, as with the \refmodule{email.parser} module, you aren't limited
to the functionality of the bundled generator; you could write one
from scratch yourself. However the bundled generator knows how to
generate most email in a standards-compliant way, should handle MIME
and non-MIME email messages just fine, and is designed so that the
transformation from flat text, to a message structure via the
\class{Parser} class, and back to flat text, is idempotent (the input
is identical to the output).
Here are the public methods of the \class{Generator} class, imported from the
\module{email.generator} module:
\begin{classdesc}{Generator}{outfp\optional{, mangle_from_\optional{,
maxheaderlen}}}
The constructor for the \class{Generator} class takes a file-like
object called \var{outfp} for an argument. \var{outfp} must support
the \method{write()} method and be usable as the output file in a
Python extended print statement.
Optional \var{mangle_from_} is a flag that, when \code{True}, puts a
\samp{>} character in front of any line in the body that starts exactly as
\samp{From }, i.e. \code{From} followed by a space at the beginning of the
line. This is the only guaranteed portable way to avoid having such
lines be mistaken for a \UNIX{} mailbox format envelope header separator (see
\ulink{WHY THE CONTENT-LENGTH FORMAT IS BAD}
{http://www.jwz.org/doc/content-length.html}
for details). \var{mangle_from_} defaults to \code{True}, but you
might want to set this to \code{False} if you are not writing \UNIX{}
mailbox format files.
Optional \var{maxheaderlen} specifies the longest length for a
non-continued header. When a header line is longer than
\var{maxheaderlen} (in characters, with tabs expanded to 8 spaces),
the header will be split as defined in the \module{email.header.Header}
class. Set to zero to disable header wrapping. The default is 78, as
recommended (but not required) by \rfc{2822}.
\end{classdesc}
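The effect of \var{mangle_from_} can be seen in a small sketch. It
assumes a Python 3 interpreter, hence \code{io.StringIO} as the output
file; the helper function \code{flatten_with()} is hypothetical and
exists only for this example.

```python
from io import StringIO
from email.generator import Generator
from email.message import Message

msg = Message()
msg['Subject'] = 'mangling demo'
msg.set_payload('From the start, this line looks like an envelope.\n')

def flatten_with(message, mangle):
    """Flatten message into a string with the given mangle_from_ flag."""
    fp = StringIO()
    Generator(fp, mangle_from_=mangle, maxheaderlen=78).flatten(message)
    return fp.getvalue()

mangled = flatten_with(msg, True)      # body line becomes '>From ...'
untouched = flatten_with(msg, False)   # body line is written unchanged
```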
The other public \class{Generator} methods are:
\begin{methoddesc}[Generator]{flatten}{msg\optional{, unixfrom}}
Print the textual representation of the message object structure rooted at
\var{msg} to the output file specified when the \class{Generator}
instance was created. Subparts are visited depth-first and the
resulting text will be properly MIME encoded.
Optional \var{unixfrom} is a flag that forces the printing of the
envelope header delimiter before the first \rfc{2822} header of the
root message object. If the root object has no envelope header, a
standard one is crafted. By default, this is set to \code{False} to
inhibit the printing of the envelope delimiter.
Note that for subparts, no envelope header is ever printed.
\versionadded{2.2.2}
\end{methoddesc}
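The following sketch shows both cases for \var{unixfrom}: a crafted
envelope header and one set explicitly on the message. It assumes a
Python 3 interpreter, and the address and date are made up for the
example.

```python
from io import StringIO
from email.generator import Generator
from email.message import Message

msg = Message()
msg['Subject'] = 'envelope demo'
msg.set_payload('body text\n')

# No envelope header is set, so the generator crafts a standard one.
fp = StringIO()
Generator(fp).flatten(msg, unixfrom=True)
crafted_first_line = fp.getvalue().splitlines()[0]

# With an explicit envelope header, that header is written verbatim.
envelope = 'From sender@example.com Mon Jan  1 00:00:00 2007'
msg.set_unixfrom(envelope)
fp = StringIO()
Generator(fp).flatten(msg, unixfrom=True)
explicit_first_line = fp.getvalue().splitlines()[0]
```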
\begin{methoddesc}[Generator]{clone}{fp}
Return an independent clone of this \class{Generator} instance with
the exact same options.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Generator]{write}{s}
Write the string \var{s} to the underlying file object,
i.e. \var{outfp} passed to \class{Generator}'s constructor. This
provides just enough file-like API for \class{Generator} instances to
be used in extended print statements.
\end{methoddesc}
As a convenience, see the methods \method{Message.as_string()} and
\code{str(aMessage)}, a.k.a. \method{Message.__str__()}, which
simplify the generation of a formatted string representation of a
message object. For more detail, see \refmodule{email.message}.
The \module{email.generator} module also provides a derived class
called \class{DecodedGenerator}, which is like the \class{Generator}
base class, except that non-\mimetype{text} parts are substituted with
a format string representing the part.
\begin{classdesc}{DecodedGenerator}{outfp\optional{, mangle_from_\optional{,
maxheaderlen\optional{, fmt}}}}
This class, derived from \class{Generator}, walks through all the
subparts of a message. If the subpart is of main type
\mimetype{text}, then it prints the decoded payload of the subpart.
Optional \var{mangle_from_} and \var{maxheaderlen} are as with the
\class{Generator} base class.
If the subpart is not of main type \mimetype{text}, optional \var{fmt}
is a format string that is used instead of the message payload.
\var{fmt} is expanded with the following keywords, \samp{\%(keyword)s}
format:
\begin{itemize}
\item \code{type} -- Full MIME type of the non-\mimetype{text} part
\item \code{maintype} -- Main MIME type of the non-\mimetype{text} part
\item \code{subtype} -- Sub-MIME type of the non-\mimetype{text} part
\item \code{filename} -- Filename of the non-\mimetype{text} part
\item \code{description} -- Description associated with the
non-\mimetype{text} part
\item \code{encoding} -- Content transfer encoding of the
non-\mimetype{text} part
\end{itemize}
The default value for \var{fmt} is \code{None}, meaning
\begin{verbatim}
[Non-text (%(type)s) part of message omitted, filename %(filename)s]
\end{verbatim}
\versionadded{2.2.2}
\end{classdesc}
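A sketch of a custom \var{fmt} string follows. It assumes a Python 3
interpreter and the \module{email.mime} helper classes; the file name
\code{blob.bin} and the format string are illustrative.

```python
from io import StringIO
from email.generator import DecodedGenerator
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a container with one text part and one binary attachment.
outer = MIMEMultipart()
outer.attach(MIMEText('hello world'))
blob = MIMEApplication(b'\x00\x01', 'octet-stream')
blob.add_header('Content-Disposition', 'attachment', filename='blob.bin')
outer.attach(blob)

# Text parts are printed; other parts are replaced by the expanded fmt.
fp = StringIO()
gen = DecodedGenerator(fp, fmt='[%(type)s omitted: %(filename)s]')
gen.flatten(outer)
text = fp.getvalue()
```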
\versionchanged[The previously deprecated method \method{__call__()} was
removed]{2.5}


@ -1,178 +0,0 @@
\declaremodule{standard}{email.header}
\modulesynopsis{Representing non-ASCII headers}
\rfc{2822} is the base standard that describes the format of email
messages. It derives from the older \rfc{822} standard which came
into widespread use at a time when most email was composed of \ASCII{}
characters only. \rfc{2822} is a specification written assuming email
contains only 7-bit \ASCII{} characters.
Of course, as email has been deployed worldwide, it has become
internationalized, such that language specific character sets can now
be used in email messages. The base standard still requires email
messages to be transferred using only 7-bit \ASCII{} characters, so a
slew of RFCs have been written describing how to encode email
containing non-\ASCII{} characters into \rfc{2822}-compliant format.
These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The \module{email} package supports these standards in its
\module{email.header} and \module{email.charset} modules.
If you want to include non-\ASCII{} characters in your email headers,
say in the \mailheader{Subject} or \mailheader{To} fields, you should
use the \class{Header} class and assign the field in the
\class{Message} object to an instance of \class{Header} instead of
using a string for the header value. Import the \class{Header} class from the
\module{email.header} module. For example:
\begin{verbatim}
>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=
\end{verbatim}
Notice here how we wanted the \mailheader{Subject} field to contain a
non-\ASCII{} character? We did this by creating a \class{Header}
instance and passing in the character set that the byte string was
encoded in. When the subsequent \class{Message} instance was
flattened, the \mailheader{Subject} field was properly \rfc{2047}
encoded. MIME-aware mail readers would show this header using the
embedded ISO-8859-1 character.
\versionadded{2.2.2}
Here is the \class{Header} class description:
\begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
errors}}}}}}}
Create a MIME-compliant header that can contain strings in different
character sets.
Optional \var{s} is the initial header value. If \code{None} (the
default), the initial header value is not set. You can later append
to the header with \method{append()} method calls. \var{s} may be a
byte string or a Unicode string, but see the \method{append()}
documentation for semantics.
Optional \var{charset} serves two purposes: it has the same meaning as
the \var{charset} argument to the \method{append()} method. It also
sets the default character set for all subsequent \method{append()}
calls that omit the \var{charset} argument. If \var{charset} is not
provided in the constructor (the default), the \code{us-ascii}
character set is used both as \var{s}'s initial charset and as the
default for subsequent \method{append()} calls.
The maximum line length can be specified explicitly via
\var{maxlinelen}. For splitting the first line to a shorter value (to
account for the field header which isn't included in \var{s},
e.g. \mailheader{Subject}) pass in the name of the field in
\var{header_name}. The default \var{maxlinelen} is 76, and the
default value for \var{header_name} is \code{None}, meaning it is not
taken into account for the first line of a long, split header.
Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
whitespace, and is usually either a space or a hard tab character.
This character will be prepended to continuation lines.
Optional \var{errors} is passed straight through to the
\method{append()} method.
\end{classdesc}
\begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
Append the string \var{s} to the MIME header.
Optional \var{charset}, if given, should be a \class{Charset} instance
(see \refmodule{email.charset}) or the name of a character set, which
will be converted to a \class{Charset} instance. A value of
\code{None} (the default) means that the \var{charset} given in the
constructor is used.
\var{s} may be a byte string or a Unicode string. If it is a byte
string (i.e. \code{isinstance(s, str)} is true), then
\var{charset} is the encoding of that byte string, and a
\exception{UnicodeError} will be raised if the string cannot be
decoded with that character set.
If \var{s} is a Unicode string, then \var{charset} is a hint
specifying the character set of the characters in the string. In this
case, when producing an \rfc{2822}-compliant header using \rfc{2047}
rules, the Unicode string will be encoded using the following charsets
in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}. The
first character set to not provoke a \exception{UnicodeError} is used.
Optional \var{errors} is passed through to any \function{unicode()} or
\function{ustr.encode()} call, and defaults to ``strict''.
\end{methoddesc}
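Mixing character sets within one header might look like the sketch
below. It assumes a Python 3 interpreter, where \var{s} is given as a
Unicode string and \var{charset} therefore acts as the encoding hint.

```python
from email.header import Header

# Start with a plain ASCII chunk.
h = Header('Hello', 'us-ascii')

# Append a chunk that needs iso-8859-1 to be represented.
h.append('p\xf6stal', 'iso-8859-1')

# Only the non-ASCII chunk is wrapped in an RFC 2047 encoded-word.
encoded = h.encode()
```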
\begin{methoddesc}[Header]{encode}{\optional{splitchars}}
Encode a message header into an RFC-compliant format, possibly
wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
quoted-printable encodings. Optional \var{splitchars} is a string
containing characters to split long ASCII lines on, in rough support
of \rfc{2822}'s \emph{highest level syntactic breaks}. This doesn't
affect \rfc{2047} encoded lines.
\end{methoddesc}
The \class{Header} class also provides a number of methods to support
standard operators and built-in functions.
\begin{methoddesc}[Header]{__str__}{}
A synonym for \method{Header.encode()}. Useful for
\code{str(aHeader)}.
\end{methoddesc}
\begin{methoddesc}[Header]{__unicode__}{}
A helper for the built-in \function{unicode()} function. Returns the
header as a Unicode string.
\end{methoddesc}
\begin{methoddesc}[Header]{__eq__}{other}
This method allows you to compare two \class{Header} instances for equality.
\end{methoddesc}
\begin{methoddesc}[Header]{__ne__}{other}
This method allows you to compare two \class{Header} instances for inequality.
\end{methoddesc}
The \module{email.header} module also provides the following
convenient functions.
\begin{funcdesc}{decode_header}{header}
Decode a message header value without converting the character set.
The header value is in \var{header}.
This function returns a list of \code{(decoded_string, charset)} pairs
containing each of the decoded parts of the header. \var{charset} is
\code{None} for non-encoded parts of the header, otherwise a lower
case string containing the name of the character set specified in the
encoded string.
Here's an example:
\begin{verbatim}
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
\end{verbatim}
\end{funcdesc}
\begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
header_name\optional{, continuation_ws}}}}
Create a \class{Header} instance from a sequence of pairs as returned
by \function{decode_header()}.
\function{decode_header()} takes a header value string and returns a
sequence of pairs of the format \code{(decoded_string, charset)} where
\var{charset} is the name of the character set.
This function takes one of those sequences of pairs and returns a
\class{Header} instance. Optional \var{maxlinelen},
\var{header_name}, and \var{continuation_ws} are as in the
\class{Header} constructor.
\end{funcdesc}
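The two functions are designed to be used together. The round trip
below is a sketch assuming a Python 3 interpreter, where converting the
resulting \class{Header} with \function{str()} yields the decoded
Unicode text.

```python
from email.header import decode_header, make_header

raw = '=?iso-8859-1?q?p=F6stal?='

# Split the raw header into (decoded_string, charset) pairs ...
pairs = decode_header(raw)

# ... and rebuild a Header instance from those pairs.
header = make_header(pairs)

decoded_text = str(header)       # human-readable form
re_encoded = header.encode()     # RFC 2047 form, ready for a message
```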


@ -1,65 +0,0 @@
\declaremodule{standard}{email.iterators}
\modulesynopsis{Iterate over a message object tree.}
Iterating over a message object tree is fairly easy with the
\method{Message.walk()} method. The \module{email.iterators} module
provides some useful higher level iterations over message object
trees.
\begin{funcdesc}{body_line_iterator}{msg\optional{, decode}}
This iterates over all the payloads in all the subparts of \var{msg},
returning the string payloads line-by-line. It skips over all the
subpart headers, and it skips over any subpart with a payload that
isn't a Python string. This is somewhat equivalent to reading the
flat text representation of the message from a file using
\method{readline()}, skipping over all the intervening headers.
Optional \var{decode} is passed through to \method{Message.get_payload()}.
\end{funcdesc}
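A minimal sketch, assuming a Python 3 interpreter and the
\module{email.mime} helper classes, shows that only the body lines of
the subparts are produced.

```python
from email.iterators import body_line_iterator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A container with two text subparts; its own payload is a list.
outer = MIMEMultipart()
outer.attach(MIMEText('first\nsecond\n'))
outer.attach(MIMEText('third\n'))

# Headers and the container itself are skipped; only body lines remain.
lines = list(body_line_iterator(outer))
```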
\begin{funcdesc}{typed_subpart_iterator}{msg\optional{,
maintype\optional{, subtype}}}
This iterates over all the subparts of \var{msg}, returning only those
subparts that match the MIME type specified by \var{maintype} and
\var{subtype}.
Note that \var{subtype} is optional; if omitted, then subpart MIME
type matching is done only with the main type. \var{maintype} is
optional too; it defaults to \mimetype{text}.
Thus, by default \function{typed_subpart_iterator()} returns each
subpart that has a MIME type of \mimetype{text/*}.
\end{funcdesc}
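The matching rules can be sketched as follows, assuming a Python 3
interpreter and the \module{email.mime} helper classes. The message
content is made up for the example.

```python
from email.iterators import typed_subpart_iterator
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

outer = MIMEMultipart()
outer.attach(MIMEText('plain body'))
outer.attach(MIMEText('<p>html body</p>', 'html'))
outer.attach(MIMEApplication(b'\x00\x01'))

# Default arguments: every subpart whose main type is text.
text_types = [part.get_content_type()
              for part in typed_subpart_iterator(outer)]

# Main type and subtype given: only text/html subparts.
html_types = [part.get_content_type()
              for part in typed_subpart_iterator(outer, 'text', 'html')]
```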
The following function has been added as a useful debugging tool. It
should \emph{not} be considered part of the supported public interface
for the package.
\begin{funcdesc}{_structure}{msg\optional{, fp\optional{, level}}}
Prints an indented representation of the content types of the
message object structure. For example:
\begin{verbatim}
>>> msg = email.message_from_file(somefile)
>>> _structure(msg)
multipart/mixed
text/plain
text/plain
multipart/digest
message/rfc822
text/plain
message/rfc822
text/plain
message/rfc822
text/plain
message/rfc822
text/plain
message/rfc822
text/plain
text/plain
\end{verbatim}
Optional \var{fp} is a file-like object to print the output to. It
must be suitable for Python's extended print statement. \var{level}
is used internally.
\end{funcdesc}


@ -1,561 +0,0 @@
\declaremodule{standard}{email.message}
\modulesynopsis{The base class representing email messages.}
The central class in the \module{email} package is the
\class{Message} class, imported from the \module{email.message} module. It is
the base class for the \module{email} object model. \class{Message} provides
the core functionality for setting and querying header fields, and for
accessing message bodies.
Conceptually, a \class{Message} object consists of \emph{headers} and
\emph{payloads}. Headers are \rfc{2822} style field names and
values where the field name and value are separated by a colon. The
colon is not part of either the field name or the field value.
Headers are stored and returned in case-preserving form but are
matched case-insensitively. There may also be a single envelope
header, also known as the \emph{Unix-From} header or the
\code{From_} header. The payload is either a string in the case of
simple message objects or a list of \class{Message} objects for
MIME container documents (e.g. \mimetype{multipart/*} and
\mimetype{message/rfc822}).
\class{Message} objects provide a mapping style interface for
accessing the message headers, and an explicit interface for accessing
both the headers and the payload. It provides convenience methods for
generating a flat text representation of the message object tree, for
accessing commonly used header parameters, and for recursively walking
over the object tree.
Here are the methods of the \class{Message} class:
\begin{classdesc}{Message}{}
The constructor takes no arguments.
\end{classdesc}
\begin{methoddesc}[Message]{as_string}{\optional{unixfrom}}
Return the entire message flattened as a string. When optional
\var{unixfrom} is \code{True}, the envelope header is included in the
returned string. \var{unixfrom} defaults to \code{False}.
Note that this method is provided as a convenience and may not always format
the message the way you want. For example, by default it mangles lines that
begin with \code{From }. For more flexibility, instantiate a
\class{Generator} instance and use its
\method{flatten()} method directly. For example:
\begin{verbatim}
from cStringIO import StringIO
from email.generator import Generator
fp = StringIO()
g = Generator(fp, mangle_from_=False, maxheaderlen=60)
g.flatten(msg)
text = fp.getvalue()
\end{verbatim}
\end{methoddesc}
\begin{methoddesc}[Message]{__str__}{}
Equivalent to \method{as_string(unixfrom=True)}.
\end{methoddesc}
\begin{methoddesc}[Message]{is_multipart}{}
Return \code{True} if the message's payload is a list of
sub-\class{Message} objects, otherwise return \code{False}. When
\method{is_multipart()} returns \code{False}, the payload should be a string
object.
\end{methoddesc}
\begin{methoddesc}[Message]{set_unixfrom}{unixfrom}
Set the message's envelope header to \var{unixfrom}, which should be a string.
\end{methoddesc}
\begin{methoddesc}[Message]{get_unixfrom}{}
Return the message's envelope header. Defaults to \code{None} if the
envelope header was never set.
\end{methoddesc}
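The envelope header methods can be exercised roughly as follows. The envelope line is a hypothetical value in the usual mbox style, chosen only for illustration.

```python
from email.message import Message

msg = Message()
msg['Subject'] = 'Envelope demo'
msg.set_payload('body\n')

before = msg.get_unixfrom()            # None: never set

# Hypothetical envelope line in the usual mbox style.
msg.set_unixfrom('From alice@example.com Sat Jan  1 00:00:00 2000')

with_envelope = msg.as_string(unixfrom=True)
without_envelope = msg.as_string()
```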
\begin{methoddesc}[Message]{attach}{payload}
Add the given \var{payload} to the current payload, which must be
\code{None} or a list of \class{Message} objects before the call.
After the call, the payload will always be a list of \class{Message}
objects. If you want to set the payload to a scalar object (e.g. a
string), use \method{set_payload()} instead.
\end{methoddesc}
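A short sketch of how \method{attach()} changes the payload from \code{None} to a list. The \mailheader{Content-Type} header is set by hand here purely for illustration.

```python
from email.message import Message

outer = Message()
outer['Content-Type'] = 'multipart/mixed'   # set by hand in this sketch

part = Message()
part.set_payload('first part\n')

was_multipart = outer.is_multipart()   # payload is still None here
outer.attach(part)
now_multipart = outer.is_multipart()   # payload is now a list
```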
\begin{methoddesc}[Message]{get_payload}{\optional{i\optional{, decode}}}
Return a reference to the current payload, which will be a list of
\class{Message} objects when \method{is_multipart()} is \code{True}, or a
string when \method{is_multipart()} is \code{False}. If the
payload is a list and you mutate the list object, you modify the
message's payload in place.
With optional argument \var{i}, \method{get_payload()} will return the
\var{i}-th element of the payload, counting from zero, if
\method{is_multipart()} is \code{True}. An \exception{IndexError}
will be raised if \var{i} is less than 0 or greater than or equal to
the number of items in the payload. If the payload is a string
(i.e. \method{is_multipart()} is \code{False}) and \var{i} is given, a
\exception{TypeError} is raised.
Optional \var{decode} is a flag indicating whether the payload should be
decoded or not, according to the \mailheader{Content-Transfer-Encoding} header.
When \code{True} and the message is not a multipart, the payload will be
decoded if this header's value is \samp{quoted-printable} or
\samp{base64}. If some other encoding is used, or if the
\mailheader{Content-Transfer-Encoding} header is
missing, or if the payload has bogus base64 data, the payload is
returned as-is (undecoded). If the message is a multipart and the
\var{decode} flag is \code{True}, then \code{None} is returned. The
default for \var{decode} is \code{False}.
\end{methoddesc}
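The \var{decode} flag and the \exception{TypeError} case can be illustrated with a small base64 payload. This is only a sketch; the \code{b'...'} literals assume an interpreter that accepts them.

```python
import base64
from email.message import Message

msg = Message()
msg['Content-Transfer-Encoding'] = 'base64'
# Store the already-encoded text as the payload.
msg.set_payload(base64.b64encode(b'hello world').decode('ascii'))

raw = msg.get_payload()                 # still the base64 text
decoded = msg.get_payload(decode=True)  # transfer encoding undone

# Asking for an index on a string payload is an error.
try:
    msg.get_payload(0)
    index_rejected = False
except TypeError:
    index_rejected = True
```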
\begin{methoddesc}[Message]{set_payload}{payload\optional{, charset}}
Set the entire message object's payload to \var{payload}. It is the
client's responsibility to ensure the payload invariants. Optional
\var{charset} sets the message's default character set; see
\method{set_charset()} for details.
\versionchanged[\var{charset} argument added]{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{set_charset}{charset}
Set the character set of the payload to \var{charset}, which can
either be a \class{Charset} instance (see \refmodule{email.charset}), a
string naming a character set,
or \code{None}. If it is a string, it will be converted to a
\class{Charset} instance. If \var{charset} is \code{None}, the
\code{charset} parameter will be removed from the
\mailheader{Content-Type} header. Anything else will generate a
\exception{TypeError}.
The message will be assumed to be of type \mimetype{text/*} encoded with
\var{charset.input_charset}. It will be converted to
\var{charset.output_charset}
and encoded properly, if needed, when generating the plain text
representation of the message. MIME headers
(\mailheader{MIME-Version}, \mailheader{Content-Type},
\mailheader{Content-Transfer-Encoding}) will be added as needed.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{get_charset}{}
Return the \class{Charset} instance associated with the message's payload.
\versionadded{2.2.2}
\end{methoddesc}
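A minimal sketch of \method{set_charset()} called with a string, showing the MIME headers that get added as a side effect. The payload text is arbitrary.

```python
from email.message import Message

msg = Message()
msg.set_payload('plain ASCII text\n')
msg.set_charset('utf-8')               # a string is converted to a Charset

charset = msg.get_charset()
mime_version = msg['MIME-Version']
content_type = msg.get_content_type()
charset_param = msg.get_content_charset()
```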
The following methods implement a mapping-like interface for accessing
the message's \rfc{2822} headers. Note that there are some
semantic differences between these methods and a normal mapping
(i.e. dictionary) interface. For example, in a dictionary there are
no duplicate keys, but here there may be duplicate message headers. Also,
in dictionaries there is no guaranteed order to the keys returned by
\method{keys()}, but in a \class{Message} object, headers are always
returned in the order they appeared in the original message, or were
added to the message later. Any header deleted and then re-added is
always appended to the end of the header list.
These semantic differences are intentional and are biased toward
maximal convenience.
Note that in all cases, any envelope header present in the message is
not included in the mapping interface.
\begin{methoddesc}[Message]{__len__}{}
Return the total number of headers, including duplicates.
\end{methoddesc}
\begin{methoddesc}[Message]{__contains__}{name}
Return true if the message object has a field named \var{name}.
Matching is done case-insensitively and \var{name} should not include the
trailing colon. Used for the \code{in} operator,
e.g.:
\begin{verbatim}
if 'message-id' in myMessage:
print 'Message-ID:', myMessage['message-id']
\end{verbatim}
\end{methoddesc}
\begin{methoddesc}[Message]{__getitem__}{name}
Return the value of the named header field. \var{name} should not
include the colon field separator. If the header is missing,
\code{None} is returned; a \exception{KeyError} is never raised.
Note that if the named field appears more than once in the message's
headers, exactly which of those field values will be returned is
undefined. Use the \method{get_all()} method to get the values of all
the extant named headers.
\end{methoddesc}
\begin{methoddesc}[Message]{__setitem__}{name, val}
Add a header to the message with field name \var{name} and value
\var{val}. The field is appended to the end of the message's existing
fields.
Note that this does \emph{not} overwrite or delete any existing header
with the same name. If you want to ensure that the new header is the
only one present in the message with field name
\var{name}, delete the field first, e.g.:
\begin{verbatim}
del msg['subject']
msg['subject'] = 'Python roolz!'
\end{verbatim}
\end{methoddesc}
\begin{methoddesc}[Message]{__delitem__}{name}
Delete all occurrences of the field with name \var{name} from the
message's headers. No exception is raised if the named field isn't
present in the headers.
\end{methoddesc}
\begin{methoddesc}[Message]{has_key}{name}
Return true if the message contains a header field named \var{name},
otherwise return false.
\end{methoddesc}
\begin{methoddesc}[Message]{keys}{}
Return a list of all the message's header field names.
\end{methoddesc}
\begin{methoddesc}[Message]{values}{}
Return a list of all the message's field values.
\end{methoddesc}
\begin{methoddesc}[Message]{items}{}
Return a list of 2-tuples containing all the message's field names
and values.
\end{methoddesc}
\begin{methoddesc}[Message]{get}{name\optional{, failobj}}
Return the value of the named header field. This is identical to
\method{__getitem__()} except that optional \var{failobj} is returned
if the named header is missing (defaults to \code{None}).
\end{methoddesc}
Here are some additional useful methods:
\begin{methoddesc}[Message]{get_all}{name\optional{, failobj}}
Return a list of all the values for the field named \var{name}.
If there are no such named headers in the message, \var{failobj} is
returned (defaults to \code{None}).
\end{methoddesc}
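The differences from a normal dictionary are easiest to see with a duplicated header. The header values below are hypothetical.

```python
from email.message import Message

msg = Message()
msg['Received'] = 'from a.example.org'   # hypothetical values
msg['Subject'] = 'first'
msg['Received'] = 'from b.example.org'   # appended, not overwritten

count = len(msg)                         # duplicates are counted
names = msg.keys()                       # insertion order is kept
all_received = msg.get_all('received')
missing = msg['x-no-such-header']        # None, no KeyError
fallback = msg.get('x-no-such-header', 'n/a')

# Replace rather than append: delete every occurrence first.
del msg['subject']
msg['Subject'] = 'second'
subjects = msg.get_all('subject')
```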
\begin{methoddesc}[Message]{add_header}{_name, _value, **_params}
Extended header setting. This method is similar to
\method{__setitem__()} except that additional header parameters can be
provided as keyword arguments. \var{_name} is the header field to add
and \var{_value} is the \emph{primary} value for the header.
For each item in the keyword argument dictionary \var{_params}, the
key is taken as the parameter name, with underscores converted to
dashes (since dashes are illegal in Python identifiers). Normally,
the parameter will be added as \code{key="value"} unless the value is
\code{None}, in which case only the key will be added.
Here's an example:
\begin{verbatim}
msg.add_header('Content-Disposition', 'attachment', filename='bud.gif')
\end{verbatim}
This will add a header that looks like
\begin{verbatim}
Content-Disposition: attachment; filename="bud.gif"
\end{verbatim}
\end{methoddesc}
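The underscore-to-dash conversion and the \code{None} case can be sketched as follows. The \code{X-} header names and parameter names are invented for illustration.

```python
from email.message import Message

msg = Message()
# Underscores in keyword names become dashes in the parameter name.
msg.add_header('X-Example', 'value', some_param='1')   # hypothetical header
# A value of None adds the bare key only.
msg.add_header('X-Flag', 'value', flagged=None)        # hypothetical header

example = msg['X-Example']
flag = msg['X-Flag']
```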
\begin{methoddesc}[Message]{replace_header}{_name, _value}
Replace a header. Replace the first header found in the message that
matches \var{_name}, retaining header order and field name case. If
no matching header was found, a \exception{KeyError} is raised.
\versionadded{2.2.2}
\end{methoddesc}
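A small sketch of \method{replace_header()}, including the \exception{KeyError} raised for a missing header. The addresses are hypothetical.

```python
from email.message import Message

msg = Message()
msg['From'] = 'alice@example.com'      # hypothetical addresses
msg['subject'] = 'draft'               # note the lower-case field name
msg['To'] = 'bob@example.com'

# Position and field name case are retained; only the value changes.
msg.replace_header('Subject', 'final')
items = msg.items()

try:
    msg.replace_header('X-Missing', 'value')
    raised = False
except KeyError:
    raised = True
```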
\begin{methoddesc}[Message]{get_content_type}{}
Return the message's content type. The returned string is coerced to
lower case and has the form \mimetype{maintype/subtype}. If there was no
\mailheader{Content-Type} header in the message, the default type as
given by \method{get_default_type()} will be returned. Since
according to \rfc{2045}, messages always have a default type,
\method{get_content_type()} will always return a value.
\rfc{2045} defines a message's default type to be
\mimetype{text/plain} unless it appears inside a
\mimetype{multipart/digest} container, in which case it would be
\mimetype{message/rfc822}. If the \mailheader{Content-Type} header
has an invalid type specification, \rfc{2045} mandates that the
default type be \mimetype{text/plain}.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{get_content_maintype}{}
Return the message's main content type. This is the
\mimetype{maintype} part of the string returned by
\method{get_content_type()}.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{get_content_subtype}{}
Return the message's sub-content type. This is the \mimetype{subtype}
part of the string returned by \method{get_content_type()}.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{get_default_type}{}
Return the default content type. Most messages have a default content
type of \mimetype{text/plain}, except for messages that are subparts
of \mimetype{multipart/digest} containers. Such subparts have a
default content type of \mimetype{message/rfc822}.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{set_default_type}{ctype}
Set the default content type. \var{ctype} should either be
\mimetype{text/plain} or \mimetype{message/rfc822}, although this is
not enforced. The default content type is not stored in the
\mailheader{Content-Type} header.
\versionadded{2.2.2}
\end{methoddesc}
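The interplay between the default type and the \mailheader{Content-Type} header can be sketched like this. The header values are chosen only for illustration.

```python
from email.message import Message

msg = Message()
no_header = msg.get_content_type()      # falls back to the default type

msg.set_default_type('message/rfc822')  # e.g. for a multipart/digest subpart
digest_default = msg.get_content_type()

msg['Content-Type'] = 'Text/HTML; charset="us-ascii"'
full = msg.get_content_type()           # coerced to lower case
main = msg.get_content_maintype()
sub = msg.get_content_subtype()

bad = Message()
bad['Content-Type'] = 'nonsense'        # no maintype/subtype separator
invalid = bad.get_content_type()
```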
\begin{methoddesc}[Message]{get_params}{\optional{failobj\optional{,
header\optional{, unquote}}}}
Return the message's \mailheader{Content-Type} parameters, as a list. The
elements of the returned list are 2-tuples of key/value pairs, as
split on the \character{=} sign. The left hand side of the
\character{=} is the key, while the right hand side is the value. If
there is no \character{=} sign in the parameter the value is the empty
string, otherwise the value is as described in \method{get_param()} and is
unquoted if optional \var{unquote} is \code{True} (the default).
Optional \var{failobj} is the object to return if there is no
\mailheader{Content-Type} header. Optional \var{header} is the header to
search instead of \mailheader{Content-Type}.
\versionchanged[\var{unquote} argument added]{2.2.2}
\end{methoddesc}
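As a rough illustration of the list returned by \method{get_params()}, and of single-parameter lookup with \method{get_param()}, consider this hand-written header:

```python
from email.message import Message

msg = Message()
msg['Content-Type'] = 'text/plain; charset="iso-8859-1"; format=flowed'

params = msg.get_params()
charset = msg.get_param('CHARSET')      # keys compare case-insensitively
quoted = msg.get_param('charset', unquote=False)
missing = msg.get_param('boundary', 'n/a')
```

The first tuple carries the type itself, with an empty string as its value because it has no \character{=} sign.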
\begin{methoddesc}[Message]{get_param}{param\optional{,
failobj\optional{, header\optional{, unquote}}}}
Return the value of the \mailheader{Content-Type} header's parameter
\var{param} as a string. If the message has no \mailheader{Content-Type}
header or if there is no such parameter, then \var{failobj} is
returned (defaults to \code{None}).
Optional \var{header}, if given, specifies the message header to use
instead of \mailheader{Content-Type}.
Parameter keys are always compared case-insensitively. The return
value can either be a string, or a 3-tuple if the parameter was
\rfc{2231} encoded. When it's a 3-tuple, the elements of the value are of
the form \code{(CHARSET, LANGUAGE, VALUE)}. Note that both \code{CHARSET} and
\code{LANGUAGE} can be \code{None}, in which case you should consider
\code{VALUE} to be encoded in the \code{us-ascii} charset. You can
usually ignore \code{LANGUAGE}.
If your application doesn't care whether the parameter was encoded as in
\rfc{2231}, you can collapse the parameter value by calling
\function{email.Utils.collapse_rfc2231_value()}, passing in the return value
from \method{get_param()}. This will return a suitably decoded Unicode string
when the value is a tuple, or the original string unquoted if it isn't. For
example:
\begin{verbatim}
rawparam = msg.get_param('foo')
param = email.Utils.collapse_rfc2231_value(rawparam)
\end{verbatim}
In any case, the parameter value (either the returned string, or the
\code{VALUE} item in the 3-tuple) is always unquoted, unless
\var{unquote} is set to \code{False}.
\versionchanged[\var{unquote} argument added, and 3-tuple return value
possible]{2.2.2}
\end{methoddesc}
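A sketch of the 3-tuple case, using a hand-written \rfc{2231} encoded parameter. The header value is hypothetical, and the module is imported under its lower-case name \module{email.utils}, which is an assumption about the installed package version.

```python
import email.utils
from email.message import Message

msg = Message()
# Hypothetical header carrying an RFC 2231 encoded parameter.
msg['Content-Disposition'] = "attachment; filename*=us-ascii'en'report.txt"

raw = msg.get_param('filename', header='Content-Disposition')
is_tuple = isinstance(raw, tuple)       # (CHARSET, LANGUAGE, VALUE)

# Collapse to a plain string regardless of how it was encoded.
name = email.utils.collapse_rfc2231_value(raw)
```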
\begin{methoddesc}[Message]{set_param}{param, value\optional{,
header\optional{, requote\optional{, charset\optional{, language}}}}}
Set a parameter in the \mailheader{Content-Type} header. If the
parameter already exists in the header, its value will be replaced
with \var{value}. If the \mailheader{Content-Type} header has not yet
been defined for this message, it will be set to \mimetype{text/plain}
and the new parameter value will be appended as per \rfc{2045}.
Optional \var{header} specifies an alternative header to
\mailheader{Content-Type}, and all parameters will be quoted as
necessary unless optional \var{requote} is \code{False} (the default
is \code{True}).
If optional \var{charset} is specified, the parameter will be encoded
according to \rfc{2231}. Optional \var{language} specifies the
\rfc{2231} language, defaulting to the empty string. Both \var{charset} and
\var{language} should be strings.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{del_param}{param\optional{, header\optional{,
requote}}}
Remove the given parameter completely from the
\mailheader{Content-Type} header. The header will be re-written in
place without the parameter or its value. All values will be quoted
as necessary unless \var{requote} is \code{False} (the default is
\code{True}). Optional \var{header} specifies an alternative to
\mailheader{Content-Type}.
\versionadded{2.2.2}
\end{methoddesc}
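Setting, replacing, and removing a parameter might look like the following sketch. It starts from a message with no \mailheader{Content-Type} header at all.

```python
from email.message import Message

msg = Message()
msg.set_param('charset', 'utf-8')        # creates a text/plain Content-Type
created_type = msg.get_content_type()

msg.set_param('charset', 'iso-8859-1')   # replaces the existing value
replaced = msg.get_param('charset')

msg.del_param('charset')
after_delete = msg.get_param('charset')  # parameter is gone
header_count = len(msg.get_all('Content-Type'))
```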
\begin{methoddesc}[Message]{set_type}{type\optional{, header\optional{,
requote}}}
Set the main type and subtype for the \mailheader{Content-Type}
header. \var{type} must be a string in the form
\mimetype{maintype/subtype}, otherwise a \exception{ValueError} is
raised.
This method replaces the \mailheader{Content-Type} header, keeping all
the parameters in place. If \var{requote} is \code{False}, this
leaves the existing header's quoting as is, otherwise the parameters
will be quoted (the default).
An alternative header can be specified in the \var{header} argument.
When the \mailheader{Content-Type} header is set a
\mailheader{MIME-Version} header is also added.
\versionadded{2.2.2}
\end{methoddesc}
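A brief sketch of \method{set_type()}, showing that parameters survive the change and that a malformed type is rejected:

```python
from email.message import Message

msg = Message()
msg['Content-Type'] = 'text/plain; charset="us-ascii"'
msg.set_type('text/html')               # parameters are kept

new_type = msg.get_content_type()
kept_charset = msg.get_param('charset')
mime_version = msg['MIME-Version']

try:
    msg.set_type('html')                # not of the form maintype/subtype
    rejected = False
except ValueError:
    rejected = True
```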
\begin{methoddesc}[Message]{get_filename}{\optional{failobj}}
Return the value of the \code{filename} parameter of the
\mailheader{Content-Disposition} header of the message. If the header does
not have a \code{filename} parameter, this method falls back to looking for
the \code{name} parameter. If neither is found, or the header is missing,
then \var{failobj} is returned. The returned string will always be unquoted
as per \method{Utils.unquote()}.
\end{methoddesc}
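The fallback order can be sketched with three messages. The file name is illustrative, and the sketch assumes the \code{name} parameter is looked up on the \mailheader{Content-Type} header.

```python
from email.message import Message

with_disposition = Message()
with_disposition.add_header('Content-Disposition', 'attachment',
                            filename='bud.gif')

# Assumption: the fallback "name" parameter lives on Content-Type.
only_name = Message()
only_name['Content-Type'] = 'image/gif; name="bud.gif"'

neither = Message()

from_filename = with_disposition.get_filename()
from_name = only_name.get_filename()
from_failobj = neither.get_filename('unnamed')
```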
\begin{methoddesc}[Message]{get_boundary}{\optional{failobj}}
Return the value of the \code{boundary} parameter of the
\mailheader{Content-Type} header of the message, or \var{failobj} if either
the header is missing, or has no \code{boundary} parameter. The
returned string will always be unquoted as per
\method{Utils.unquote()}.
\end{methoddesc}
\begin{methoddesc}[Message]{set_boundary}{boundary}
Set the \code{boundary} parameter of the \mailheader{Content-Type}
header to \var{boundary}. \method{set_boundary()} will always quote
\var{boundary} if necessary. A \exception{HeaderParseError} is raised
if the message object has no \mailheader{Content-Type} header.
Note that using this method is subtly different from deleting the old
\mailheader{Content-Type} header and adding a new one with the new boundary
via \method{add_header()}, because \method{set_boundary()} preserves the
order of the \mailheader{Content-Type} header in the list of headers.
However, it does \emph{not} preserve any continuation lines which may
have been present in the original \mailheader{Content-Type} header.
\end{methoddesc}
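The order-preserving behaviour of \method{set_boundary()}, and the exception raised when there is no \mailheader{Content-Type} header, can be sketched as follows. The boundary strings are arbitrary.

```python
from email.errors import HeaderParseError
from email.message import Message

msg = Message()
msg['Content-Type'] = 'multipart/mixed; boundary="OLD"'
msg['Subject'] = 'boundary demo'
msg.set_boundary('NEW')

boundary = msg.get_boundary()
order = msg.keys()                      # Content-Type keeps its position

no_ctype = Message()
try:
    no_ctype.set_boundary('NEW')
    raised = False
except HeaderParseError:
    raised = True
```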
\begin{methoddesc}[Message]{get_content_charset}{\optional{failobj}}
Return the \code{charset} parameter of the \mailheader{Content-Type}
header, coerced to lower case. If there is no
\mailheader{Content-Type} header, or if that header has no
\code{charset} parameter, \var{failobj} is returned.
Note that this method differs from \method{get_charset()} which
returns the \class{Charset} instance for the default encoding of the
message body.
\versionadded{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Message]{get_charsets}{\optional{failobj}}
Return a list containing the character set names in the message. If
the message is a \mimetype{multipart}, then the list will contain one
element for each subpart in the payload, otherwise, it will be a list
of length 1.
Each item in the list will be a string which is the value of the
\code{charset} parameter in the \mailheader{Content-Type} header for the
represented subpart. However, if the subpart has no
\mailheader{Content-Type} header, no \code{charset} parameter, or is not of
the \mimetype{text} main MIME type, then that item in the returned list
will be \var{failobj}.
\end{methoddesc}
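A small sketch of \method{get_charsets()} on a two-part message, built with the MIME subclasses from \module{email.mime}. In the implementations I am aware of, the list also starts with an entry for the container itself (here \var{failobj}), so only the subpart entries are relied on below.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

outer = MIMEMultipart()
outer.attach(MIMEText('plain ASCII'))
outer.attach(MIMEText('more text', 'plain', 'utf-8'))

charsets = outer.get_charsets('unknown')
subpart_charsets = charsets[-2:]        # entries for the two text parts
```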
\begin{methoddesc}[Message]{walk}{}
The \method{walk()} method is an all-purpose generator which can be
used to iterate over all the parts and subparts of a message object
tree, in depth-first traversal order. You will typically use
\method{walk()} as the iterator in a \code{for} loop; each
iteration returns the next subpart.
Here's an example that prints the MIME type of every part of a
multipart message structure:
\begin{verbatim}
>>> for part in msg.walk():
... print part.get_content_type()
multipart/report
text/plain
message/delivery-status
text/plain
text/plain
message/rfc822
\end{verbatim}
\end{methoddesc}
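To make the depth-first order concrete, this sketch builds a small tree by hand and collects the content types without printing them. It uses the MIME subclasses from \module{email.mime}; the payload strings are arbitrary.

```python
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

inner = MIMEText('forwarded text')
outer = MIMEMultipart()
outer.attach(MIMEText('cover note'))
outer.attach(MIMEMessage(inner))

# walk() yields the container first, then descends into each subpart.
types = [part.get_content_type() for part in outer.walk()]
```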
\versionchanged[The previously deprecated methods \method{get_type()},
\method{get_main_type()}, and \method{get_subtype()} were removed]{2.5}
\class{Message} objects can also optionally contain the following instance
attributes. The first two, \var{preamble} and \var{epilogue}, can be used
when generating the plain text of a MIME message.
\begin{datadesc}{preamble}
The format of a MIME document allows for some text between the blank
line following the headers, and the first multipart boundary string.
Normally, this text is never visible in a MIME-aware mail reader
because it falls outside the standard MIME armor. However, when
viewing the raw text of the message, or when viewing the message in a
non-MIME aware reader, this text can become visible.
The \var{preamble} attribute contains this leading extra-armor text
for MIME documents. When the \class{Parser} discovers some text after
the headers but before the first boundary string, it assigns this text
to the message's \var{preamble} attribute. When the \class{Generator}
is writing out the plain text representation of a MIME message, and it
finds the message has a \var{preamble} attribute, it will write this
text in the area between the headers and the first boundary. See
\refmodule{email.parser} and \refmodule{email.generator} for details.
Note that if the message object has no preamble, the
\var{preamble} attribute will be \code{None}.
\end{datadesc}
\begin{datadesc}{epilogue}
The \var{epilogue} attribute acts the same way as the \var{preamble}
attribute, except that it contains text that appears between the last
boundary and the end of the message.
\versionchanged[You do not need to set the epilogue to the empty string in
order for the \class{Generator} to print a newline at the end of the
file]{2.5}
\end{datadesc}
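Where the two attributes end up in the flattened text can be checked with a sketch like this. The boundary string is fixed by hand only to keep the example readable.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart(boundary='BOUNDARY')   # fixed boundary for readability
msg.preamble = 'This is a multi-part message in MIME format.\n'
msg.epilogue = 'End of message.\n'
msg.attach(MIMEText('hello'))

text = msg.as_string()
preamble_at = text.index('This is a multi-part')
first_boundary = text.index('--BOUNDARY')
last_boundary = text.rindex('--BOUNDARY--')
epilogue_at = text.index('End of message.')
```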
\begin{datadesc}{defects}
The \var{defects} attribute contains a list of all the problems found when
parsing this message. See \refmodule{email.errors} for a detailed description
of the possible parsing defects.
\versionadded{2.4}
\end{datadesc}
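As an illustration of the \var{defects} attribute, the following parses a deliberately malformed message. The sketch assumes the \class{NoBoundaryInMultipartDefect} class from \module{email.errors} is the defect reported for this case.

```python
from email import errors
from email.parser import Parser

# A multipart Content-Type without a boundary parameter is malformed.
text = 'Content-Type: multipart/mixed\n\nno boundaries here\n'
msg = Parser().parsestr(text)

defect_types = [type(d) for d in msg.defects]
```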

@ -1,186 +0,0 @@
\declaremodule{standard}{email.mime}
\declaremodule{standard}{email.mime.base}
\declaremodule{standard}{email.mime.nonmultipart}
\declaremodule{standard}{email.mime.multipart}
\declaremodule{standard}{email.mime.audio}
\declaremodule{standard}{email.mime.image}
\declaremodule{standard}{email.mime.message}
\declaremodule{standard}{email.mime.text}
Ordinarily, you get a message object structure by passing a file or
some text to a parser, which parses the text and returns the root
message object. However, you can also build a complete message
structure from scratch, or even individual \class{Message} objects by
hand. In fact, you can also take an existing structure and add new
\class{Message} objects, move them around, etc. This makes a very
convenient interface for slicing-and-dicing MIME messages.
You can create a new object structure by creating \class{Message} instances,
adding attachments and all the appropriate headers manually. For MIME
messages though, the \module{email} package provides some convenient
subclasses to make things easier.
Here are the classes:
\begin{classdesc}{MIMEBase}{_maintype, _subtype, **_params}
Module: \module{email.mime.base}
This is the base class for all the MIME-specific subclasses of
\class{Message}. Ordinarily you won't create instances specifically
of \class{MIMEBase}, although you could. \class{MIMEBase} is provided
primarily as a convenient base class for more specific MIME-aware
subclasses.
\var{_maintype} is the \mailheader{Content-Type} major type
(e.g. \mimetype{text} or \mimetype{image}), and \var{_subtype} is the
\mailheader{Content-Type} minor type
(e.g. \mimetype{plain} or \mimetype{gif}). \var{_params} is a parameter
key/value dictionary and is passed directly to
\method{Message.add_header()}.
The \class{MIMEBase} class always adds a \mailheader{Content-Type} header
(based on \var{_maintype}, \var{_subtype}, and \var{_params}), and a
\mailheader{MIME-Version} header (always set to \code{1.0}).
\end{classdesc}
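Although \class{MIMEBase} is rarely instantiated directly, a short sketch shows the headers it adds. The subtype and file name are made up.

```python
from email.mime.base import MIMEBase

# Extra keyword arguments become Content-Type parameters.
part = MIMEBase('application', 'x-example', name='data.bin')  # hypothetical subtype

ctype = part.get_content_type()
name = part.get_param('name')
mime_version = part['MIME-Version']
```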
\begin{classdesc}{MIMENonMultipart}{}
Module: \module{email.mime.nonmultipart}
A subclass of \class{MIMEBase}, this is an intermediate base class for
MIME messages that are not \mimetype{multipart}. The primary purpose
of this class is to prevent the use of the \method{attach()} method,
which only makes sense for \mimetype{multipart} messages. If
\method{attach()} is called, a \exception{MultipartConversionError}
exception is raised.
\versionadded{2.2.2}
\end{classdesc}
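The guard against \method{attach()} can be seen with any non-multipart subclass; \class{MIMEText} is used here as a convenient example.

```python
from email.errors import MultipartConversionError
from email.mime.text import MIMEText

leaf = MIMEText('no attachments allowed here')
try:
    leaf.attach(MIMEText('extra'))
    refused = False
except MultipartConversionError:
    refused = True
```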
\begin{classdesc}{MIMEMultipart}{\optional{_subtype\optional{,
boundary\optional{, _subparts\optional{, _params}}}}}
Module: \module{email.mime.multipart}
A subclass of \class{MIMEBase}, this is an intermediate base class for
MIME messages that are \mimetype{multipart}. Optional \var{_subtype}
defaults to \mimetype{mixed}, but can be used to specify the subtype
of the message. A \mailheader{Content-Type} header of
\mimetype{multipart/}\var{_subtype} will be added to the message
object. A \mailheader{MIME-Version} header will also be added.
Optional \var{boundary} is the multipart boundary string. When
\code{None} (the default), the boundary is calculated when needed.
\var{_subparts} is a sequence of initial subparts for the payload. It
must be possible to convert this sequence to a list. You can always
attach new subparts to the message by using the
\method{Message.attach()} method.
Additional parameters for the \mailheader{Content-Type} header are
taken from the keyword arguments, or passed into the \var{_params}
argument, which is a keyword dictionary.
\versionadded{2.2.2}
\end{classdesc}
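A sketch of building a \mimetype{multipart/alternative} container with initial subparts and an explicit boundary. The boundary string and body texts are arbitrary.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

parts = [MIMEText('plain body'), MIMEText('<p>HTML body</p>', 'html')]
msg = MIMEMultipart('alternative', 'BOUNDARY', parts)
msg.attach(MIMEText('late addition'))   # subparts can still be added

ctype = msg.get_content_type()
boundary = msg.get_boundary()
part_count = len(msg.get_payload())
```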
\begin{classdesc}{MIMEApplication}{_data\optional{, _subtype\optional{,
_encoder\optional{, **_params}}}}
Module: \module{email.mime.application}
A subclass of \class{MIMENonMultipart}, the \class{MIMEApplication} class is
used to represent MIME message objects of major type \mimetype{application}.
\var{_data} is a string containing the raw byte data. Optional \var{_subtype}
specifies the MIME subtype and defaults to \mimetype{octet-stream}.
Optional \var{_encoder} is a callable (i.e. function) which will
perform the actual encoding of the data for transport. This
callable takes one argument, which is the \class{MIMEApplication} instance.
It should use \method{get_payload()} and \method{set_payload()} to
change the payload to encoded form. It should also add any
\mailheader{Content-Transfer-Encoding} or other headers to the message
object as necessary. The default encoding is base64. See the
\refmodule{email.encoders} module for a list of the built-in encoders.
\var{_params} are passed straight through to the base class constructor.
\versionadded{2.5}
\end{classdesc}
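With the default encoder, a \class{MIMEApplication} round trip might look like this sketch. The byte string is arbitrary, and the \code{b'...'} literal assumes an interpreter that accepts it.

```python
from email.mime.application import MIMEApplication

data = b'\x00\x01\x02binary'
part = MIMEApplication(data)            # default subtype and base64 encoder

ctype = part.get_content_type()
cte = part['Content-Transfer-Encoding']
round_trip = part.get_payload(decode=True)
```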
\begin{classdesc}{MIMEAudio}{_audiodata\optional{, _subtype\optional{,
_encoder\optional{, **_params}}}}
Module: \module{email.mime.audio}
A subclass of \class{MIMENonMultipart}, the \class{MIMEAudio} class
is used to create MIME message objects of major type \mimetype{audio}.
\var{_audiodata} is a string containing the raw audio data. If this
data can be decoded by the standard Python module \refmodule{sndhdr},
then the subtype will be automatically included in the
\mailheader{Content-Type} header. Otherwise you can explicitly specify the
audio subtype via the \var{_subtype} parameter. If the minor type could
not be guessed and \var{_subtype} was not given, then \exception{TypeError}
is raised.
Optional \var{_encoder} is a callable (i.e. function) which will
perform the actual encoding of the audio data for transport. This
callable takes one argument, which is the \class{MIMEAudio} instance.
It should use \method{get_payload()} and \method{set_payload()} to
change the payload to encoded form. It should also add any
\mailheader{Content-Transfer-Encoding} or other headers to the message
object as necessary. The default encoding is base64. See the
\refmodule{email.encoders} module for a list of the built-in encoders.
\var{_params} are passed straight through to the base class constructor.
\end{classdesc}
\begin{classdesc}{MIMEImage}{_imagedata\optional{, _subtype\optional{,
_encoder\optional{, **_params}}}}
Module: \module{email.mime.image}
A subclass of \class{MIMENonMultipart}, the \class{MIMEImage} class is
used to create MIME message objects of major type \mimetype{image}.
\var{_imagedata} is a string containing the raw image data. If this
data can be decoded by the standard Python module \refmodule{imghdr},
then the subtype will be automatically included in the
\mailheader{Content-Type} header. Otherwise you can explicitly specify the
image subtype via the \var{_subtype} parameter. If the minor type could
not be guessed and \var{_subtype} was not given, then \exception{TypeError}
is raised.
Optional \var{_encoder} is a callable (i.e. function) which will
perform the actual encoding of the image data for transport. This
callable takes one argument, which is the \class{MIMEImage} instance.
It should use \method{get_payload()} and \method{set_payload()} to
change the payload to encoded form. It should also add any
\mailheader{Content-Transfer-Encoding} or other headers to the message
object as necessary. The default encoding is base64. See the
\refmodule{email.encoders} module for a list of the built-in encoders.
\var{_params} are passed straight through to the \class{MIMEBase}
constructor.
\end{classdesc}
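When the image type cannot be guessed, passing \var{_subtype} explicitly avoids the \exception{TypeError}. The bytes below are a placeholder rather than real image data, and the file name is illustrative.

```python
from email.mime.image import MIMEImage

data = b'not really an image'           # placeholder bytes

try:
    MIMEImage(data)                     # subtype cannot be guessed
    guessed = True
except TypeError:
    guessed = False

# An explicit subtype avoids relying on type detection.
img = MIMEImage(data, _subtype='png', name='logo.png')
ctype = img.get_content_type()
round_trip = img.get_payload(decode=True)
```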
\begin{classdesc}{MIMEMessage}{_msg\optional{, _subtype}}
Module: \module{email.mime.message}
A subclass of \class{MIMENonMultipart}, the \class{MIMEMessage} class
is used to create MIME objects of main type \mimetype{message}.
\var{_msg} is used as the payload, and must be an instance of class
\class{Message} (or a subclass thereof), otherwise a
\exception{TypeError} is raised.
Optional \var{_subtype} sets the subtype of the message; it defaults
to \mimetype{rfc822}.
\end{classdesc}
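Wrapping an existing message, and the \exception{TypeError} raised for a non-\class{Message} argument, can be sketched as:

```python
from email.message import Message
from email.mime.message import MIMEMessage

inner = Message()
inner['Subject'] = 'original'
inner.set_payload('original body\n')

wrapper = MIMEMessage(inner)
ctype = wrapper.get_content_type()
wrapped = wrapper.get_payload(0)

try:
    MIMEMessage('just a string')        # not a Message instance
    rejected = False
except TypeError:
    rejected = True
```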
\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{, _charset}}}
Module: \module{email.mime.text}
A subclass of \class{MIMENonMultipart}, the \class{MIMEText} class is
used to create MIME objects of major type \mimetype{text}.
\var{_text} is the string for the payload. \var{_subtype} is the
minor type and defaults to \mimetype{plain}. \var{_charset} is the
character set of the text and is passed as a parameter to the
\class{MIMENonMultipart} constructor; it defaults to \code{us-ascii}. No
guessing or encoding is performed on the text data.
\versionchanged[The previously deprecated \var{_encoding} argument has
been removed. Encoding happens implicitly based on the \var{_charset}
argument]{2.4}
\end{classdesc}
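A minimal sketch of \class{MIMEText} with its defaults and with an explicit subtype:

```python
from email.mime.text import MIMEText

plain = MIMEText('Hello, world!\n')
html = MIMEText('<p>Hello, world!</p>\n', 'html')

plain_type = plain.get_content_type()
plain_charset = plain.get_content_charset()
html_type = html.get_content_type()
body = plain.get_payload()
```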

@ -1,208 +0,0 @@
\declaremodule{standard}{email.parser}
\modulesynopsis{Parse flat text email messages to produce a message
object structure.}
Message object structures can be created in one of two ways: they can be
created from whole cloth by instantiating \class{Message} objects and
stringing them together via \method{attach()} and
\method{set_payload()} calls, or they can be created by parsing a flat text
representation of the email message.
The \module{email} package provides a standard parser that understands
most email document structures, including MIME documents. You can
pass the parser a string or a file object, and the parser will return
to you the root \class{Message} instance of the object structure. For
simple, non-MIME messages the payload of this root object will likely
be a string containing the text of the message. For MIME
messages, the root object will return \code{True} from its
\method{is_multipart()} method, and the subparts can be accessed via
the \method{get_payload()} and \method{walk()} methods.
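For a simple, non-MIME message, parsing from a string might look like the following sketch. The message text is made up, and \method{parsestr()} is the \class{Parser} method used for string input.

```python
from email.parser import Parser

text = (
    'From: alice@example.com\n'         # hypothetical message
    'Subject: parsed\n'
    '\n'
    'Body text.\n'
)
msg = Parser().parsestr(text)

subject = msg['subject']
multipart = msg.is_multipart()
body = msg.get_payload()
```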
There are actually two parser interfaces available for use, the classic
\class{Parser} API and the incremental \class{FeedParser} API. The classic
\class{Parser} API is fine if you have the entire text of the message in
memory as a string, or if the entire message lives in a file on the file
system. \class{FeedParser} is more appropriate for when you're reading the
message from a stream which might block waiting for more input (e.g. reading
an email message from a socket). The \class{FeedParser} can consume and parse
the message incrementally, and only returns the root object when you close the
parser\footnote{As of email package version 3.0, introduced in
Python 2.4, the classic \class{Parser} was re-implemented in terms of the
\class{FeedParser}, so the semantics and results are identical between the two
parsers.}.
Note that the parser can be extended in limited ways, and of course
you can implement your own parser completely from scratch. There is
no magical connection between the \module{email} package's bundled
parser and the \class{Message} class, so your custom parser can create
message object trees any way it finds necessary.
\subsubsection{FeedParser API}
\versionadded{2.4}
The \class{FeedParser}, imported from the \module{email.feedparser} module,
provides an API that is conducive to incremental parsing of email messages,
such as would be necessary when reading the text of an email message from a
source that can block (e.g. a socket). The
\class{FeedParser} can of course be used to parse an email message fully
contained in a string or a file, but the classic \class{Parser} API may be
more convenient for such use cases. The semantics and results of the two
parser APIs are identical.
The \class{FeedParser}'s API is simple; you create an instance, feed it a
bunch of text until there's no more to feed it, then close the parser to
retrieve the root message object. The \class{FeedParser} is extremely
accurate when parsing standards-compliant messages, and it does a very good
job of parsing non-compliant messages, providing information about how a
message was deemed broken. It will populate a message object's \var{defects}
attribute with a list of any problems it found in a message. See the
\refmodule{email.errors} module for the list of defects that it can find.
Here is the API for the \class{FeedParser}:
\begin{classdesc}{FeedParser}{\optional{_factory}}
Create a \class{FeedParser} instance. Optional \var{_factory} is a
no-argument callable that will be called whenever a new message object is
needed. It defaults to the \class{email.message.Message} class.
\end{classdesc}
\begin{methoddesc}[FeedParser]{feed}{data}
Feed the \class{FeedParser} some more data. \var{data} should be a
string containing one or more lines. The lines can be partial and the
\class{FeedParser} will stitch such partial lines together properly. The
lines in the string can have any of the three common line endings: carriage
return, newline, or carriage return followed by newline (they can even be mixed).
\end{methoddesc}
\begin{methoddesc}[FeedParser]{close}{}
Closing a \class{FeedParser} completes the parsing of all previously fed data,
and returns the root message object. It is undefined what happens if you feed
more data to a closed \class{FeedParser}.
\end{methoddesc}
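As a minimal sketch of the feed-then-close cycle described above, the
following feeds a \class{FeedParser} three chunks whose boundaries
deliberately fall in the middle of a line. The chunk contents and addresses
are made up for illustration; in real use the chunks would arrive from a
socket or a similar blocking source.

```python
from email.feedparser import FeedParser

# Hypothetical chunks, as they might arrive from a socket. Note that the
# chunk boundaries do not line up with line boundaries.
chunks = [
    "From: alice@example.com\r\nTo: bob@exa",
    "mple.com\r\nSubject: Hello\r\n\r\n",
    "First line of the body.\r\nSecond line.\r\n",
]

parser = FeedParser()
for chunk in chunks:
    parser.feed(chunk)      # partial lines are buffered until completed

msg = parser.close()        # the root message object is only returned here

subject = msg['Subject']
recipient = msg['To']
body = msg.get_payload()
defects = msg.defects       # stays empty for a well-formed message
```

The root object is not available until \method{close()} is called, so code
that needs the headers early has to wait for the whole message.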
\subsubsection{Parser class API}
The \class{Parser} class, imported from the \module{email.parser} module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The
\module{email.parser} module also provides a second class, called
\class{HeaderParser} which can be used if you're only interested in
the headers of the message. \class{HeaderParser} can be much faster in
these situations, since it does not attempt to parse the message body,
instead setting the payload to the raw body as a string.
\class{HeaderParser} has the same API as the \class{Parser} class.
\begin{classdesc}{Parser}{\optional{_class}}
The constructor for the \class{Parser} class takes an optional
argument \var{_class}. This must be a callable factory (such as a
function or a class), and it is used whenever a sub-message object
needs to be created. It defaults to \class{Message} (see
\refmodule{email.message}). The factory will be called without
arguments.
The optional \var{strict} flag is ignored. \deprecated{2.4}{Because the
\class{Parser} class is a backward compatible API wrapper around the
new-in-Python 2.4 \class{FeedParser}, \emph{all} parsing is effectively
non-strict. You should simply stop passing a \var{strict} flag to the
\class{Parser} constructor.}
\versionchanged[The \var{strict} flag was added]{2.2.2}
\versionchanged[The \var{strict} flag was deprecated]{2.4}
\end{classdesc}
The other public \class{Parser} methods are:
\begin{methoddesc}[Parser]{parse}{fp\optional{, headersonly}}
Read all the data from the file-like object \var{fp}, parse the
resulting text, and return the root message object. \var{fp} must
support both the \method{readline()} and the \method{read()} methods
on file-like objects.
The text contained in \var{fp} must be formatted as a block of \rfc{2822}
style headers and header continuation lines, optionally preceded by an
envelope header. The header block is terminated either by the
end of the data or by a blank line. Following the header block is the
body of the message (which may contain MIME-encoded subparts).
Optional \var{headersonly} is as with the \method{parsestr()} method.
\versionchanged[The \var{headersonly} flag was added]{2.2.2}
\end{methoddesc}
\begin{methoddesc}[Parser]{parsestr}{text\optional{, headersonly}}
Similar to the \method{parse()} method, except it takes a string
object instead of a file-like object. Calling this method on a string
is exactly equivalent to wrapping \var{text} in a \class{StringIO}
instance first and calling \method{parse()}.
Optional \var{headersonly} is a flag specifying whether to stop
parsing after reading the headers or not. The default is \code{False},
meaning it parses the entire contents of the string.
\versionchanged[The \var{headersonly} flag was added]{2.2.2}
\end{methoddesc}
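The difference between \class{Parser} and \class{HeaderParser} is easiest to
see on a multipart message. The sketch below is illustrative only: the message
text is invented, and \class{StringIO} is imported from the \module{io} module
on the assumption of a current Python.

```python
from io import StringIO  # on Python 2, use the StringIO module instead
from email.parser import Parser, HeaderParser

# A made-up two-part MIME message, used only for illustration.
text = (
    "From: alice@example.com\n"
    "Subject: Lunch\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Noon at the usual place?\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Bring the menu.\n"
    "--XYZ--\n"
)

# Parse the whole message, from a string and from a file-like object.
from_string = Parser().parsestr(text)
from_file = Parser().parse(StringIO(text))

# Parse only the headers; the body is left as one raw string.
headers_only = HeaderParser().parsestr(text)

part_count = len(from_string.get_payload())
raw_body = headers_only.get_payload()
```

Both \class{Parser} calls build the same tree of two subparts, while
\class{HeaderParser} keeps the headers but leaves the body unparsed, boundary
lines included.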
Since creating a message object structure from a string or a file
object is such a common task, two functions are provided as a
convenience. They are available in the top-level \module{email}
package namespace.
\begin{funcdesc}{message_from_string}{s\optional{, _class\optional{, strict}}}
Return a message object structure from a string. This is exactly
equivalent to \code{Parser().parsestr(s)}. Optional \var{_class} and
\var{strict} are interpreted as with the \class{Parser} class constructor.
\versionchanged[The \var{strict} flag was added]{2.2.2}
\end{funcdesc}
\begin{funcdesc}{message_from_file}{fp\optional{, _class\optional{, strict}}}
Return a message object structure tree from an open file object. This
is exactly equivalent to \code{Parser().parse(fp)}. Optional
\var{_class} and \var{strict} are interpreted as with the
\class{Parser} class constructor.
\versionchanged[The \var{strict} flag was added]{2.2.2}
\end{funcdesc}
Here's an example of how you might use this at an interactive Python
prompt:
\begin{verbatim}
>>> import email
>>> msg = email.message_from_string(myString)
\end{verbatim}
\subsubsection{Additional notes}
Here are some notes on the parsing semantics:
\begin{itemize}
\item Most non-\mimetype{multipart} type messages are parsed as a single
message object with a string payload. These objects will return
\code{False} for \method{is_multipart()}. Their
\method{get_payload()} method will return a string object.
\item All \mimetype{multipart} type messages will be parsed as a
container message object with a list of sub-message objects for
their payload. The outer container message will return
\code{True} for \method{is_multipart()} and its
\method{get_payload()} method will return the list of
\class{Message} subparts.
\item Most messages with a content type of \mimetype{message/*}
(e.g. \mimetype{message/delivery-status} and
\mimetype{message/rfc822}) will also be parsed as a container
object containing a list payload of length 1. Their
\method{is_multipart()} method will return \code{True}. The
single element in the list payload will be a sub-message object.
\item Some non-standards-compliant messages may not be internally consistent
about their \mimetype{multipart}-edness. Such messages may have a
\mailheader{Content-Type} header of type \mimetype{multipart}, but their
\method{is_multipart()} method may return \code{False}. If such
messages were parsed with the \class{FeedParser}, they will have an
instance of the \class{MultipartInvariantViolationDefect} class in their
\var{defects} attribute list. See \refmodule{email.errors} for
details.
\end{itemize}
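The four cases in the list above can be checked with a short sketch. All
message texts here are invented for illustration, and the last one omits the
\code{boundary} parameter on purpose to provoke the defect.

```python
import email
from email import errors

plain = email.message_from_string(
    "Content-Type: text/plain\n\nJust text.\n")

multipart = email.message_from_string(
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "part one\n"
    "--XYZ--\n")

wrapped = email.message_from_string(
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner message\n"
    "\n"
    "inner body\n")

# Claims to be multipart but supplies no boundary, so it cannot be split.
broken = email.message_from_string(
    "Content-Type: multipart/mixed\n\nno parts here\n")

inner = wrapped.get_payload()[0]   # the single enclosed message object

found_violation = any(
    isinstance(d, errors.MultipartInvariantViolationDefect)
    for d in broken.defects)
```

Checking \method{is_multipart()} before iterating over the payload is the
safer habit, since the \mailheader{Content-Type} header alone does not
guarantee a list payload.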


@ -1,157 +0,0 @@
\declaremodule{standard}{email.utils}
\modulesynopsis{Miscellaneous email package utilities.}
There are several useful utilities provided in the \module{email.utils}
module:
\begin{funcdesc}{quote}{str}
Return a new string with backslashes in \var{str} replaced by two
backslashes, and double quotes replaced by backslash-double quote.
\end{funcdesc}
\begin{funcdesc}{unquote}{str}
Return a new string which is an \emph{unquoted} version of \var{str}.
If \var{str} ends and begins with double quotes, they are stripped
off. Likewise if \var{str} ends and begins with angle brackets, they
are stripped off.
\end{funcdesc}
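A short sketch of both functions follows; the sample strings are arbitrary.
Note that \function{quote()} only escapes characters and does not add the
surrounding double quotes itself.

```python
from email.utils import quote, unquote

# quote() escapes backslashes and double quotes; it adds no outer quotes.
quoted = quote('C:\\mail "inbox"')

# unquote() strips one enclosing pair of double quotes or angle brackets.
name = unquote('"Alice Example"')
addr = unquote('<alice@example.com>')

# A string without enclosing delimiters comes back unchanged.
untouched = unquote('no delimiters')
```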
\begin{funcdesc}{parseaddr}{address}
Parse \var{address}, which should be the value of some address-containing
field such as \mailheader{To} or \mailheader{Cc}, into its constituent
\emph{realname} and \emph{email address} parts. Returns a 2-tuple of that
information, unless the parse fails, in which case a 2-tuple of
\code{('', '')} is returned.
\end{funcdesc}
\begin{funcdesc}{formataddr}{pair}
The inverse of \function{parseaddr()}, this takes a 2-tuple of the form
\code{(realname, email_address)} and returns the string value suitable
for a \mailheader{To} or \mailheader{Cc} header. If the first element of
\var{pair} is false, then the second element is returned unmodified.
\end{funcdesc}
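The round trip between \function{parseaddr()} and \function{formataddr()} can
be sketched as follows; the name and address are placeholders.

```python
from email.utils import parseaddr, formataddr

realname, address = parseaddr('Alice Example <alice@example.com>')

# formataddr() rebuilds a header value from the pair.
header_value = formataddr((realname, address))

# With an empty real name, only the address is returned.
bare = formataddr(('', 'alice@example.com'))

# A value with no address in it yields a pair of empty strings.
failed = parseaddr('')
```

Testing the second element of the result for emptiness is a simple way to
detect a failed parse.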
\begin{funcdesc}{getaddresses}{fieldvalues}
This function returns a list of 2-tuples of the form returned by
\function{parseaddr()}. \var{fieldvalues} is a sequence of header field
values as might be returned by \method{Message.get_all()}. Here's a
simple example that gets all the recipients of a message:
\begin{verbatim}
from email.utils import getaddresses
tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])
resent_tos = msg.get_all('resent-to', [])
resent_ccs = msg.get_all('resent-cc', [])
all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)
\end{verbatim}
\end{funcdesc}
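For a self-contained variant of the same idea, the sketch below builds a small
message first, so the shape of the result is visible. The message text and
addresses are invented for illustration.

```python
import email
from email.utils import getaddresses

msg = email.message_from_string(
    "To: Alice <alice@example.com>, bob@example.com\n"
    "Cc: Carol <carol@example.com>\n"
    "\n"
    "body\n")

tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])

# One (realname, address) pair per recipient, across all header values.
recipients = getaddresses(tos + ccs)

# Keep just the address part of each pair.
addresses = [addr for (name, addr) in recipients]
```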
\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{2822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{2822} date, such as
\code{"Mon, 20 Nov 1995 19:12:08 -0500"}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned. Note that indexes 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}
\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time)\footnote{Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{2822}.}. If the input
string has no timezone, the last element of the tuple returned is
\code{None}. Note that indexes 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}
\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. Minor deficiency: \function{mktime_tz()} interprets the
first 8 elements of \var{tuple} as a local time and then compensates
for the timezone difference. This may yield a slight error around
changes in daylight savings time, though not worth worrying about for
common use.
\end{funcdesc}
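The three date functions fit together as sketched below, using the sample date
string from the \function{parsedate()} description. The comparison value is
computed with \function{calendar.timegm()}, which is an addition made here for
illustration and is not part of the \module{email} package.

```python
import calendar
from email.utils import parsedate, parsedate_tz, mktime_tz

date = "Mon, 20 Nov 1995 19:12:08 -0500"

fields = parsedate(date)        # 9-tuple; only indexes 0-5 are meaningful
fields_tz = parsedate_tz(date)  # 10-tuple; the last item is the UTC offset
offset = fields_tz[9]           # -18000 seconds, i.e. five hours west of UTC

# Seconds since the epoch, in UTC.
timestamp = mktime_tz(fields_tz)

# 19:12:08 at -0500 is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))

unparseable = parsedate("not a date")  # None
```

Since \function{parsedate()} discards the zone, \function{parsedate_tz()}
together with \function{mktime_tz()} is the pairing to use when the absolute
time matters.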
\begin{funcdesc}{formatdate}{\optional{timeval\optional{, localtime}\optional{, usegmt}}}
Returns a date string as per \rfc{2822}, e.g.:
\begin{verbatim}
Fri, 09 Nov 2001 01:08:47 -0000
\end{verbatim}
Optional \var{timeval} if given is a floating point time value as
accepted by \function{time.gmtime()} and \function{time.localtime()},
otherwise the current time is used.
Optional \var{localtime} is a flag that when \code{True}, interprets
\var{timeval}, and returns a date relative to the local timezone
instead of UTC, properly taking daylight savings time into account.
The default is \code{False} meaning UTC is used.
Optional \var{usegmt} is a flag that when \code{True}, outputs a
date string with the timezone as an ASCII string \code{GMT}, rather
than a numeric \code{-0000}. This is needed for some protocols (such
as HTTP). This only applies when \var{localtime} is \code{False}.
\versionadded{2.4}
\end{funcdesc}
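The effect of the two flags can be sketched as follows. The time value is
built with \function{calendar.timegm()} (an addition made here for
illustration) so that it corresponds to the sample date shown above.

```python
import calendar
from email.utils import formatdate

# The instant used in the example above, as seconds since the epoch (UTC).
timeval = calendar.timegm((2001, 11, 9, 1, 8, 47, 0, 0, 0))

utc_numeric = formatdate(timeval)             # zone written as -0000
utc_gmt = formatdate(timeval, usegmt=True)    # zone written as GMT
local = formatdate(timeval, localtime=True)   # zone depends on the host
now = formatdate()                            # current time, in UTC
```

The \var{localtime} result varies from machine to machine, so it is not
suitable where a reproducible string is needed.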
\begin{funcdesc}{make_msgid}{\optional{idstring}}
Returns a string suitable for an \rfc{2822}-compliant
\mailheader{Message-ID} header. Optional \var{idstring}, if given, is
a string used to strengthen the uniqueness of the message id.
\end{funcdesc}
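A brief sketch of typical use; the \var{idstring} value \code{'batch42'} is an
arbitrary example, and the exact layout of the generated id should not be
relied upon.

```python
from email.utils import make_msgid

first = make_msgid()
second = make_msgid()

# 'batch42' is an arbitrary example tag that ends up inside the id.
tagged = make_msgid('batch42')
```

Each call yields a fresh value enclosed in angle brackets, ready to be
assigned to a \mailheader{Message-ID} header.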
\begin{funcdesc}{decode_rfc2231}{s}
Decode the string \var{s} according to \rfc{2231}.
\end{funcdesc}
\begin{funcdesc}{encode_rfc2231}{s\optional{, charset\optional{, language}}}
Encode the string \var{s} according to \rfc{2231}. Optional
\var{charset} and \var{language}, if given, are the character set name
and language name to use. If neither is given, \var{s} is returned
as-is. If \var{charset} is given but \var{language} is not, the
string is encoded using the empty string for \var{language}.
\end{funcdesc}
\begin{funcdesc}{collapse_rfc2231_value}{value\optional{, errors\optional{,
fallback_charset}}}
When a header parameter is encoded in \rfc{2231} format,
\method{Message.get_param()} may return a 3-tuple containing the character
set, language, and value. \function{collapse_rfc2231_value()} turns this into
a unicode string. Optional \var{errors} is passed to the \var{errors}
argument of the built-in \function{unicode()} function; it defaults to
\code{replace}. Optional \var{fallback_charset} specifies the character set
to use if the one in the \rfc{2231} header is not known by Python; it defaults
to \code{us-ascii}.
For convenience, if the \var{value} passed to
\function{collapse_rfc2231_value()} is not a tuple, it should be a string and
it is returned unquoted.
\end{funcdesc}
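The following sketch ties \function{encode_rfc2231()},
\method{Message.get_param()} and \function{collapse_rfc2231_value()} together.
It assumes a current Python, where the collapsed value is an ordinary text
string; the file names and the message are invented for illustration.

```python
import email
from email.utils import collapse_rfc2231_value, encode_rfc2231

# Encode a non-ASCII parameter value (U+00EF is i with diaeresis).
encoded = encode_rfc2231('na\u00efve.txt', 'utf-8')

# Place the encoded value in a header; the trailing * marks it as RFC 2231.
msg = email.message_from_string(
    "Content-Disposition: attachment; filename*=%s\n\nbody\n" % encoded)

# For an encoded parameter, get_param() returns (charset, language, value).
raw = msg.get_param('filename', header='content-disposition')
filename = collapse_rfc2231_value(raw)

# A plain string is simply returned unquoted.
plain = collapse_rfc2231_value('"report.txt"')
```

Passing every \method{get_param()} result through
\function{collapse_rfc2231_value()} avoids having to test whether a tuple or a
string came back.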
\begin{funcdesc}{decode_params}{params}
Decode parameters list according to \rfc{2231}. \var{params} is a
sequence of 2-tuples containing elements of the form
\code{(content-type, string-value)}.
\end{funcdesc}
\versionchanged[The \function{dump_address_pair()} function has been removed;
use \function{formataddr()} instead]{2.4}
\versionchanged[The \function{decode()} function has been removed; use the
\method{Header.decode_header()} method instead]{2.4}
\versionchanged[The \function{encode()} function has been removed; use the
\method{Header.encode()} method instead]{2.4}


@ -1,7 +0,0 @@
\chapter{File Formats}
\label{fileformats}
The modules described in this chapter parse various miscellaneous file
formats that aren't markup languages and are not related to e-mail.
\localmoduletable


@ -1,18 +0,0 @@
\chapter{File and Directory Access}
\label{filesys}
The modules described in this chapter deal with disk files and
directories. For example, there are modules for reading the
properties of files, manipulating paths in a portable way, and
creating temporary files. The full list of modules in this chapter is:
\localmoduletable
% XXX can this be included in the seealso environment? --amk
Also see section \ref{bltin-file-objects} for a description
of Python's built-in file objects.
\begin{seealso}
\seemodule{os}{Operating system interfaces, including functions to
work with files at a lower level than the built-in file object.}
\end{seealso}


@ -1,10 +0,0 @@
\chapter{Program Frameworks}
\label{frameworks}
The modules described in this chapter are frameworks that will largely
dictate the structure of your program. Currently the modules described
here are all oriented toward writing command-line interfaces.
The full list of modules described in this chapter is:
\localmoduletable

Some files were not shown because too many files have changed in this diff.