Added new intro sections (incomplete); fixed various typos

Guido van Rossum 1997-08-14 20:34:33 +00:00
parent 91c7c933cc
commit 59a61352ad
2 changed files with 374 additions and 14 deletions

@@ -40,6 +40,186 @@ API functions in detail.
\chapter{Introduction}
The Application Programmer's Interface to Python gives C and C++
programmers access to the Python interpreter at a variety of levels.
There are two fundamentally different reasons for using the Python/C
API. (The API is equally usable from C++, but for brevity it is
generally referred to as the Python/C API.) The first reason is to
write ``extension modules'' for specific purposes; these are C modules
that extend the Python interpreter. This is probably the most common
use. The second reason is to use Python as a component in a larger
application; this technique is generally referred to as ``embedding''
Python in an application.
Writing an extension module is a relatively well-understood process,
where a ``cookbook'' approach works well. There are several tools
that automate the process to some extent. While people have embedded
Python in other applications since its early existence, the process of
embedding Python is less straightforward than writing an extension.
Python 1.5 introduces a number of new API functions as well as some
changes to the build process that make embedding much simpler.
This manual describes the 1.5 state of affairs (as of Python 1.5a3).
% XXX Eventually, take the historical notes out
Many API functions are useful independent of whether you're embedding
or extending Python; moreover, most applications that embed Python
will need to provide a custom extension as well, so it's probably a
good idea to become familiar with writing an extension before
attempting to embed Python in a real application.
\section{Objects, Types and Reference Counts}
Most Python/C API functions have one or more arguments as well as a
return value of type \code{PyObject *}. This type is a pointer
(obviously!) to an opaque data type representing an arbitrary Python
object. Since all Python object types are treated the same way by the
Python language in most situations (e.g., assignments, scope rules,
and argument passing), it is only fitting that they should be
represented by a single C type. All Python objects live on the heap:
you never declare an automatic or static variable of type
\code{PyObject}; only pointer variables of type \code{PyObject *} can
be declared.
All Python objects (even Python integers) have a ``type'' and a
``reference count''. An object's type determines what kind of object
it is (e.g., an integer, a list, or a user-defined function; there are
many more as explained in the Python Language Reference Manual). For
each of the well-known types there is a macro to check whether an
object is of that type; for instance, \code{PyList_Check(a)} is true
iff the object pointed to by \code{a} is a Python list.
The reference count is important only because today's computers have a
finite (and often severely limited) memory size; it counts how many
different places there are that have a reference to an object. Such a
place could be another object, or a global (or static) C variable, or
a local variable in some C function. When an object's reference count
becomes zero, the object is deallocated. If it contains references to
other objects, their reference counts are decremented. Those other
objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There's an obvious problem
with objects that reference each other here; for now, the solution is
``don't do that''.)
Reference counts are always manipulated explicitly. The normal way is
to use the macro \code{Py_INCREF(a)} to increment an object's
reference count by one, and \code{Py_DECREF(a)} to decrement it by
one. The latter macro is considerably more complex than the former,
since it must check whether the reference count becomes zero and then
cause the object's deallocator to be called; the deallocator is a
function pointer contained in the object's type structure. The
type-specific deallocator takes
care of decrementing the reference counts for other objects contained
in the object, and so on, if this is a compound object type such as a
list. There's no chance that the reference count can overflow; at
least as many bits are used to hold the reference count as there are
distinct memory locations in virtual memory (assuming
\code{sizeof(long) >= sizeof(char *)}). Thus, the reference count
increment is a simple operation.
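The asymmetry between the two macros can be sketched in plain C with a
toy object model. This is an illustration under stated assumptions,
not the real Python headers: every name below (\code{toy_object},
\code{toy_type}, \code{TOY_INCREF}, \code{TOY_DECREF},
\code{toy_live_objects}) is hypothetical, and the real macros differ in
detail. The sketch shows an increment that is a single addition, a
decrement that tests for zero and dispatches through the type
structure, and a compound type whose deallocator releases the object it
contains.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the real Python headers. */
typedef struct toy_object toy_object;

typedef struct {
    const char *name;
    void (*dealloc)(toy_object *);   /* type-specific deallocator */
} toy_type;

struct toy_object {
    long refcnt;                     /* number of places referring to this object */
    const toy_type *type;
    toy_object *item;                /* used by the "box" type only; may be NULL */
};

static int toy_live_objects = 0;     /* bookkeeping so the effect can be observed */

/* Increment is a single addition. */
#define TOY_INCREF(op) ((op)->refcnt++)

/* Decrement must test for zero and call the deallocator found in the type. */
#define TOY_DECREF(op)                         \
    do {                                       \
        toy_object *toy_tmp_ = (op);           \
        assert(toy_tmp_->refcnt > 0);          \
        if (--toy_tmp_->refcnt == 0)           \
            toy_tmp_->type->dealloc(toy_tmp_); \
    } while (0)

/* Deallocator for an object that contains no other objects. */
static void plain_dealloc(toy_object *op)
{
    toy_live_objects--;
    free(op);
}

/* Deallocator for a compound object: release the contained object first. */
static void box_dealloc(toy_object *op)
{
    if (op->item != NULL)
        TOY_DECREF(op->item);
    toy_live_objects--;
    free(op);
}

static const toy_type plain_type = { "plain", plain_dealloc };
static const toy_type box_type = { "box", box_dealloc };

/* Create an object with one reference (owned by the caller).  If item is
   given, the new object takes its own reference to it. */
static toy_object *toy_new(const toy_type *type, toy_object *item)
{
    toy_object *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->type = type;
    op->item = item;
    if (item != NULL)
        TOY_INCREF(item);
    toy_live_objects++;
    return op;
}
```

In this sketch, releasing the last reference to a box also releases the
box's reference to its item, so a chain of objects can be deallocated
by a single decrement.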
It is not necessary to increment an object's reference count for every
local variable that contains a pointer to an object. In theory, the
object's reference count goes up by one when the variable is made to
point to it and it goes down by one when the variable goes out of
scope. However, these two cancel each other out, so at the end the
reference count hasn't changed. The only real reason to use the
reference count is to prevent the object from being deallocated as
long as our variable is pointing to it. If we know that there is at
least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count
temporarily. An important situation where this arises is in objects
that are passed as arguments to C functions in an extension module
that are called from Python; the call mechanism guarantees to hold a
reference to every argument for the duration of the call.
However, a common pitfall is to extract an object from a list and
hold on to it for a while without incrementing its reference count.
Some other operation might conceivably remove the object from the
list, decrementing its reference count and possibly deallocating it.
The real danger is that innocent-looking operations may invoke
arbitrary Python code which could do this; there is a code path which
allows control to flow back to the user from a \code{Py_DECREF()}, so
almost any operation is potentially dangerous.
A safe approach is to always use the generic operations (functions
whose name begins with \code{PyObject_}, \code{PyNumber_},
\code{PySequence_} or \code{PyMapping_}). These operations always
increment the reference count of the object they return. This leaves
the caller with the responsibility to call \code{Py_DECREF()} when
they are done with the result; this soon becomes second nature.
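The pitfall and the safe pattern can be sketched with a toy
one-element ``list'' in plain C. This is a sketch under stated
assumptions, not the Python/C API: all names (\code{toy_object},
\code{toy_list}, \code{toy_list_get}, \code{toy_list_set},
\code{read_after_clear}, \code{toy_freed}) are hypothetical. The
accessor returns a new reference, in the style of the generic
operations, so the caller can keep using the object after the list has
dropped it and must release the reference when done.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the real Python headers. */
typedef struct {
    long refcnt;
    int value;
} toy_object;

typedef struct {
    toy_object *slot;    /* a one-element "list" that owns one reference */
} toy_list;

static int toy_freed = 0;   /* counts deallocations so the effect is observable */

/* Create an object with one reference, owned by the caller. */
static toy_object *toy_new(int value)
{
    toy_object *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->value = value;
    return op;
}

static void toy_incref(toy_object *op)
{
    op->refcnt++;
}

static void toy_decref(toy_object *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        toy_freed++;
        free(op);
    }
}

/* Store op in the list (NULL empties it).  The list takes its own
   reference to op and releases its reference to the previous item. */
static void toy_list_set(toy_list *list, toy_object *op)
{
    toy_object *old = list->slot;
    if (op != NULL)
        toy_incref(op);
    list->slot = op;
    if (old != NULL)
        toy_decref(old);
}

/* Generic-style accessor: returns a NEW reference the caller must release. */
static toy_object *toy_list_get(toy_list *list)
{
    toy_object *op = list->slot;
    if (op != NULL)
        toy_incref(op);
    return op;
}

/* Safe pattern: hold a reference across an operation that empties the list.
   Returns the item's value, or -1 if the list is empty. */
static int read_after_clear(toy_list *list)
{
    toy_object *item = toy_list_get(list);   /* new reference */
    int value;
    if (item == NULL)
        return -1;
    toy_list_set(list, NULL);   /* stands in for "some other operation" */
    value = item->value;        /* still valid: we own a reference */
    toy_decref(item);           /* done with the result */
    return value;
}
```

Had \code{read_after_clear} copied \code{list->slot} without
incrementing the count, the call to \code{toy_list_set} would have
freed the object before \code{item->value} was read.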
There are very few other data types that play a significant role in
the Python/C API; most are simple C types such as \code{int},
\code{long}, \code{double} and \code{char *}. A few structure types
are used to describe static tables used to list the functions exported
by a module or the data attributes of a new object type. These will
be discussed together with the functions that use them.
\section{Exceptions}
The Python programmer only needs to deal with exceptions if specific
error handling is required; unhandled exceptions are automatically
propagated to the caller, then to the caller's caller, and so on, till
they reach the top-level interpreter, where they are reported to the
user accompanied by a stack trace.
For C programmers, however, error checking always has to be explicit.
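What explicit checking looks like can be sketched in plain C. The
sketch follows the convention used by the function descriptions in
this manual (a \NULL{} or -1 return signals failure), but the helpers
themselves are hypothetical and not part of the Python/C API:
\code{toy_get_item}, \code{toy_add_items} and the \code{toy_error}
variable, which stands in for an error indicator, are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of the Python/C API. */
static const char *toy_error = NULL;   /* stands in for an error indicator */

/* Returns a pointer to the entry at index, or NULL (with toy_error set)
   on failure. */
static const int *toy_get_item(const int *items, size_t n, size_t index)
{
    if (index >= n) {
        toy_error = "index out of range";
        return NULL;
    }
    return &items[index];
}

/* Adds two entries.  Returns 0 on success, -1 on failure.  Every call is
   checked and a failure is passed upward by hand; nothing propagates
   automatically. */
static int toy_add_items(const int *items, size_t n,
                         size_t i, size_t j, int *result)
{
    const int *a, *b;
    assert(result != NULL);
    a = toy_get_item(items, n, i);
    if (a == NULL)
        return -1;              /* toy_error already set by the callee */
    b = toy_get_item(items, n, j);
    if (b == NULL)
        return -1;
    *result = *a + *b;
    return 0;
}
```

A caller of \code{toy_add_items} must in turn test the return value;
skipping the test at any level silently loses the error.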
% XXX add more stuff here
\section{Embedding Python}
The one important task that only embedders of the Python interpreter
have to worry about is the initialization (and possibly the
finalization) of the Python interpreter. Most functionality of the
interpreter can only be used after the interpreter has been
initialized.
The basic initialization function is \code{Py_Initialize()}. This
initializes the table of loaded modules, and creates the fundamental
modules \code{__builtin__}, \code{__main__} and \code{sys}. It also
initializes the module search path (\code{sys.path}).
\code{Py_Initialize()} does not set the ``script argument list''
(\code{sys.argv}). If this variable is needed by Python code that
will be executed later, it must be set explicitly with a call to
\code{PySys_SetArgv(\var{argc}, \var{argv})} subsequent to the call
to \code{Py_Initialize()}.
On Unix, \code{Py_Initialize()} calculates the module search path
based upon its best guess for the location of the standard Python
interpreter executable, assuming that the Python library is found in a
fixed location relative to the Python interpreter executable. In
particular, it looks for a directory named \code{lib/python1.5}
(replacing \code{1.5} with the current interpreter version) relative
to the parent directory where the executable named \code{python} is
found on the shell command search path (the environment variable
\code{\$PATH}). For instance, if the Python executable is found in
\code{/usr/local/bin/python}, it will assume that the libraries are in
\code{/usr/local/lib/python1.5}. In fact, this is also the ``fallback''
location, used when no executable file named \code{python} is found
along \code{\$PATH}. The user can change this behavior by setting the
environment variable \code{\$PYTHONHOME}, and can insert additional
directories in front of the standard path by setting
\code{\$PYTHONPATH}.
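The path rule in the example above can be sketched as a small
string-manipulation helper. This illustrates the rule only and is not
the interpreter's actual code: the function name
\code{toy_guess_libdir} and its signature are hypothetical, and the
sketch ignores the \code{\$PATH} search, the fallback location and the
environment variables.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: illustrates the rule only, not the interpreter's
   actual code.  Writes "<prefix>/lib/python<version>" into buf, where
   <prefix> is the parent of the directory containing the executable.
   Returns 0 on success, -1 on failure. */
static int toy_guess_libdir(const char *exe_path, const char *version,
                            char *buf, size_t buflen)
{
    const char *end;
    int n;

    assert(buf != NULL && buflen > 0);
    end = strrchr(exe_path, '/');          /* strip the executable name */
    if (end == NULL || end == exe_path)
        return -1;                         /* no containing directory */
    /* Step back over the directory name, e.g. "bin". */
    do {
        end--;
    } while (end > exe_path && *end != '/');
    if (*end != '/')
        return -1;                         /* relative path with no parent */
    n = snprintf(buf, buflen, "%.*s/lib/python%s",
                 (int)(end - exe_path), exe_path, version);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With \code{exe_path} set to \code{/usr/local/bin/python} and
\code{version} set to \code{1.5}, the helper writes
\code{/usr/local/lib/python1.5} into \code{buf}.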
The embedding application can steer the search by calling
\code{Py_SetProgramName(\var{file})} \emph{before} calling
\code{Py_Initialize()}. Note that \code{\$PYTHONHOME} still overrides
this and \code{\$PYTHONPATH} is still inserted in front of the
standard path.
Sometimes, it is desirable to ``uninitialize'' Python. For instance,
the application may want to start over (make another call to
\code{Py_Initialize()}) or the application is simply done with its
use of Python and wants to free all memory allocated by Python. This
can be accomplished by calling \code{Py_Finalize()}.
% XXX More...
\section{Embedding Python in Threaded Applications}
%XXX more here
\chapter{Old Introduction}
(XXX This is the old introduction, mostly by Jim Fulton -- should be
rewritten.)
@@ -56,7 +236,7 @@ enough to write a simple application that gets Python code from the
user, execs it, and returns the output or errors.
\item "Abstract objects layer": which is the subject of this chapter.
-It has many functions operating on objects, and lest you do many
+It has many functions operating on objects, and lets you do many
things from C that you can also write in Python, without going through
the Python parser.
@@ -495,7 +675,7 @@ This function always succeeds.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyObject_GetAttrString}{PyObject *o, char *attr_name}
-Retrieve an attributed named attr_name form object o.
+Retrieve an attributed named attr_name from object o.
Returns the attribute value on success, or \NULL{} on failure.
This is the equivalent of the Python expression: \code{o.attr_name}.
\end{cfuncdesc}
@@ -664,7 +844,7 @@ of the Python statement: \code{o[key]=v}.
\begin{cfuncdesc}{int}{PyObject_DelItem}{PyObject *o, PyObject *key, PyObject *v}
Delete the mapping for \code{key} from \code{*o}. Returns -1
on failure.
-This is the equivalent of the Python statement: del o[key].
+This is the equivalent of the Python statement: \code{del o[key]}.
\end{cfuncdesc}
@@ -745,7 +925,7 @@ the equivalent of the Python expression: \code{abs(o)}.
\begin{cfuncdesc}{PyObject*}{PyNumber_Invert}{PyObject *o}
Returns the bitwise negation of \code{o} on success, or \NULL{} on
failure. This is the equivalent of the Python expression:
-\code{~o}.
+\code{\~o}.
\end{cfuncdesc}
@@ -777,7 +957,7 @@ expression: \code{o1\^{ }o2}.
\end{cfuncdesc}
\begin{cfuncdesc}{PyObject*}{PyNumber_Or}{PyObject *o1, PyObject *o2}
-Returns the result or \code{o1} and \code{o2} on success, or \NULL{} on
+Returns the result of \code{o1} and \code{o2} on success, or \NULL{} on
failure. This is the equivalent of the Python expression:
\code{o1 or o2}.
\end{cfuncdesc}
@@ -837,7 +1017,7 @@ expression: \code{o1+o2}.
\begin{cfuncdesc}{PyObject*}{PySequence_Repeat}{PyObject *o, int count}
-Return the result of repeating sequence object \code{o} count times,
+Return the result of repeating sequence object \code{o} \code{count} times,
or \NULL{} on failure. This is the equivalent of the Python
expression: \code{o*count}.
\end{cfuncdesc}
@@ -899,7 +1079,7 @@ is equivalent to the Python expression: \code{value in o}.
\end{cfuncdesc}
\begin{cfuncdesc}{int}{PySequence_Index}{PyObject *o, PyObject *value}
-Return the first index for which \code{o[i]=value}. On error,
+Return the first index for which \code{o[i]==value}. On error,
return -1. This is equivalent to the Python
expression: \code{o.index(value)}.
\end{cfuncdesc}
