Added new intro sections (incomplete); fixed various typos

1997-08-14 20:34:33 +00:00 · 1997-08-14 20:34:33 +00:00 · 59a61352ad
parent 91c7c933cc
commit 59a61352ad
2 changed files with 374 additions and 14 deletions
--- a/Doc/api.tex
+++ b/Doc/api.tex
@@ -40,6 +40,186 @@ API functions in detail.

 \chapter{Introduction}

+The Application Programmer's Interface to Python gives C and C++
+programmers access to the Python interpreter at a variety of levels.
+There are two fundamentally different reasons for using the Python/C
+API.  (The API is equally usable from C++, but for brevity it is
+generally referred to as the Python/C API.)  The first reason is to
+write ``extension modules'' for specific purposes; these are C modules
+that extend the Python interpreter.  This is probably the most common
+use.  The second reason is to use Python as a component in a larger
+application; this technique is generally referred to as ``embedding''
+Python in an application.
+
+Writing an extension module is a relatively well-understood process,
+where a ``cookbook'' approach works well.  There are several tools
+that automate the process to some extent.  While people have embedded
+Python in other applications since its early existence, the process of
+embedding Python is less straightforward than writing an extension.
+Python 1.5 introduces a number of new API functions as well as some
+changes to the build process that make embedding much simpler.
+This manual describes the 1.5 state of affairs (as of Python 1.5a3).
+% XXX Eventually, take the historical notes out
+
+Many API functions are useful independent of whether you're embedding
+or extending Python; moreover, most applications that embed Python
+will need to provide a custom extension as well, so it's probably a
+good idea to become familiar with writing an extension before
+attempting to embed Python in a real application.
+
+\section{Objects, Types and Reference Counts}
+
+Most Python/C API functions have one or more arguments as well as a
+return value of type \code{PyObject *}.  This type is a pointer
+(obviously!)  to an opaque data type representing an arbitrary Python
+object.  Since all Python object types are treated the same way by the
+Python language in most situations (e.g., assignments, scope rules,
+and argument passing), it is only fitting that they should be
+represented by a single C type.  All Python objects live on the heap:
+you never declare an automatic or static variable of type
+\code{PyObject}; only pointer variables of type \code{PyObject *} can
+be declared.
+
+All Python objects (even Python integers) have a ``type'' and a
+``reference count''.  An object's type determines what kind of object
+it is (e.g., an integer, a list, or a user-defined function; there are
+many more as explained in the Python Language Reference Manual).  For
+each of the well-known types there is a macro to check whether an
+object is of that type; for instance, \code{PyList_Check(a)} is true
+iff the object pointed to by \code{a} is a Python list.
+
+The reference count is important only because today's computers have a
+finite (and often severely limited) memory size; it counts how many
+different places there are that have a reference to an object.  Such a
+place could be another object, or a global (or static) C variable, or
+a local variable in some C function.  When an object's reference count
+becomes zero, the object is deallocated.  If it contains references to
+other objects, their reference count is decremented.  Those other
+objects may be deallocated in turn, if this decrement makes their
+reference count become zero, and so on.  (There's an obvious problem
+with objects that reference each other here; for now, the solution is
+``don't do that''.)
+
+Reference counts are always manipulated explicitly.  The normal way is
+to use the macro \code{Py_INCREF(a)} to increment an object's
+reference count by one, and \code{Py_DECREF(a)} to decrement it by
+one.  The latter macro is considerably more complex than the former,
+since it must check whether the reference count becomes zero and then
+call the object's deallocator, which is a function pointer contained
+in the object's type structure.  The type-specific deallocator takes
+care of decrementing the reference counts for other objects contained
+in the object, and so on, if this is a compound object type such as a
+list.  There's no chance that the reference count can overflow; at
+least as many bits are used to hold the reference count as there are
+distinct memory locations in virtual memory (assuming
+\code{sizeof(long) >= sizeof(char *)}).  Thus, the reference count
+increment is a simple operation.
+
+It is not necessary to increment an object's reference count for every
+local variable that contains a pointer to an object.  In theory, the
+object's reference count goes up by one when the variable is made to
+point to it and it goes down by one when the variable goes out of
+scope.  However, these two cancel each other out, so at the end the
+reference count hasn't changed.  The only real reason to use the
+reference count is to prevent the object from being deallocated as
+long as our variable is pointing to it.  If we know that there is at
+least one other reference to the object that lives at least as long as
+our variable, there is no need to increment the reference count
+temporarily.  An important situation where this arises is in objects
+that are passed as arguments to C functions in an extension module
+that are called from Python; the call mechanism guarantees to hold a
+reference to every argument for the duration of the call.
+
+However, a common pitfall is to extract an object from a list and
+hold on to it for a while without incrementing its reference count.
+Some other operation might conceivably remove the object from the
+list, decrementing its reference count and possibly deallocating it.
+The real danger is that innocent-looking operations may invoke
+arbitrary Python code which could do this; there is a code path which
+allows control to flow back to the user from a \code{Py_DECREF()}, so
+almost any operation is potentially dangerous.
+
+A safe approach is to always use the generic operations (functions
+whose name begins with \code{PyObject_}, \code{PyNumber_},
+\code{PySequence_} or \code{PyMapping_}).  These operations always
+increment the reference count of the object they return.  This leaves
+the caller with the responsibility to call \code{Py_DECREF()} when
+they are done with the result; this soon becomes second nature.
+
+There are very few other data types that play a significant role in
+the Python/C API; most are simple C types such as \code{int},
+\code{long}, \code{double} and \code{char *}.  A few structure types
+are used to describe static tables used to list the functions exported
+by a module or the data attributes of a new object type.  These will
+be discussed together with the functions that use them.
+
+\section{Exceptions}
+
+The Python programmer only needs to deal with exceptions if specific
+error handling is required; unhandled exceptions are automatically
+propagated to the caller, then to the caller's caller, and so on, till
+they reach the top-level interpreter, where they are reported to the
+user accompanied by a stack trace.
+
+For C programmers, however, error checking always has to be explicit.
+% XXX add more stuff here
+
+\section{Embedding Python}
+
+The one important task that only embedders of the Python interpreter
+have to worry about is the initialization (and possibly the
+finalization) of the Python interpreter.  Most functionality of the
+interpreter can only be used after the interpreter has been
+initialized.
+
+
+The basic initialization function is \code{Py_Initialize()}.  This
+initializes the table of loaded modules, and creates the fundamental
+modules \code{__builtin__}, \code{__main__} and \code{sys}.  It also
+initializes the module search path (\code{sys.path}).
+
+\code{Py_Initialize()} does not set the ``script argument list''
+(\code{sys.argv}).  If this variable is needed by Python code that
+will be executed later, it must be set explicitly with a call to
+\code{PySys_SetArgv(\var{argc}, \var{argv})} subsequent to the call
+to \code{Py_Initialize()}.
+
+On Unix, \code{Py_Initialize()} calculates the module search path
+based upon its best guess for the location of the standard Python
+interpreter executable, assuming that the Python library is found in a
+fixed location relative to the Python interpreter executable.  In
+particular, it looks for a directory named \code{lib/python1.5}
+(replacing \code{1.5} with the current interpreter version) relative
+to the parent directory where the executable named \code{python} is
+found on the shell command search path (the environment variable
+\code{\$PATH}).  For instance, if the Python executable is found in
+\code{/usr/local/bin/python}, it will assume that the libraries are in
+\code{/usr/local/lib/python1.5}.  In fact, this is also the ``fallback''
+location, used when no executable file named \code{python} is found
+along \code{\$PATH}.  The user can change this behavior by setting the
+environment variable \code{\$PYTHONHOME}, and can insert additional
+directories in front of the standard path by setting
+\code{\$PYTHONPATH}.
+
+The embedding application can steer the search by calling
+\code{Py_SetProgramName(\var{file})} \emph{before} calling
+\code{Py_Initialize()}.  Note that \code{\$PYTHONHOME} still overrides
+this and \code{\$PYTHONPATH} is still inserted in front of the
+standard path.
+
+Sometimes, it is desirable to ``uninitialize'' Python.  For instance,
+the application may want to start over (make another call to
+\code{Py_Initialize()}) or the application is simply done with its
+use of Python and wants to free all memory allocated by Python.  This
+can be accomplished by calling \code{Py_Finalize()}.
+% XXX More...
+
+\section{Embedding Python in Threaded Applications}
+
+%XXX more here
+
+\chapter{Old Introduction}
+
 (XXX This is the old introduction, mostly by Jim Fulton -- should be
 rewritten.)

@@ -56,7 +236,7 @@ enough to write a simple application that gets Python code from the
 user, execs it, and returns the output or errors.

 \item "Abstract objects layer": which is the subject of this chapter.
-It has many functions operating on objects, and lest you do many
+It has many functions operating on objects, and lets you do many
 things from C that you can also write in Python, without going through
 the Python parser.

@@ -495,7 +675,7 @@ This function always succeeds.
 \end{cfuncdesc}

 \begin{cfuncdesc}{PyObject*}{PyObject_GetAttrString}{PyObject *o, char *attr_name}
-Retrieve an attributed named attr_name form object o.
+Retrieve an attribute named attr_name from object o.
 Returns the attribute value on success, or \NULL{} on failure.
 This is the equivalent of the Python expression: \code{o.attr_name}.
 \end{cfuncdesc}
@@ -664,7 +844,7 @@ of the Python statement: \code{o[key]=v}.
 \begin{cfuncdesc}{int}{PyObject_DelItem}{PyObject *o, PyObject *key, PyObject *v}
 Delete the mapping for \code{key} from \code{*o}.  Returns -1
 on failure.
-This is the equivalent of the Python statement: del o[key].
+This is the equivalent of the Python statement: \code{del o[key]}.
 \end{cfuncdesc}


@@ -745,7 +925,7 @@ the equivalent of the Python expression: \code{abs(o)}.
 \begin{cfuncdesc}{PyObject*}{PyNumber_Invert}{PyObject *o}
 Returns the bitwise negation of \code{o} on success, or \NULL{} on
 failure.  This is the equivalent of the Python expression:
-\code{~o}.
+\code{\~o}.
 \end{cfuncdesc}


@@ -777,7 +957,7 @@ expression: \code{o1\^{ }o2}.
 \end{cfuncdesc}

 \begin{cfuncdesc}{PyObject*}{PyNumber_Or}{PyObject *o1, PyObject *o2}
-Returns the result or \code{o1} and \code{o2} on success, or \NULL{} on
+Returns the result of \code{o1} and \code{o2} on success, or \NULL{} on
 failure.  This is the equivalent of the Python expression: 
 \code{o1 or o2}.
 \end{cfuncdesc}
@@ -837,7 +1017,7 @@ expression: \code{o1+o2}.


 \begin{cfuncdesc}{PyObject*}{PySequence_Repeat}{PyObject *o, int count}
-Return the result of repeating sequence object \code{o} count times,
+Return the result of repeating sequence object \code{o} \code{count} times,
 or \NULL{} on failure.  This is the equivalent of the Python
 expression: \code{o*count}.
 \end{cfuncdesc}
@@ -899,7 +1079,7 @@ is equivalent to the Python expression: \code{value in o}.
 \end{cfuncdesc}

 \begin{cfuncdesc}{int}{PySequence_Index}{PyObject *o, PyObject *value}
-Return the first index for which \code{o[i]=value}.  On error,
+Return the first index for which \code{o[i]==value}.  On error,
 return -1.    This is equivalent to the Python
 expression: \code{o.index(value)}.
 \end{cfuncdesc}
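
The reference-counting rules described in the new ``Objects, Types and Reference Counts'' section can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyObject`, `toy_new`, `toy_incref`, `toy_decref` and `toy_live_objects` are hypothetical names invented for this example and are not part of the Python/C API, and the layout is not that of the real `PyObject`. It shows the behaviour the text describes: an increment is a plain addition, while a decrement must check for zero, release the object, and then drop the references the object itself held.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy object; NOT the real PyObject layout. */
typedef struct ToyObject {
    long refcnt;              /* number of places holding a reference */
    struct ToyObject *child;  /* optional reference owned by this object */
} ToyObject;

/* Bookkeeping for the example only: how many toy objects are alive. */
static int toy_live_objects = 0;

/* Create an object whose single reference is owned by the caller.
   If child is non-NULL, the new object takes its own reference to it. */
static ToyObject *toy_new(ToyObject *child)
{
    ToyObject *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcnt = 1;
    o->child = child;
    if (child != NULL)
        child->refcnt++;
    toy_live_objects++;
    return o;
}

/* Incrementing is a simple operation: no checks are needed. */
static void toy_incref(ToyObject *o)
{
    o->refcnt++;
}

/* Decrementing must test for zero; at zero the object is released and
   the reference it held on its child is dropped in turn. */
static void toy_decref(ToyObject *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        ToyObject *child = o->child;
        free(o);
        toy_live_objects--;
        if (child != NULL)
            toy_decref(child);
    }
}
```

In this model, creating an inner object and an outer object that refers to it leaves the inner object with two references. Dropping the caller's reference to the inner object keeps it alive, and dropping the last reference to the outer object releases both. The cascade mirrors the "and so on" in the text.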
--- a/Doc/api/api.tex
+++ b/Doc/api/api.tex
@@ -40,6 +40,186 @@ API functions in detail.

 \chapter{Introduction}

+The Application Programmer's Interface to Python gives C and C++
+programmers access to the Python interpreter at a variety of levels.
+There are two fundamentally different reasons for using the Python/C
+API.  (The API is equally usable from C++, but for brevity it is
+generally referred to as the Python/C API.)  The first reason is to
+write ``extension modules'' for specific purposes; these are C modules
+that extend the Python interpreter.  This is probably the most common
+use.  The second reason is to use Python as a component in a larger
+application; this technique is generally referred to as ``embedding''
+Python in an application.
+
+Writing an extension module is a relatively well-understood process,
+where a ``cookbook'' approach works well.  There are several tools
+that automate the process to some extent.  While people have embedded
+Python in other applications since its early existence, the process of
+embedding Python is less straightforward than writing an extension.
+Python 1.5 introduces a number of new API functions as well as some
+changes to the build process that make embedding much simpler.
+This manual describes the 1.5 state of affairs (as of Python 1.5a3).
+% XXX Eventually, take the historical notes out
+
+Many API functions are useful independent of whether you're embedding
+or extending Python; moreover, most applications that embed Python
+will need to provide a custom extension as well, so it's probably a
+good idea to become familiar with writing an extension before
+attempting to embed Python in a real application.
+
+\section{Objects, Types and Reference Counts}
+
+Most Python/C API functions have one or more arguments as well as a
+return value of type \code{PyObject *}.  This type is a pointer
+(obviously!)  to an opaque data type representing an arbitrary Python
+object.  Since all Python object types are treated the same way by the
+Python language in most situations (e.g., assignments, scope rules,
+and argument passing), it is only fitting that they should be
+represented by a single C type.  All Python objects live on the heap:
+you never declare an automatic or static variable of type
+\code{PyObject}; only pointer variables of type \code{PyObject *} can
+be declared.
+
+All Python objects (even Python integers) have a ``type'' and a
+``reference count''.  An object's type determines what kind of object
+it is (e.g., an integer, a list, or a user-defined function; there are
+many more as explained in the Python Language Reference Manual).  For
+each of the well-known types there is a macro to check whether an
+object is of that type; for instance, \code{PyList_Check(a)} is true
+iff the object pointed to by \code{a} is a Python list.
+
+The reference count is important only because today's computers have a
+finite (and often severely limited) memory size; it counts how many
+different places there are that have a reference to an object.  Such a
+place could be another object, or a global (or static) C variable, or
+a local variable in some C function.  When an object's reference count
+becomes zero, the object is deallocated.  If it contains references to
+other objects, their reference count is decremented.  Those other
+objects may be deallocated in turn, if this decrement makes their
+reference count become zero, and so on.  (There's an obvious problem
+with objects that reference each other here; for now, the solution is
+``don't do that''.)
+
+Reference counts are always manipulated explicitly.  The normal way is
+to use the macro \code{Py_INCREF(a)} to increment an object's
+reference count by one, and \code{Py_DECREF(a)} to decrement it by
+one.  The latter macro is considerably more complex than the former,
+since it must check whether the reference count becomes zero and then
+call the object's deallocator, which is a function pointer contained
+in the object's type structure.  The type-specific deallocator takes
+care of decrementing the reference counts for other objects contained
+in the object, and so on, if this is a compound object type such as a
+list.  There's no chance that the reference count can overflow; at
+least as many bits are used to hold the reference count as there are
+distinct memory locations in virtual memory (assuming
+\code{sizeof(long) >= sizeof(char *)}).  Thus, the reference count
+increment is a simple operation.
+
+It is not necessary to increment an object's reference count for every
+local variable that contains a pointer to an object.  In theory, the
+object's reference count goes up by one when the variable is made to
+point to it and it goes down by one when the variable goes out of
+scope.  However, these two cancel each other out, so at the end the
+reference count hasn't changed.  The only real reason to use the
+reference count is to prevent the object from being deallocated as
+long as our variable is pointing to it.  If we know that there is at
+least one other reference to the object that lives at least as long as
+our variable, there is no need to increment the reference count
+temporarily.  An important situation where this arises is in objects
+that are passed as arguments to C functions in an extension module
+that are called from Python; the call mechanism guarantees to hold a
+reference to every argument for the duration of the call.
+
+However, a common pitfall is to extract an object from a list and
+hold on to it for a while without incrementing its reference count.
+Some other operation might conceivably remove the object from the
+list, decrementing its reference count and possibly deallocating it.
+The real danger is that innocent-looking operations may invoke
+arbitrary Python code which could do this; there is a code path which
+allows control to flow back to the user from a \code{Py_DECREF()}, so
+almost any operation is potentially dangerous.
+
+A safe approach is to always use the generic operations (functions
+whose name begins with \code{PyObject_}, \code{PyNumber_},
+\code{PySequence_} or \code{PyMapping_}).  These operations always
+increment the reference count of the object they return.  This leaves
+the caller with the responsibility to call \code{Py_DECREF()} when
+they are done with the result; this soon becomes second nature.
+
+There are very few other data types that play a significant role in
+the Python/C API; most are simple C types such as \code{int},
+\code{long}, \code{double} and \code{char *}.  A few structure types
+are used to describe static tables used to list the functions exported
+by a module or the data attributes of a new object type.  These will
+be discussed together with the functions that use them.
+
+\section{Exceptions}
+
+The Python programmer only needs to deal with exceptions if specific
+error handling is required; unhandled exceptions are automatically
+propagated to the caller, then to the caller's caller, and so on, till
+they reach the top-level interpreter, where they are reported to the
+user accompanied by a stack trace.
+
+For C programmers, however, error checking always has to be explicit.
+% XXX add more stuff here
+
+\section{Embedding Python}
+
+The one important task that only embedders of the Python interpreter
+have to worry about is the initialization (and possibly the
+finalization) of the Python interpreter.  Most functionality of the
+interpreter can only be used after the interpreter has been
+initialized.
+
+
+The basic initialization function is \code{Py_Initialize()}.  This
+initializes the table of loaded modules, and creates the fundamental
+modules \code{__builtin__}, \code{__main__} and \code{sys}.  It also
+initializes the module search path (\code{sys.path}).
+
+\code{Py_Initialize()} does not set the ``script argument list''
+(\code{sys.argv}).  If this variable is needed by Python code that
+will be executed later, it must be set explicitly with a call to
+\code{PySys_SetArgv(\var{argc}, \var{argv})} subsequent to the call
+to \code{Py_Initialize()}.
+
+On Unix, \code{Py_Initialize()} calculates the module search path
+based upon its best guess for the location of the standard Python
+interpreter executable, assuming that the Python library is found in a
+fixed location relative to the Python interpreter executable.  In
+particular, it looks for a directory named \code{lib/python1.5}
+(replacing \code{1.5} with the current interpreter version) relative
+to the parent directory where the executable named \code{python} is
+found on the shell command search path (the environment variable
+\code{\$PATH}).  For instance, if the Python executable is found in
+\code{/usr/local/bin/python}, it will assume that the libraries are in
+\code{/usr/local/lib/python1.5}.  In fact, this is also the ``fallback''
+location, used when no executable file named \code{python} is found
+along \code{\$PATH}.  The user can change this behavior by setting the
+environment variable \code{\$PYTHONHOME}, and can insert additional
+directories in front of the standard path by setting
+\code{\$PYTHONPATH}.
+
+The embedding application can steer the search by calling
+\code{Py_SetProgramName(\var{file})} \emph{before} calling
+\code{Py_Initialize()}.  Note that \code{\$PYTHONHOME} still overrides
+this and \code{\$PYTHONPATH} is still inserted in front of the
+standard path.
+
+Sometimes, it is desirable to ``uninitialize'' Python.  For instance,
+the application may want to start over (make another call to
+\code{Py_Initialize()}) or the application is simply done with its
+use of Python and wants to free all memory allocated by Python.  This
+can be accomplished by calling \code{Py_Finalize()}.
+% XXX More...
+
+\section{Embedding Python in Threaded Applications}
+
+%XXX more here
+
+\chapter{Old Introduction}
+
 (XXX This is the old introduction, mostly by Jim Fulton -- should be
 rewritten.)

@@ -56,7 +236,7 @@ enough to write a simple application that gets Python code from the
 user, execs it, and returns the output or errors.

 \item "Abstract objects layer": which is the subject of this chapter.
-It has many functions operating on objects, and lest you do many
+It has many functions operating on objects, and lets you do many
 things from C that you can also write in Python, without going through
 the Python parser.

@@ -495,7 +675,7 @@ This function always succeeds.
 \end{cfuncdesc}

 \begin{cfuncdesc}{PyObject*}{PyObject_GetAttrString}{PyObject *o, char *attr_name}
-Retrieve an attributed named attr_name form object o.
+Retrieve an attribute named attr_name from object o.
 Returns the attribute value on success, or \NULL{} on failure.
 This is the equivalent of the Python expression: \code{o.attr_name}.
 \end{cfuncdesc}
@@ -664,7 +844,7 @@ of the Python statement: \code{o[key]=v}.
 \begin{cfuncdesc}{int}{PyObject_DelItem}{PyObject *o, PyObject *key, PyObject *v}
 Delete the mapping for \code{key} from \code{*o}.  Returns -1
 on failure.
-This is the equivalent of the Python statement: del o[key].
+This is the equivalent of the Python statement: \code{del o[key]}.
 \end{cfuncdesc}


@@ -745,7 +925,7 @@ the equivalent of the Python expression: \code{abs(o)}.
 \begin{cfuncdesc}{PyObject*}{PyNumber_Invert}{PyObject *o}
 Returns the bitwise negation of \code{o} on success, or \NULL{} on
 failure.  This is the equivalent of the Python expression:
-\code{~o}.
+\code{\~o}.
 \end{cfuncdesc}


@@ -777,7 +957,7 @@ expression: \code{o1\^{ }o2}.
 \end{cfuncdesc}

 \begin{cfuncdesc}{PyObject*}{PyNumber_Or}{PyObject *o1, PyObject *o2}
-Returns the result or \code{o1} and \code{o2} on success, or \NULL{} on
+Returns the result of \code{o1} and \code{o2} on success, or \NULL{} on
 failure.  This is the equivalent of the Python expression: 
 \code{o1 or o2}.
 \end{cfuncdesc}
@@ -837,7 +1017,7 @@ expression: \code{o1+o2}.


 \begin{cfuncdesc}{PyObject*}{PySequence_Repeat}{PyObject *o, int count}
-Return the result of repeating sequence object \code{o} count times,
+Return the result of repeating sequence object \code{o} \code{count} times,
 or \NULL{} on failure.  This is the equivalent of the Python
 expression: \code{o*count}.
 \end{cfuncdesc}
@@ -899,7 +1079,7 @@ is equivalent to the Python expression: \code{value in o}.
 \end{cfuncdesc}

 \begin{cfuncdesc}{int}{PySequence_Index}{PyObject *o, PyObject *value}
-Return the first index for which \code{o[i]=value}.  On error,
+Return the first index for which \code{o[i]==value}.  On error,
 return -1.    This is equivalent to the Python
 expression: \code{o.index(value)}.
 \end{cfuncdesc}
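
The ``Embedding Python'' section added by this commit gives a worked example of the Unix search-path guess: an executable at `/usr/local/bin/python` implies a library directory of `/usr/local/lib/python1.5`. The path arithmetic behind that example can be sketched as below. The helper `toy_library_dir` is a hypothetical name invented here; it is not part of the Python/C API and is not claimed to be how `Py_Initialize()` is implemented. It only reproduces the "parent of the executable's directory, plus `lib/python` and the version" rule stated in the text, and it ignores `$PYTHONHOME`, `$PYTHONPATH` and the fallback location.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Python/C API function).
   Given the full path of the interpreter executable and a version string,
   write "<parent of the executable's directory>/lib/python<version>" to out.
   Returns 0 on success, -1 if the path has no parent directory or the
   result does not fit in the buffer. */
static int toy_library_dir(const char *exe, const char *version,
                           char *out, size_t outsize)
{
    const char *slash = strrchr(exe, '/');  /* separator before "python" */
    const char *p;
    int n;

    if (slash == NULL)
        return -1;

    /* Walk back to the separator that ends the parent directory. */
    p = slash;
    while (p > exe) {
        p--;
        if (*p == '/')
            break;
    }
    if (*p != '/')
        return -1;

    /* The prefix is everything before that separator. */
    n = snprintf(out, outsize, "%.*s/lib/python%s",
                 (int)(p - exe), exe, version);
    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return 0;
}
```

Called with `/usr/local/bin/python` and the version `1.5`, the helper produces the directory named in the text. A bare `python` with no directory component is rejected, which corresponds to the case where the text says the fallback location is used instead.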