From 59a61352ad1e1a47b9b07f2264f1504ac348d0c9 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Thu, 14 Aug 1997 20:34:33 +0000 Subject: [PATCH] Added new intro sections (incomplete); fixed various typos --- Doc/api.tex | 194 ++++++++++++++++++++++++++++++++++++++++++++++-- Doc/api/api.tex | 194 ++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 374 insertions(+), 14 deletions(-) diff --git a/Doc/api.tex b/Doc/api.tex index bc067af61d5..95b7b583360 100644 --- a/Doc/api.tex +++ b/Doc/api.tex @@ -40,6 +40,186 @@ API functions in detail. \chapter{Introduction} +The Application Programmer's Interface to Python gives C and C++ +programmers access to the Python interpreter at a variety of levels. +There are two fundamentally different reasons for using the Python/C +API. (The API is equally usable from C++, but for brevity it is +generally referred to as the Python/C API.) The first reason is to +write ``extension modules'' for specific purposes; these are C modules +that extend the Python interpreter. This is probably the most common +use. The second reason is to use Python as a component in a larger +application; this technique is generally referred to as ``embedding'' +Python in an application. + +Writing an extension module is a relatively well-understood process, +where a ``cookbook'' approach works well. There are several tools +that automate the process to some extent. While people have embedded +Python in other applications since its early existence, the process of +embedding Python is less straightforward than writing an extension. +Python 1.5 introduces a number of new API functions as well as some +changes to the build process that make embedding much simpler. +This manual describes the 1.5 state of affairs (as of Python 1.5a3). 
+% XXX Eventually, take the historical notes out + +Many API functions are useful independent of whether you're embedding +or extending Python; moreover, most applications that embed Python +will need to provide a custom extension as well, so it's probably a +good idea to become familiar with writing an extension before +attempting to embed Python in a real application. + +\section{Objects, Types and Reference Counts} + +Most Python/C API functions have one or more arguments as well as a +return value of type \code{PyObject *}. This type is a pointer +(obviously!) to an opaque data type representing an arbitrary Python +object. Since all Python object types are treated the same way by the +Python language in most situations (e.g., assignments, scope rules, +and argument passing), it is only fitting that they should be +represented by a single C type. All Python objects live on the heap: +you never declare an automatic or static variable of type +\code{PyObject}, only pointer variables of type \code{PyObject *} can +be declared. + +All Python objects (even Python integers) have a ``type'' and a +``reference count''. An object's type determines what kind of object +it is (e.g., an integer, a list, or a user-defined function; there are +many more as explained in the Python Language Reference Manual). For +each of the well-known types there is a macro to check whether an +object is of that type; for instance, \code{PyList_Check(a)} is true +iff the object pointed to by \code{a} is a Python list. + +The reference count is important only because today's computers have a +finite (and often severely limited) memory size; it counts how many +different places there are that have a reference to an object. Such a +place could be another object, or a global (or static) C variable, or +a local variable in some C function. When an object's reference count +becomes zero, the object is deallocated. If it contains references to +other objects, their reference count is decremented. 
Those other +objects may be deallocated in turn, if this decrement makes their +reference count become zero, and so on. (There's an obvious problem +with objects that reference each other here; for now, the solution is +``don't do that''.) + +Reference counts are always manipulated explicitly. The normal way is +to use the macro \code{Py_INCREF(a)} to increment an object's +reference count by one, and \code{Py_DECREF(a)} to decrement it by +one. The latter macro is considerably more complex than the former, +since it must check whether the reference count becomes zero and then +call the object's deallocator, which is a function pointer contained +in the object's type structure. The type-specific deallocator takes +care of decrementing the reference counts for other objects contained +in the object, and so on, if this is a compound object type such as a +list. There's no chance that the reference count can overflow; at +least as many bits are used to hold the reference count as there are +distinct memory locations in virtual memory (assuming +\code{sizeof(long) >= sizeof(char *)}). Thus, the reference count +increment is a simple operation. + +It is not necessary to increment an object's reference count for every +local variable that contains a pointer to an object. In theory, the +object's reference count goes up by one when the variable is made to +point to it and it goes down by one when the variable goes out of +scope. However, these two cancel each other out, so at the end the +reference count hasn't changed. The only real reason to use the +reference count is to prevent the object from being deallocated as +long as our variable is pointing to it. If we know that there is at +least one other reference to the object that lives at least as long as +our variable, there is no need to increment the reference count +temporarily. 
An important situation where this arises is in objects +that are passed as arguments to C functions in an extension module +that are called from Python; the call mechanism guarantees to hold a +reference to every argument for the duration of the call. + +However, a common pitfall is to extract an object from a list and +hold on to it for a while without incrementing its reference count. +Some other operation might conceivably remove the object from the +list, decrementing its reference count and possibly deallocating it. +The real danger is that innocent-looking operations may invoke +arbitrary Python code which could do this; there is a code path which +allows control to flow back to the user from a \code{Py_DECREF()}, so +almost any operation is potentially dangerous. + +A safe approach is to always use the generic operations (functions +whose name begins with \code{PyObject_}, \code{PyNumber_}, +\code{PySequence_} or \code{PyMapping_}). These operations always +increment the reference count of the object they return. This leaves +the caller with the responsibility to call \code{Py_DECREF()} when +they are done with the result; this soon becomes second nature. + +There are very few other data types that play a significant role in +the Python/C API; most are simple C types such as \code{int}, +\code{long}, \code{double} and \code{char *}. A few structure types +are used to describe static tables used to list the functions exported +by a module or the data attributes of a new object type. These will +be discussed together with the functions that use them. + +\section{Exceptions} + +The Python programmer only needs to deal with exceptions if specific +error handling is required; unhandled exceptions are automatically +propagated to the caller, then to the caller's caller, and so on, till +they reach the top-level interpreter, where they are reported to the +user accompanied by a stack trace. 
+ +For C programmers, however, error checking always has to be explicit. +% XXX add more stuff here + +\section{Embedding Python} + +The one important task that only embedders of the Python interpreter +have to worry about is the initialization (and possibly the +finalization) of the Python interpreter. Most functionality of the +interpreter can only be used after the interpreter has been +initialized. + + +The basic initialization function is \code{Py_Initialize()}. This +initializes the table of loaded modules, and creates the fundamental +modules \code{__builtin__}, \code{__main__} and \code{sys}. It also +initializes the module search path (\code{sys.path}). + +\code{Py_Initialize()} does not set the ``script argument list'' +(\code{sys.argv}). If this variable is needed by Python code that +will be executed later, it must be set explicitly with a call to +\code{PySys_SetArgv(\var{argc}, \var{argv})} subsequent to the call +to \code{Py_Initialize()}. + +On Unix, \code{Py_Initialize()} calculates the module search path +based upon its best guess for the location of the standard Python +interpreter executable, assuming that the Python library is found in a +fixed location relative to the Python interpreter executable. In +particular, it looks for a directory named \code{lib/python1.5} +(replacing \code{1.5} with the current interpreter version) relative +to the parent directory where the executable named \code{python} is +found on the shell command search path (the environment variable +\code{\$PATH}). For instance, if the Python executable is found in +\code{/usr/local/bin/python}, it will assume that the libraries are in +\code{/usr/local/lib/python1.5}. In fact, this is also the ``fallback'' +location, used when no executable file named \code{python} is found +along \code{\$PATH}. 
The user can change this behavior by setting the +environment variable \code{\$PYTHONHOME}, and can insert additional +directories in front of the standard path by setting +\code{\$PYTHONPATH}. + +The embedding application can steer the search by calling +\code{Py_SetProgramName(\var{file})} \emph{before} calling +\code{Py_Initialize()}. Note that \code{\$PYTHONHOME} still overrides +this and \code{\$PYTHONPATH} is still inserted in front of the +standard path. + +Sometimes, it is desirable to ``uninitialize'' Python. For instance, +the application may want to start over (make another call to +\code{Py_Initialize()}) or the application is simply done with its +use of Python and wants to free all memory allocated by Python. This +can be accomplished by calling \code{Py_Finalize()}. +% XXX More... + +\section{Embedding Python in Threaded Applications} + +%XXX more here + +\chapter{Old Introduction} + (XXX This is the old introduction, mostly by Jim Fulton -- should be rewritten.) @@ -56,7 +236,7 @@ enough to write a simple application that gets Python code from the user, execs it, and returns the output or errors. \item "Abstract objects layer": which is the subject of this chapter. -It has many functions operating on objects, and lest you do many +It has many functions operating on objects, and lets you do many things from C that you can also write in Python, without going through the Python parser. @@ -495,7 +675,7 @@ This function always succeeds. \end{cfuncdesc} \begin{cfuncdesc}{PyObject*}{PyObject_GetAttrString}{PyObject *o, char *attr_name} -Retrieve an attributed named attr_name form object o. +Retrieve an attribute named attr_name from object o. Returns the attribute value on success, or \NULL{} on failure. This is the equivalent of the Python expression: \code{o.attr_name}. \end{cfuncdesc} @@ -664,7 +844,7 @@ of the Python statement: \code{o[key]=v}. 
\begin{cfuncdesc}{int}{PyObject_DelItem}{PyObject *o, PyObject *key, PyObject *v} Delete the mapping for \code{key} from \code{*o}. Returns -1 on failure. -This is the equivalent of the Python statement: del o[key]. +This is the equivalent of the Python statement: \code{del o[key]}. \end{cfuncdesc} @@ -745,7 +925,7 @@ the equivalent of the Python expression: \code{abs(o)}. \begin{cfuncdesc}{PyObject*}{PyNumber_Invert}{PyObject *o} Returns the bitwise negation of \code{o} on success, or \NULL{} on failure. This is the equivalent of the Python expression: -\code{~o}. +\code{\~o}. \end{cfuncdesc} @@ -777,7 +957,7 @@ expression: \code{o1\^{ }o2}. \end{cfuncdesc} \begin{cfuncdesc}{PyObject*}{PyNumber_Or}{PyObject *o1, PyObject *o2} -Returns the result or \code{o1} and \code{o2} on success, or \NULL{} on +Returns the result of \code{o1} and \code{o2} on success, or \NULL{} on failure. This is the equivalent of the Python expression: \code{o1 or o2}. \end{cfuncdesc} @@ -837,7 +1017,7 @@ expression: \code{o1+o2}. \begin{cfuncdesc}{PyObject*}{PySequence_Repeat}{PyObject *o, int count} -Return the result of repeating sequence object \code{o} count times, +Return the result of repeating sequence object \code{o} \code{count} times, or \NULL{} on failure. This is the equivalent of the Python expression: \code{o*count}. \end{cfuncdesc} @@ -899,7 +1079,7 @@ is equivalent to the Python expression: \code{value in o}. \end{cfuncdesc} \begin{cfuncdesc}{int}{PySequence_Index}{PyObject *o, PyObject *value} -Return the first index for which \code{o[i]=value}. On error, +Return the first index for which \code{o[i]==value}. On error, return -1. This is equivalent to the Python expression: \code{o.index(value)}. \end{cfuncdesc} diff --git a/Doc/api/api.tex b/Doc/api/api.tex index bc067af61d5..95b7b583360 100644 --- a/Doc/api/api.tex +++ b/Doc/api/api.tex @@ -40,6 +40,186 @@ API functions in detail. 
\chapter{Introduction} +The Application Programmer's Interface to Python gives C and C++ +programmers access to the Python interpreter at a variety of levels. +There are two fundamentally different reasons for using the Python/C +API. (The API is equally usable from C++, but for brevity it is +generally referred to as the Python/C API.) The first reason is to +write ``extension modules'' for specific purposes; these are C modules +that extend the Python interpreter. This is probably the most common +use. The second reason is to use Python as a component in a larger +application; this technique is generally referred to as ``embedding'' +Python in an application. + +Writing an extension module is a relatively well-understood process, +where a ``cookbook'' approach works well. There are several tools +that automate the process to some extent. While people have embedded +Python in other applications since its early existence, the process of +embedding Python is less straightforward than writing an extension. +Python 1.5 introduces a number of new API functions as well as some +changes to the build process that make embedding much simpler. +This manual describes the 1.5 state of affairs (as of Python 1.5a3). +% XXX Eventually, take the historical notes out + +Many API functions are useful independent of whether you're embedding +or extending Python; moreover, most applications that embed Python +will need to provide a custom extension as well, so it's probably a +good idea to become familiar with writing an extension before +attempting to embed Python in a real application. + +\section{Objects, Types and Reference Counts} + +Most Python/C API functions have one or more arguments as well as a +return value of type \code{PyObject *}. This type is a pointer +(obviously!) to an opaque data type representing an arbitrary Python +object. 
Since all Python object types are treated the same way by the +Python language in most situations (e.g., assignments, scope rules, +and argument passing), it is only fitting that they should be +represented by a single C type. All Python objects live on the heap: +you never declare an automatic or static variable of type +\code{PyObject}, only pointer variables of type \code{PyObject *} can +be declared. + +All Python objects (even Python integers) have a ``type'' and a +``reference count''. An object's type determines what kind of object +it is (e.g., an integer, a list, or a user-defined function; there are +many more as explained in the Python Language Reference Manual). For +each of the well-known types there is a macro to check whether an +object is of that type; for instance, \code{PyList_Check(a)} is true +iff the object pointed to by \code{a} is a Python list. + +The reference count is important only because today's computers have a +finite (and often severely limited) memory size; it counts how many +different places there are that have a reference to an object. Such a +place could be another object, or a global (or static) C variable, or +a local variable in some C function. When an object's reference count +becomes zero, the object is deallocated. If it contains references to +other objects, their reference count is decremented. Those other +objects may be deallocated in turn, if this decrement makes their +reference count become zero, and so on. (There's an obvious problem +with objects that reference each other here; for now, the solution is +``don't do that''.) + +Reference counts are always manipulated explicitly. The normal way is +to use the macro \code{Py_INCREF(a)} to increment an object's +reference count by one, and \code{Py_DECREF(a)} to decrement it by +one. 
The latter macro is considerably more complex than the former, +since it must check whether the reference count becomes zero and then +call the object's deallocator, which is a function pointer contained +in the object's type structure. The type-specific deallocator takes +care of decrementing the reference counts for other objects contained +in the object, and so on, if this is a compound object type such as a +list. There's no chance that the reference count can overflow; at +least as many bits are used to hold the reference count as there are +distinct memory locations in virtual memory (assuming +\code{sizeof(long) >= sizeof(char *)}). Thus, the reference count +increment is a simple operation. + +It is not necessary to increment an object's reference count for every +local variable that contains a pointer to an object. In theory, the +object's reference count goes up by one when the variable is made to +point to it and it goes down by one when the variable goes out of +scope. However, these two cancel each other out, so at the end the +reference count hasn't changed. The only real reason to use the +reference count is to prevent the object from being deallocated as +long as our variable is pointing to it. If we know that there is at +least one other reference to the object that lives at least as long as +our variable, there is no need to increment the reference count +temporarily. An important situation where this arises is in objects +that are passed as arguments to C functions in an extension module +that are called from Python; the call mechanism guarantees to hold a +reference to every argument for the duration of the call. + +However, a common pitfall is to extract an object from a list and +hold on to it for a while without incrementing its reference count. +Some other operation might conceivably remove the object from the +list, decrementing its reference count and possibly deallocating it. 
+The real danger is that innocent-looking operations may invoke +arbitrary Python code which could do this; there is a code path which +allows control to flow back to the user from a \code{Py_DECREF()}, so +almost any operation is potentially dangerous. + +A safe approach is to always use the generic operations (functions +whose name begins with \code{PyObject_}, \code{PyNumber_}, +\code{PySequence_} or \code{PyMapping_}). These operations always +increment the reference count of the object they return. This leaves +the caller with the responsibility to call \code{Py_DECREF()} when +they are done with the result; this soon becomes second nature. + +There are very few other data types that play a significant role in +the Python/C API; most are simple C types such as \code{int}, +\code{long}, \code{double} and \code{char *}. A few structure types +are used to describe static tables used to list the functions exported +by a module or the data attributes of a new object type. These will +be discussed together with the functions that use them. + +\section{Exceptions} + +The Python programmer only needs to deal with exceptions if specific +error handling is required; unhandled exceptions are automatically +propagated to the caller, then to the caller's caller, and so on, till +they reach the top-level interpreter, where they are reported to the +user accompanied by a stack trace. + +For C programmers, however, error checking always has to be explicit. +% XXX add more stuff here + +\section{Embedding Python} + +The one important task that only embedders of the Python interpreter +have to worry about is the initialization (and possibly the +finalization) of the Python interpreter. Most functionality of the +interpreter can only be used after the interpreter has been +initialized. + + +The basic initialization function is \code{Py_Initialize()}. 
This +initializes the table of loaded modules, and creates the fundamental +modules \code{__builtin__}, \code{__main__} and \code{sys}. It also +initializes the module search path (\code{sys.path}). + +\code{Py_Initialize()} does not set the ``script argument list'' +(\code{sys.argv}). If this variable is needed by Python code that +will be executed later, it must be set explicitly with a call to +\code{PySys_SetArgv(\var{argc}, \var{argv})} subsequent to the call +to \code{Py_Initialize()}. + +On Unix, \code{Py_Initialize()} calculates the module search path +based upon its best guess for the location of the standard Python +interpreter executable, assuming that the Python library is found in a +fixed location relative to the Python interpreter executable. In +particular, it looks for a directory named \code{lib/python1.5} +(replacing \code{1.5} with the current interpreter version) relative +to the parent directory where the executable named \code{python} is +found on the shell command search path (the environment variable +\code{\$PATH}). For instance, if the Python executable is found in +\code{/usr/local/bin/python}, it will assume that the libraries are in +\code{/usr/local/lib/python1.5}. In fact, this is also the ``fallback'' +location, used when no executable file named \code{python} is found +along \code{\$PATH}. The user can change this behavior by setting the +environment variable \code{\$PYTHONHOME}, and can insert additional +directories in front of the standard path by setting +\code{\$PYTHONPATH}. + +The embedding application can steer the search by calling +\code{Py_SetProgramName(\var{file})} \emph{before} calling +\code{Py_Initialize()}. Note that \code{\$PYTHONHOME} still overrides +this and \code{\$PYTHONPATH} is still inserted in front of the +standard path. + +Sometimes, it is desirable to ``uninitialize'' Python. 
For instance, +the application may want to start over (make another call to +\code{Py_Initialize()}) or the application is simply done with its +use of Python and wants to free all memory allocated by Python. This +can be accomplished by calling \code{Py_Finalize()}. +% XXX More... + +\section{Embedding Python in Threaded Applications} + +%XXX more here + +\chapter{Old Introduction} + (XXX This is the old introduction, mostly by Jim Fulton -- should be rewritten.) @@ -56,7 +236,7 @@ enough to write a simple application that gets Python code from the user, execs it, and returns the output or errors. \item "Abstract objects layer": which is the subject of this chapter. -It has many functions operating on objects, and lest you do many +It has many functions operating on objects, and lets you do many things from C that you can also write in Python, without going through the Python parser. @@ -495,7 +675,7 @@ This function always succeeds. \end{cfuncdesc} \begin{cfuncdesc}{PyObject*}{PyObject_GetAttrString}{PyObject *o, char *attr_name} -Retrieve an attributed named attr_name form object o. +Retrieve an attribute named attr_name from object o. Returns the attribute value on success, or \NULL{} on failure. This is the equivalent of the Python expression: \code{o.attr_name}. \end{cfuncdesc} @@ -664,7 +844,7 @@ of the Python statement: \code{o[key]=v}. \begin{cfuncdesc}{int}{PyObject_DelItem}{PyObject *o, PyObject *key, PyObject *v} Delete the mapping for \code{key} from \code{*o}. Returns -1 on failure. -This is the equivalent of the Python statement: del o[key]. +This is the equivalent of the Python statement: \code{del o[key]}. \end{cfuncdesc} @@ -745,7 +925,7 @@ the equivalent of the Python expression: \code{abs(o)}. \begin{cfuncdesc}{PyObject*}{PyNumber_Invert}{PyObject *o} Returns the bitwise negation of \code{o} on success, or \NULL{} on failure. This is the equivalent of the Python expression: -\code{~o}. +\code{\~o}. 
\end{cfuncdesc} @@ -777,7 +957,7 @@ expression: \code{o1\^{ }o2}. \end{cfuncdesc} \begin{cfuncdesc}{PyObject*}{PyNumber_Or}{PyObject *o1, PyObject *o2} -Returns the result or \code{o1} and \code{o2} on success, or \NULL{} on +Returns the result of \code{o1} and \code{o2} on success, or \NULL{} on failure. This is the equivalent of the Python expression: \code{o1 or o2}. \end{cfuncdesc} @@ -837,7 +1017,7 @@ expression: \code{o1+o2}. \begin{cfuncdesc}{PyObject*}{PySequence_Repeat}{PyObject *o, int count} -Return the result of repeating sequence object \code{o} count times, +Return the result of repeating sequence object \code{o} \code{count} times, or \NULL{} on failure. This is the equivalent of the Python expression: \code{o*count}. \end{cfuncdesc} @@ -899,7 +1079,7 @@ is equivalent to the Python expression: \code{value in o}. \end{cfuncdesc} \begin{cfuncdesc}{int}{PySequence_Index}{PyObject *o, PyObject *value} -Return the first index for which \code{o[i]=value}. On error, +Return the first index for which \code{o[i]==value}. On error, return -1. This is equivalent to the Python expression: \code{o.index(value)}. \end{cfuncdesc}