yet a better introduction

Guido van Rossum 1995-03-20 14:24:09 +00:00
parent f1245a8291
commit b92112da0e
2 changed files with 294 additions and 286 deletions

@ -20,11 +20,22 @@
\begin{abstract} \begin{abstract}
\noindent \noindent
This document describes how to write modules in C or \Cpp{} to extend the Python is an interpreted, object-oriented programming language. This
Python interpreter. It also describes how to use Python as an document describes how to write modules in C or \Cpp{} to extend the
`embedded' language, and how extension modules can be loaded Python interpreter with new modules. Those modules can define new
dynamically (at run time) into the interpreter, if the operating functions but also new object types and their methods. The document
system supports this feature. also describes how to embed the Python interpreter in another
application, for use as an extension language. Finally, it shows how
to compile and link extension modules so that they can be loaded
dynamically (at run time) into the interpreter, if the underlying
operating system supports this feature.
This document assumes basic knowledge about Python. For an informal
introduction to the language, see the Python Tutorial. The Python
Reference Manual gives a more formal definition of the language. The
Python Library Reference documents the existing object types,
functions and modules (both built-in and written in Python) that give
the language its wide application range.
\end{abstract} \end{abstract}
@ -45,46 +56,43 @@ system supports this feature.
\section{Introduction} \section{Introduction}
It is quite easy to add non-standard built-in modules to Python, if It is quite easy to add new built-in modules to Python, if you know
you know how to program in C. A built-in module known to the Python how to program in C. Such \dfn{extension modules} can do two things
programmer as \code{spam} is generally implemented by a file called that can't be done directly in Python: they can implement new built-in
\file{spammodule.c} (if the module name is very long, like object types, and they can call C library functions and system calls.
\samp{spammify}, you can drop the \samp{module}, leaving a file name
like \file{spammify.c}). The standard built-in modules also adhere to
this convention, and in fact some of them are excellent examples of
how to create an extension.
Extension modules can do two things that can't be done directly in
Python: they can implement new data types (which are different from
classes, by the way), and they can make system calls or call C library
functions.
To support extensions, the Python API (Application Programmers To support extensions, the Python API (Application Programmers
Interface) defines many functions, macros and variables that provide Interface) defines a set of functions, macros and variables that
access to almost every aspect of the Python run-time system. provide access to most aspects of the Python run-time system. The
Most of the Python API is imported by including the single header file Python API is incorporated in a C source file by including the header
\code{"Python.h"}. All user-visible symbols defined by including this \code{"Python.h"}.
file have a prefix of \samp{Py} or \samp{PY}, except those defined in
standard header files --- for convenience, and since they are needed by
the Python interpreter, \file{"Python.h"} includes a few standard
header files: \file{<stdio.h>}, \file{<string.h>}, \file{<errno.h>},
and \file{<stdlib.h>}. If the latter header file does not exist on
your system, it declares the functions \code{malloc()}, \code{free()}
and \code{realloc()} itself.
The compilation of an extension module depends on your system setup The compilation of an extension module depends on its intended use as
and the intended use of the module; details are given in a later well as on your system setup; details are given in a later section.
section.
Note: unless otherwise mentioned, all file references in this
document are relative to the Python toplevel directory
(the directory that contains the \file{configure} script).
\section{A Simple Example} \section{A Simple Example}
Let's create an extension module called \samp{spam}. Create a file Let's create an extension module called \samp{spam} (the favorite food
\samp{spammodule.c}. The first line of this file can be: of Monty Python fans...) and let's say we want to create a Python
interface to the C library function \code{system()}.\footnote{An
interface for this function already exists in the standard module
\code{os} --- it was chosen as a simple and straightforward example.}
This function takes a null-terminated character string as argument and
returns an integer. We want this function to be callable from Python
as follows:
\begin{verbatim}
>>> import spam
>>> status = spam.system("ls -l")
\end{verbatim}
Begin by creating a file \samp{spammodule.c}. (In general, if a
module is called \samp{spam}, the C file containing its implementation
is called \file{spammodule.c}; if the module name is very long, like
\samp{spammify}, the file name can be just \file{spammify.c}.)
The first line of our file can be:
\begin{verbatim} \begin{verbatim}
#include "Python.h" #include "Python.h"
@ -93,21 +101,18 @@ Let's create an extension module called \samp{spam}. Create a file
which pulls in the Python API (you can add a comment describing the which pulls in the Python API (you can add a comment describing the
purpose of the module and a copyright notice if you like). purpose of the module and a copyright notice if you like).
Let's create a Python interface to the C library function All user-visible symbols defined by \code{"Python.h"} have a prefix of
\code{system()}.\footnote{An interface for this function already \samp{Py} or \samp{PY}, except those defined in standard header files.
exists in the \code{posix} module --- it was chosen as a simple and For convenience, and since they are used extensively by the Python
straightforward example.} This function takes a zero-terminated interpreter, \code{"Python.h"} includes a few standard header files:
character string as argument and returns an integer. We will want \code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, and
this function to be callable from Python as follows: \code{<stdlib.h>}. If the latter header file does not exist on your
system, it declares the functions \code{malloc()}, \code{free()} and
\begin{verbatim} \code{realloc()} directly.
>>> import spam
>>> status = spam.system("ls -l")
\end{verbatim}
The next thing we add to our module file is the C function that will The next thing we add to our module file is the C function that will
be called when the Python expression \samp{spam.system(\var{string})} be called when the Python expression \samp{spam.system(\var{string})}
is evaluated (well see shortly how it ends up being called): is evaluated (we'll see shortly how it ends up being called):
\begin{verbatim} \begin{verbatim}
static PyObject * static PyObject *
@ -125,35 +130,32 @@ is evaluated (well see shortly how it ends up being called):
\end{verbatim} \end{verbatim}
There is a straightforward translation from the argument list in There is a straightforward translation from the argument list in
Python (here the single expression \code{"ls -l"}) to the arguments Python (e.g.\ the single expression \code{"ls -l"}) to the arguments
that are passed to the C function. The C function always has two passed to the C function. The C function always has two arguments,
arguments, conventionally named \var{self} and \var{args}. conventionally named \var{self} and \var{args}.
The \var{self} argument is only used when the C function implements a The \var{self} argument is only used when the C function implements a
builtin method --- this will be discussed later. In the example, builtin method. This will be discussed later. In the example,
\var{self} will always be a \code{NULL} pointer, since we are defining \var{self} will always be a \code{NULL} pointer, since we are defining
a function, not a method. (This is done so that the interpreter a function, not a method. (This is done so that the interpreter
doesn't have to understand two different types of C functions.) doesn't have to understand two different types of C functions.)
The \var{args} argument will be a pointer to a Python tuple object The \var{args} argument will be a pointer to a Python tuple object
containing the arguments --- the length of the tuple will be the containing the arguments. Each item of the tuple corresponds to an
number of arguments. It is necessary to do full argument type argument in the call's argument list. The arguments are Python
checking in each call, since otherwise the Python user would be able objects -- in order to do anything with them in our C function we have
to cause the Python interpreter to crash (rather than raising an to convert them to C values. The function \code{PyArg_ParseTuple()}
exception) by passing invalid arguments to a function in an extension in the Python API checks the argument types and converts them to C
module. Because argument checking and converting arguments to C are values. It uses a template string to determine the required types of
such common tasks, there's a general function in the Python the arguments as well as the types of the C variables into which to
interpreter that combines them: \code{PyArg_ParseTuple()}. It uses a store the converted values. More about this later.
template string to determine the types of the Python argument and the
types of the C variables into which it should store the converted
values (more about this later).
\code{PyArg_ParseTuple()} returns nonzero if all arguments have the \code{PyArg_ParseTuple()} returns true (nonzero) if all arguments have
right type and its components have been stored in the variables whose the right type and its components have been stored in the variables
addresses are passed. It returns zero if an invalid argument was whose addresses are passed. It returns false (zero) if an invalid
passed. In the latter case it also raises an appropriate exception argument list was passed. In the latter case it also raises an
so the calling function can return \code{NULL} immediately. Here's appropriate exception so the calling function can return
why: \code{NULL} immediately (as we saw in the example).
\section{Intermezzo: Errors and Exceptions} \section{Intermezzo: Errors and Exceptions}
@ -161,53 +163,56 @@ why:
An important convention throughout the Python interpreter is the An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition following: when a function fails, it should set an exception condition
and return an error value (usually a \code{NULL} pointer). Exceptions and return an error value (usually a \code{NULL} pointer). Exceptions
are stored in a static global variable inside the interpreter; if are stored in a static global variable inside the interpreter; if this
this variable is \code{NULL} no exception has occurred. A second variable is \code{NULL} no exception has occurred. A second global
global variable stores the `associated value' of the exception variable stores the ``associated value'' of the exception (the second
--- the second argument to \code{raise}. A third variable contains argument to \code{raise}). A third variable contains the stack
the stack traceback in case the error originated in Python code. traceback in case the error originated in Python code. These three
These three variables are the C equivalents of the Python variables variables are the C equivalents of the Python variables
\code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback} \code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback}
--- see the section on module \code{sys} in the Library Reference (see the section on module \code{sys} in the Library Reference
Manual. It is important to know about them to understand how errors Manual). It is important to know about them to understand how errors
are passed around. are passed around.
The Python API defines a host of functions to set various types of The Python API defines a number of functions to set various types of
exceptions. The most common one is \code{PyErr_SetString()} --- its exceptions.
arguments are an exception object (e.g. \code{PyExc_RuntimeError} ---
actually it can be any object that is a legal exception indicator), The most common one is \code{PyErr_SetString()}. Its arguments are an
and a C string indicating the cause of the error (this is converted to exception object and a C string. The exception object is usually a
a string object and stored as the `associated value' of the predefined object like \code{PyExc_ZeroDivisionError}. The C string
exception). Another useful function is \code{PyErr_SetFromErrno()}, indicates the cause of the error and is converted to a Python string
which only takes an exception argument and constructs the associated object and stored as the ``associated value'' of the exception.
value by inspection of the (\UNIX{}) global variable \code{errno}. The
most general function is \code{PyErr_SetObject()}, which takes two Another useful function is \code{PyErr_SetFromErrno()}, which only
object arguments, the exception and its associated value. You don't takes an exception argument and constructs the associated value by
need to \code{Py_INCREF()} the objects passed to any of these inspection of the (\UNIX{}) global variable \code{errno}. The most
functions. general function is \code{PyErr_SetObject()}, which takes two object
arguments, the exception and its associated value. You don't need to
\code{Py_INCREF()} the objects passed to any of these functions.
You can test non-destructively whether an exception has been set with You can test non-destructively whether an exception has been set with
\code{PyErr_Occurred()} --- this returns the current exception object, \code{PyErr_Occurred()}. This returns the current exception object,
or \code{NULL} if no exception has occurred. Most code never needs to or \code{NULL} if no exception has occurred. You normally don't need
call \code{PyErr_Occurred()} to see whether an error occurred or not, to call \code{PyErr_Occurred()} to see whether an error occurred in a
but relies on error return values from the functions it calls instead. function call, since you should be able to tell from the return value.
When a function that calls another function detects that the called When a function \var{f} that calls another function \var{g} detects
function fails, it should return an error value (e.g. \code{NULL} or that the latter fails, \var{f} should itself return an error value
\code{-1}). It shouldn't call one of the \code{PyErr_*} functions --- (e.g. \code{NULL} or \code{-1}). It should \emph{not} call one of the
one has already been called. The caller is then supposed to also \code{PyErr_*()} functions --- one has already been called by \var{g}.
return an error indication to {\em its} caller, again {\em without} \var{f}'s caller is then supposed to also return an error indication
calling \code{PyErr_*()}, and so on --- the most detailed cause of the to \emph{its} caller, again \emph{without} calling \code{PyErr_*()},
error was already reported by the function that first detected it. and so on --- the most detailed cause of the error was already
Once the error has reached Python's interpreter main loop, this aborts reported by the function that first detected it. Once the error
the currently executing Python code and tries to find an exception reaches the Python interpreter's main loop, this aborts the currently
handler specified by the Python programmer. executing Python code and tries to find an exception handler specified
by the Python programmer.
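The propagation rule above can be modelled in plain C without the Python headers. The following is only a sketch: the names \code{err_set()}, \code{err_occurred()} and \code{err_clear()} are hypothetical stand-ins for the interpreter's error indicator and are not part of the Python API. It shows that \code{g} sets the indicator when it detects the failure, while \code{f} merely returns an error value.

```c
#include <stddef.h>

/* Hypothetical stand-in for the interpreter's global error indicator.
   NULL means "no exception set". Not part of the Python API. */
static const char *err_message = NULL;

static void err_set(const char *msg) { err_message = msg; }
static const char *err_occurred(void) { return err_message; }
static void err_clear(void) { err_message = NULL; }

/* g detects the failure first, so g is the one that sets the indicator. */
static int g(int divisor, int *result)
{
    if (divisor == 0) {
        err_set("division by zero");
        return -1;
    }
    *result = 100 / divisor;
    return 0;
}

/* f only propagates: it returns an error value and does not
   call err_set() again, so the original message is preserved. */
static int f(int divisor, int *result)
{
    if (g(divisor, result) < 0)
        return -1;
    *result += 1;
    return 0;
}
```

A caller that wants to ignore the failure would call the (hypothetical) \code{err_clear()} here, which plays the role that \code{PyErr_Clear()} plays in the real API.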
(There are situations where a module can actually give a more detailed (There are situations where a module can actually give a more detailed
error message by calling another \code{PyErr_*} function, and in such error message by calling another \code{PyErr_*()} function, and in
cases it is fine to do so. As a general rule, however, this is not such cases it is fine to do so. As a general rule, however, this is
necessary, and can cause information about the cause of the error to not necessary, and can cause information about the cause of the error
be lost: most operations can fail for a variety of reasons.) to be lost: most operations can fail for a variety of reasons.)
To ignore an exception set by a function call that failed, the exception To ignore an exception set by a function call that failed, the exception
condition must be cleared explicitly by calling \code{PyErr_Clear()}. condition must be cleared explicitly by calling \code{PyErr_Clear()}.
@ -216,7 +221,7 @@ want to pass the error on to the interpreter but wants to handle it
completely by itself (e.g. by trying something else or pretending completely by itself (e.g. by trying something else or pretending
nothing happened). nothing happened).
Note that a failing \code{malloc()} call must also be turned into an Note that a failing \code{malloc()} call must be turned into an
exception --- the direct caller of \code{malloc()} (or exception --- the direct caller of \code{malloc()} (or
\code{realloc()}) must call \code{PyErr_NoMemory()} and return a \code{realloc()}) must call \code{PyErr_NoMemory()} and return a
failure indicator itself. All the object-creating functions failure indicator itself. All the object-creating functions
@ -224,18 +229,18 @@ failure indicator itself. All the object-creating functions
\code{malloc()} directly this note is of importance. \code{malloc()} directly this note is of importance.
Also note that, with the important exception of Also note that, with the important exception of
\code{PyArg_ParseTuple()}, functions that return an integer status \code{PyArg_ParseTuple()} and friends, functions that return an
usually return \code{0} or a positive value for success and \code{-1} integer status usually return a positive value or zero for success and
for failure (like \UNIX{} system calls). \code{-1} for failure, like \UNIX{} system calls.
Finally, be careful about cleaning up garbage (making \code{Py_XDECREF()} Finally, be careful to clean up garbage (by making \code{Py_XDECREF()}
or \code{Py_DECREF()} calls for objects you have already created) when or \code{Py_DECREF()} calls for objects you have already created) when
you return an error! you return an error indicator!
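The cleanup rule can be illustrated with ordinary \code{malloc()} and \code{free()}. This is a sketch under stated assumptions: \code{tracked_alloc()}, \code{tracked_free()}, \code{live_blocks} and \code{make_pair()} are hypothetical names invented for the illustration, and freeing a block stands in for the \code{Py_DECREF()} call a real extension would make on an object it had already created.

```c
#include <stdlib.h>

/* Hypothetical counter of blocks that are currently allocated. */
static int live_blocks = 0;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Creates two blocks. Returns 0 on success and -1 on failure.
   fail_second simulates a failure of the second step. */
static int make_pair(int fail_second, void **a, void **b)
{
    *a = tracked_alloc(16);
    if (*a == NULL)
        return -1;
    *b = fail_second ? NULL : tracked_alloc(16);
    if (*b == NULL) {
        /* Release what was already created before reporting the error. */
        tracked_free(*a);
        *a = NULL;
        return -1;
    }
    return 0;
}
```

On the failure path nothing remains allocated, so the caller only has to check the return value.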
The choice of which exception to raise is entirely yours. There are The choice of which exception to raise is entirely yours. There are
predeclared C objects corresponding to all built-in Python exceptions, predeclared C objects corresponding to all built-in Python exceptions,
e.g. \code{PyExc_ZeroDivisionError} which you can use directly. Of e.g. \code{PyExc_ZeroDivisionError} which you can use directly. Of
course, you should chose exceptions wisely --- don't use course, you should choose exceptions wisely --- don't use
\code{PyExc_TypeError} to mean that a file couldn't be opened (that \code{PyExc_TypeError} to mean that a file couldn't be opened (that
should probably be \code{PyExc_IOError}). If something's wrong with should probably be \code{PyExc_IOError}). If something's wrong with
the argument list, the \code{PyArg_ParseTuple()} function usually the argument list, the \code{PyArg_ParseTuple()} function usually
@ -253,25 +258,25 @@ beginning of your file, e.g.
and initialize it in your module's initialization function and initialize it in your module's initialization function
(\code{initspam()}) with a string object, e.g. (leaving out the error (\code{initspam()}) with a string object, e.g. (leaving out the error
checking for simplicity): checking for now):
\begin{verbatim} \begin{verbatim}
void void
initspam() initspam()
{ {
PyObject *m, *d; PyObject *m, *d;
m = Py_InitModule("spam", spam_methods); m = Py_InitModule("spam", SpamMethods);
d = PyModule_GetDict(m); d = PyModule_GetDict(m);
SpamError = PyString_FromString("spam.error"); SpamError = PyString_FromString("spam.error");
PyDict_SetItemString(d, "error", SpamError); PyDict_SetItemString(d, "error", SpamError);
} }
\end{verbatim} \end{verbatim}
Note that the Python name for the exception object is \code{spam.error} Note that the Python name for the exception object is
--- it is conventional for module and exception names to be spelled in \code{spam.error}. It is conventional for module and exception names
lower case. It is also conventional that the \emph{value} of the to be spelled in lower case. It is also conventional that the
exception object is the same as its name, e.g.\ the string \emph{value} of the exception object is the same as its name, e.g.\
\code{"spam.error"}. the string \code{"spam.error"}.
\section{Back to the Example} \section{Back to the Example}
@ -289,8 +294,8 @@ object pointers) if an error is detected in the argument list, relying
on the exception set by \code{PyArg_ParseTuple()}. Otherwise the on the exception set by \code{PyArg_ParseTuple()}. Otherwise the
string value of the argument has been copied to the local variable string value of the argument has been copied to the local variable
\code{command}. This is a pointer assignment and you are not supposed \code{command}. This is a pointer assignment and you are not supposed
to modify the string to which it points (so in ANSI C, the variable to modify the string to which it points (so in Standard C, the variable
\code{command} should properly be declared as \code{const char \code{command} should properly be declared as \samp{const char
*command}). *command}).
The next statement is a call to the \UNIX{} function \code{system()}, The next statement is a call to the \UNIX{} function \code{system()},
@ -300,9 +305,8 @@ passing it the string we just got from \code{PyArg_ParseTuple()}:
sts = system(command); sts = system(command);
\end{verbatim} \end{verbatim}
Our \code{spam.system()} function must return a value: the integer Our \code{spam.system()} function must return the value of \code{sts}
\code{sts} which contains the return value of the \UNIX{} as a Python object. This is done using the function
\code{system()} function. This is done using the function
\code{Py_BuildValue()}, which is something like the inverse of \code{Py_BuildValue()}, which is something like the inverse of
\code{PyArg_ParseTuple()}: it takes a format string and an arbitrary \code{PyArg_ParseTuple()}: it takes a format string and an arbitrary
number of C values, and returns a new Python object. More info on number of C values, and returns a new Python object. More info on
@ -326,7 +330,7 @@ returning \code{void}), the corresponding Python function must return
\code{Py_None} is the C name for the special Python object \code{Py_None} is the C name for the special Python object
\code{None}. It is a genuine Python object (not a \code{NULL} \code{None}. It is a genuine Python object (not a \code{NULL}
pointer, which means `error' in most contexts, as we have seen). pointer, which means ``error'' in most contexts, as we have seen).
\section{The Module's Method Table and Initialization Function} \section{The Module's Method Table and Initialization Function}
@ -336,7 +340,7 @@ programs. First, we need to list its name and address in a ``method
table'': table'':
\begin{verbatim} \begin{verbatim}
static PyMethodDef spam_methods[] = { static PyMethodDef SpamMethods[] = {
... ...
{"system", spam_system, 1}, {"system", spam_system, 1},
... ...
@ -357,7 +361,7 @@ item defined in the module file):
void void
initspam() initspam()
{ {
(void) Py_InitModule("spam", spam_methods); (void) Py_InitModule("spam", SpamMethods);
} }
\end{verbatim} \end{verbatim}
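How a name in a method table leads to a C function can be sketched without the Python headers. Everything here is hypothetical: \code{struct demo_method}, \code{demo_methods} and \code{demo_lookup()} are invented for the illustration and are not the interpreter's own types. The \code{NULL} entry that ends the table is likewise an assumption of this sketch, since the excerpt above elides the rest of the real table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function type and table entry, for illustration only. */
typedef int (*demo_func)(int);

struct demo_method {
    const char *name;   /* name by which the function is looked up */
    demo_func func;     /* address of the C function */
};

static int demo_double(int x) { return 2 * x; }
static int demo_negate(int x) { return -x; }

static struct demo_method demo_methods[] = {
    {"double", demo_double},
    {"negate", demo_negate},
    {NULL, NULL}        /* sentinel marking the end (assumed) */
};

/* Walks the table until the sentinel; returns NULL if name is absent. */
static demo_func demo_lookup(const struct demo_method *table,
                             const char *name)
{
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->func;
    }
    return NULL;
}
```

Keeping names and addresses together in one table means that adding a function to the module only requires adding one entry.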
@ -375,11 +379,11 @@ so the caller doesn't need to check for errors.
\section{Compilation and Linkage} \section{Compilation and Linkage}
There are two more things to do before you can use your new extension There are two more things to do before you can use your new extension:
module: compiling and linking it with the Python system. If you use compiling and linking it with the Python system. If you use dynamic
dynamic loading, the details depend on the style of dynamic loading loading, the details depend on the style of dynamic loading your
your system uses; see the chapter on Dynamic Loading for more info system uses; see the chapter on Dynamic Loading for more info about
about this. this.
If you can't use dynamic loading, or if you want to make your module a If you can't use dynamic loading, or if you want to make your module a
permanent part of the Python interpreter, you will have to change the permanent part of the Python interpreter, you will have to change the
@ -411,7 +415,7 @@ be listed on the line in the \file{Setup} file as well, for instance:
So far we have concentrated on making C functions callable from So far we have concentrated on making C functions callable from
Python. The reverse is also useful: calling Python functions from C. Python. The reverse is also useful: calling Python functions from C.
This is especially the case for libraries that support so-called This is especially the case for libraries that support so-called
`callback' functions. If a C interface makes use of callbacks, the ``callback'' functions. If a C interface makes use of callbacks, the
equivalent Python often needs to provide a callback mechanism to the equivalent Python often needs to provide a callback mechanism to the
Python programmer; the implementation will require calling the Python Python programmer; the implementation will require calling the Python
callback functions from a C callback. Other uses are also imaginable. callback functions from a C callback. Other uses are also imaginable.
@ -476,7 +480,7 @@ parentheses. For example:
\code{PyEval_CallObject()} returns a Python object pointer: this is \code{PyEval_CallObject()} returns a Python object pointer: this is
the return value of the Python function. \code{PyEval_CallObject()} is the return value of the Python function. \code{PyEval_CallObject()} is
`reference-count-neutral' with respect to its arguments. In the ``reference-count-neutral'' with respect to its arguments. In the
example a new tuple was created to serve as the argument list, which example a new tuple was created to serve as the argument list, which
is \code{Py_DECREF()}-ed immediately after the call. is \code{Py_DECREF()}-ed immediately after the call.
@ -1134,7 +1138,7 @@ linked by the C compiler, global or static objects with constructors
cannot be used. All functions that will be called directly or cannot be used. All functions that will be called directly or
indirectly (i.e. via function pointers) by the Python interpreter will indirectly (i.e. via function pointers) by the Python interpreter will
have to be declared using \code{extern "C"}; this applies to all have to be declared using \code{extern "C"}; this applies to all
`methods' as well as to the module's initialization function. ``methods'' as well as to the module's initialization function.
It is unnecessary to enclose the Python header files in It is unnecessary to enclose the Python header files in
\code{extern "C" \{...\}} --- they use this form already if the symbol \code{extern "C" \{...\}} --- they use this form already if the symbol
\samp{__cplusplus} is defined (all recent C++ compilers define this \samp{__cplusplus} is defined (all recent C++ compilers define this
@ -1189,7 +1193,7 @@ libraries) in the module search path, and if one is found, it is
loaded into the executing binary and executed. Once loaded, the loaded into the executing binary and executed. Once loaded, the
module acts just like a built-in extension module. module acts just like a built-in extension module.
The advantages of dynamic loading are twofold: the `core' Python The advantages of dynamic loading are twofold: the ``core'' Python
binary gets smaller, and users can extend Python with their own binary gets smaller, and users can extend Python with their own
modules implemented in C without having to build and maintain their modules implemented in C without having to build and maintain their
own copy of the Python interpreter. There are also disadvantages: own copy of the Python interpreter. There are also disadvantages:
@ -1307,12 +1311,12 @@ On SGI IRIX 5, use
ld -shared spammodule.o -o spammodule.so ld -shared spammodule.o -o spammodule.so
\end{verbatim} \end{verbatim}
On other systems, consult the manual page for {\em ld}(1) to find what On other systems, consult the manual page for \code{ld}(1) to find what
flags, if any, must be used. flags, if any, must be used.
If your extension module uses system libraries that haven't already If your extension module uses system libraries that haven't already
been linked with Python (e.g. a windowing system), these must be been linked with Python (e.g. a windowing system), these must be
passed to the {\em ld} command as \samp{-l} options after the passed to the \code{ld} command as \samp{-l} options after the
\samp{.o} file. \samp{.o} file.
The resulting file \file{spammodule.so} must be copied into a directory The resulting file \file{spammodule.so} must be copied into a directory

@ -20,11 +20,22 @@
\begin{abstract} \begin{abstract}
\noindent \noindent
This document describes how to write modules in C or \Cpp{} to extend the Python is an interpreted, object-oriented programming language. This
Python interpreter. It also describes how to use Python as an document describes how to write modules in C or \Cpp{} to extend the
`embedded' language, and how extension modules can be loaded Python interpreter with new modules. Those modules can define new
dynamically (at run time) into the interpreter, if the operating functions but also new object types and their methods. The document
system supports this feature. also describes how to embed the Python interpreter in another
application, for use as an extension language. Finally, it shows how
to compile and link extension modules so that they can be loaded
dynamically (at run time) into the interpreter, if the underlying
operating system supports this feature.
This document assumes basic knowledge about Python. For an informal
introduction to the language, see the Python Tutorial. The Python
Reference Manual gives a more formal definition of the language. The
Python Library Reference documents the existing object types,
functions and modules (both built-in and written in Python) that give
the language its wide application range.
\end{abstract} \end{abstract}
@ -45,46 +56,43 @@ system supports this feature.
\section{Introduction} \section{Introduction}
It is quite easy to add non-standard built-in modules to Python, if It is quite easy to add new built-in modules to Python, if you know
you know how to program in C. A built-in module known to the Python how to program in C. Such \dfn{extension modules} can do two things
programmer as \code{spam} is generally implemented by a file called that can't be done directly in Python: they can implement new built-in
\file{spammodule.c} (if the module name is very long, like object types, and they can call C library functions and system calls.
\samp{spammify}, you can drop the \samp{module}, leaving a file name
like \file{spammify.c}). The standard built-in modules also adhere to
this convention, and in fact some of them are excellent examples of
how to create an extension.
Extension modules can do two things that can't be done directly in
Python: they can implement new data types (which are different from
classes, by the way), and they can make system calls or call C library
functions.
To support extensions, the Python API (Application Programmers To support extensions, the Python API (Application Programmers
Interface) defines many functions, macros and variables that provide Interface) defines a set of functions, macros and variables that
access to almost every aspect of the Python run-time system. provide access to most aspects of the Python run-time system. The
Most of the Python API is imported by including the single header file Python API is incorporated in a C source file by including the header
\code{"Python.h"}. All user-visible symbols defined by including this \code{"Python.h"}.
file have a prefix of \samp{Py} or \samp{PY}, except those defined in
standard header files --- for convenience, and since they are needed by
the Python interpreter, \file{"Python.h"} includes a few standard
header files: \file{<stdio.h>}, \file{<string.h>}, \file{<errno.h>},
and \file{<stdlib.h>}. If the latter header file does not exist on
your system, it declares the functions \code{malloc()}, \code{free()}
and \code{realloc()} itself.
The compilation of an extension module depends on your system setup The compilation of an extension module depends on its intended use as
and the intended use of the module; details are given in a later well as on your system setup; details are given in a later section.
section.
Note: unless otherwise mentioned, all file references in this
document are relative to the Python toplevel directory
(the directory that contains the \file{configure} script).
\section{A Simple Example} \section{A Simple Example}
Let's create an extension module called \samp{spam}. Create a file Let's create an extension module called \samp{spam} (the favorite food
\samp{spammodule.c}. The first line of this file can be: of Monty Python fans...) and let's say we want to create a Python
interface to the C library function \code{system()}.\footnote{An
interface for this function already exists in the standard module
\code{os} --- it was chosen as a simple and straightforward example.}
This function takes a null-terminated character string as argument and
returns an integer. We want this function to be callable from Python
as follows:
\begin{verbatim}
>>> import spam
>>> status = spam.system("ls -l")
\end{verbatim}
Begin by creating a file \samp{spammodule.c}. (In general, if a
module is called \samp{spam}, the C file containing its implementation
is called \file{spammodule.c}; if the module name is very long, like
\samp{spammify}, the file name can be just \file{spammify.c}.)
The first line of our file can be:
\begin{verbatim} \begin{verbatim}
#include "Python.h" #include "Python.h"
@ -93,21 +101,18 @@ Let's create an extension module called \samp{spam}. Create a file
which pulls in the Python API (you can add a comment describing the which pulls in the Python API (you can add a comment describing the
purpose of the module and a copyright notice if you like). purpose of the module and a copyright notice if you like).
Let's create a Python interface to the C library function All user-visible symbols defined by \code{"Python.h"} have a prefix of
\code{system()}.\footnote{An interface for this function already \samp{Py} or \samp{PY}, except those defined in standard header files.
exists in the \code{posix} module --- it was chosen as a simple and For convenience, and since they are used extensively by the Python
straightforward example.} This function takes a zero-terminated interpreter, \code{"Python.h"} includes a few standard header files:
character string as argument and returns an integer. We will want \code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, and
this function to be callable from Python as follows: \code{<stdlib.h>}. If the latter header file does not exist on your
system, it declares the functions \code{malloc()}, \code{free()} and
\begin{verbatim} \code{realloc()} directly.
>>> import spam
>>> status = spam.system("ls -l")
\end{verbatim}
The next thing we add to our module file is the C function that will The next thing we add to our module file is the C function that will
be called when the Python expression \samp{spam.system(\var{string})} be called when the Python expression \samp{spam.system(\var{string})}
is evaluated (well see shortly how it ends up being called): is evaluated (we'll see shortly how it ends up being called):
\begin{verbatim} \begin{verbatim}
static PyObject * static PyObject *
@ -125,35 +130,32 @@ is evaluated (well see shortly how it ends up being called):
\end{verbatim} \end{verbatim}
There is a straightforward translation from the argument list in There is a straightforward translation from the argument list in
Python (here the single expression \code{"ls -l"}) to the arguments Python (e.g.\ the single expression \code{"ls -l"}) to the arguments
that are passed to the C function. The C function always has two passed to the C function. The C function always has two arguments,
arguments, conventionally named \var{self} and \var{args}. conventionally named \var{self} and \var{args}.
The \var{self} argument is only used when the C function implements a The \var{self} argument is only used when the C function implements a
builtin method --- this will be discussed later. In the example, builtin method. This will be discussed later. In the example,
\var{self} will always be a \code{NULL} pointer, since we are defining \var{self} will always be a \code{NULL} pointer, since we are defining
a function, not a method. (This is done so that the interpreter a function, not a method. (This is done so that the interpreter
doesn't have to understand two different types of C functions.) doesn't have to understand two different types of C functions.)
The \var{args} argument will be a pointer to a Python tuple object The \var{args} argument will be a pointer to a Python tuple object
containing the arguments --- the length of the tuple will be the containing the arguments. Each item of the tuple corresponds to an
number of arguments. It is necessary to do full argument type argument in the call's argument list. The arguments are Python
checking in each call, since otherwise the Python user would be able objects --- in order to do anything with them in our C function we have
to cause the Python interpreter to crash (rather than raising an to convert them to C values. The function \code{PyArg_ParseTuple()}
exception) by passing invalid arguments to a function in an extension in the Python API checks the argument types and converts them to C
module. Because argument checking and converting arguments to C are values. It uses a template string to determine the required types of
such common tasks, there's a general function in the Python the arguments as well as the types of the C variables into which to
interpreter that combines them: \code{PyArg_ParseTuple()}. It uses a store the converted values. More about this later.
template string to determine the types of the Python argument and the
types of the C variables into which it should store the converted
values (more about this later).
\code{PyArg_ParseTuple()} returns nonzero if all arguments have the \code{PyArg_ParseTuple()} returns true (nonzero) if all arguments have
right type and its components have been stored in the variables whose the right type and its components have been stored in the variables
addresses are passed. It returns zero if an invalid argument was whose addresses are passed. It returns false (zero) if an invalid
passed. In the latter case it also raises an appropriate exception, argument list was passed. In the latter case it also raises an
so the calling function can return \code{NULL} immediately. Here's appropriate exception, so the calling function can return
why: \code{NULL} immediately (as we saw in the example).
\section{Intermezzo: Errors and Exceptions} \section{Intermezzo: Errors and Exceptions}
@ -161,53 +163,56 @@ why:
An important convention throughout the Python interpreter is the An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition following: when a function fails, it should set an exception condition
and return an error value (usually a \code{NULL} pointer). Exceptions and return an error value (usually a \code{NULL} pointer). Exceptions
are stored in a static global variable inside the interpreter; if are stored in a static global variable inside the interpreter; if this
this variable is \code{NULL} no exception has occurred. A second variable is \code{NULL} no exception has occurred. A second global
global variable stores the `associated value' of the exception variable stores the ``associated value'' of the exception (the second
--- the second argument to \code{raise}. A third variable contains argument to \code{raise}). A third variable contains the stack
the stack traceback in case the error originated in Python code. traceback in case the error originated in Python code. These three
These three variables are the C equivalents of the Python variables variables are the C equivalents of the Python variables
\code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback} \code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback}
--- see the section on module \code{sys} in the Library Reference (see the section on module \code{sys} in the Library Reference
Manual. It is important to know about them to understand how errors Manual). It is important to know about them to understand how errors
are passed around. are passed around.
The Python API defines a host of functions to set various types of The Python API defines a number of functions to set various types of
exceptions. The most common one is \code{PyErr_SetString()} --- its exceptions.
arguments are an exception object (e.g. \code{PyExc_RuntimeError} ---
actually it can be any object that is a legal exception indicator), The most common one is \code{PyErr_SetString()}. Its arguments are an
and a C string indicating the cause of the error (this is converted to exception object and a C string. The exception object is usually a
a string object and stored as the `associated value' of the predefined object like \code{PyExc_ZeroDivisionError}. The C string
exception). Another useful function is \code{PyErr_SetFromErrno()}, indicates the cause of the error and is converted to a Python string
which only takes an exception argument and constructs the associated object and stored as the ``associated value'' of the exception.
value by inspection of the (\UNIX{}) global variable \code{errno}. The
most general function is \code{PyErr_SetObject()}, which takes two Another useful function is \code{PyErr_SetFromErrno()}, which only
object arguments, the exception and its associated value. You don't takes an exception argument and constructs the associated value by
need to \code{Py_INCREF()} the objects passed to any of these inspection of the (\UNIX{}) global variable \code{errno}. The most
functions. general function is \code{PyErr_SetObject()}, which takes two object
arguments, the exception and its associated value. You don't need to
\code{Py_INCREF()} the objects passed to any of these functions.
You can test non-destructively whether an exception has been set with You can test non-destructively whether an exception has been set with
\code{PyErr_Occurred()} --- this returns the current exception object, \code{PyErr_Occurred()}. This returns the current exception object,
or \code{NULL} if no exception has occurred. Most code never needs to or \code{NULL} if no exception has occurred. You normally don't need
call \code{PyErr_Occurred()} to see whether an error occurred or not, to call \code{PyErr_Occurred()} to see whether an error occurred in a
but relies on error return values from the functions it calls instead. function call, since you should be able to tell from the return value.
When a function that calls another function detects that the called When a function \var{f} that calls another function \var{g} detects
function fails, it should return an error value (e.g. \code{NULL} or that the latter fails, \var{f} should itself return an error value
\code{-1}). It shouldn't call one of the \code{PyErr_*} functions --- (e.g. \code{NULL} or \code{-1}). It should \emph{not} call one of the
one has already been called. The caller is then supposed to also \code{PyErr_*()} functions --- one has already been called by \var{g}.
return an error indication to {\em its} caller, again {\em without} \var{f}'s caller is then supposed to also return an error indication
calling \code{PyErr_*()}, and so on --- the most detailed cause of the to \emph{its} caller, again \emph{without} calling \code{PyErr_*()},
error was already reported by the function that first detected it. and so on --- the most detailed cause of the error was already
Once the error has reached Python's interpreter main loop, this aborts reported by the function that first detected it. Once the error
the currently executing Python code and tries to find an exception reaches the Python interpreter's main loop, this aborts the currently
handler specified by the Python programmer. executing Python code and tries to find an exception handler specified
by the Python programmer.
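
The propagation rule described above can be illustrated without the Python headers. The following is only a sketch in plain C, not the Python API: \code{err_message}, \code{err_set()}, \code{g()} and \code{f()} are hypothetical names standing in for the interpreter's global exception variable, a \code{PyErr_*()} function, and two functions in an extension module.

```c
#include <stddef.h>

/* Hypothetical stand-in for the interpreter's global exception
   variable: NULL means that no error is pending. */
static const char *err_message = NULL;

/* Hypothetical stand-in for a PyErr_*() function: record the cause. */
static void err_set(const char *message)
{
    err_message = message;
}

/* g detects the failure first, so g reports the detailed cause.
   Returns 0 on success and -1 on failure. */
static int g(int divisor, int *quotient)
{
    if (divisor == 0) {
        err_set("g: division by zero");
        return -1;
    }
    *quotient = 100 / divisor;
    return 0;
}

/* f only passes the failure on to its own caller; it does not call
   err_set() again, so the message recorded by g is preserved. */
static int f(int divisor, int *result)
{
    int quotient;

    if (g(divisor, &quotient) < 0)
        return -1;
    *result = quotient + 1;
    return 0;
}
```

A caller of \code{f()} that receives \code{-1} can inspect \code{err_message} and will find the message recorded by \code{g()}, which is the most detailed description of the cause that is available.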
(There are situations where a module can actually give a more detailed (There are situations where a module can actually give a more detailed
error message by calling another \code{PyErr_*} function, and in such error message by calling another \code{PyErr_*()} function, and in
cases it is fine to do so. As a general rule, however, this is not such cases it is fine to do so. As a general rule, however, this is
necessary, and can cause information about the cause of the error to not necessary, and can cause information about the cause of the error
be lost: most operations can fail for a variety of reasons.) to be lost: most operations can fail for a variety of reasons.)
To ignore an exception set by a function call that failed, the exception To ignore an exception set by a function call that failed, the exception
condition must be cleared explicitly by calling \code{PyErr_Clear()}. condition must be cleared explicitly by calling \code{PyErr_Clear()}.
@ -216,7 +221,7 @@ want to pass the error on to the interpreter but wants to handle it
completely by itself (e.g. by trying something else or pretending completely by itself (e.g. by trying something else or pretending
nothing happened). nothing happened).
Note that a failing \code{malloc()} call must also be turned into an Note that a failing \code{malloc()} call must be turned into an
exception --- the direct caller of \code{malloc()} (or exception --- the direct caller of \code{malloc()} (or
\code{realloc()}) must call \code{PyErr_NoMemory()} and return a \code{realloc()}) must call \code{PyErr_NoMemory()} and return a
failure indicator itself. All the object-creating functions failure indicator itself. All the object-creating functions
@ -224,18 +229,18 @@ failure indicator itself. All the object-creating functions
\code{malloc()} directly this note is of importance. \code{malloc()} directly this note is of importance.
Also note that, with the important exception of Also note that, with the important exception of
\code{PyArg_ParseTuple()}, functions that return an integer status \code{PyArg_ParseTuple()} and friends, functions that return an
usually return \code{0} or a positive value for success and \code{-1} integer status usually return a positive value or zero for success and
for failure (like \UNIX{} system calls). \code{-1} for failure, like \UNIX{} system calls.
Finally, be careful about cleaning up garbage (making \code{Py_XDECREF()} Finally, be careful to clean up garbage (by making \code{Py_XDECREF()}
or \code{Py_DECREF()} calls for objects you have already created) when or \code{Py_DECREF()} calls for objects you have already created) when
you return an error! you return an error indicator!
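
The cleanup advice above can be sketched with a toy reference-counted object in plain C. This is an illustration under stated assumptions, not the Python API: \code{Obj}, \code{obj_new()}, \code{obj_decref()}, \code{make_sum()} and the \code{fail_second} flag (which simulates a failed allocation) are all hypothetical. \code{obj_decref()} tolerates a \code{NULL} pointer, in the manner of \code{Py_XDECREF()}.

```c
#include <stdlib.h>

/* Toy reference-counted object (hypothetical, for illustration). */
typedef struct {
    int refcount;
    int value;
} Obj;

static int live_objects = 0;    /* counts objects not yet freed */

static Obj *obj_new(int value)
{
    Obj *o = malloc(sizeof *o);

    if (o == NULL)
        return NULL;
    o->refcount = 1;
    o->value = value;
    live_objects++;
    return o;
}

/* Drop one reference; a NULL pointer is silently ignored. */
static void obj_decref(Obj *o)
{
    if (o != NULL && --o->refcount == 0) {
        live_objects--;
        free(o);
    }
}

/* Create two temporaries and return their sum as a new object.
   On any failure, release what was already created and return NULL. */
static Obj *make_sum(int a, int b, int fail_second)
{
    Obj *first = NULL, *second = NULL, *sum = NULL;

    first = obj_new(a);
    if (first == NULL)
        goto done;
    second = fail_second ? NULL : obj_new(b);  /* simulated failure */
    if (second == NULL)
        goto done;
    sum = obj_new(first->value + second->value);
done:
    obj_decref(first);      /* temporaries are released on every path */
    obj_decref(second);
    return sum;             /* NULL signals the error to the caller */
}
```

Routing every exit through a single cleanup label is one possible design; it keeps the error path and the success path from drifting apart as more temporaries are added.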
The choice of which exception to raise is entirely yours. There are The choice of which exception to raise is entirely yours. There are
predeclared C objects corresponding to all built-in Python exceptions, predeclared C objects corresponding to all built-in Python exceptions,
e.g. \code{PyExc_ZeroDivisionError} which you can use directly. Of e.g. \code{PyExc_ZeroDivisionError} which you can use directly. Of
course, you should chose exceptions wisely --- don't use course, you should choose exceptions wisely --- don't use
\code{PyExc_TypeError} to mean that a file couldn't be opened (that \code{PyExc_TypeError} to mean that a file couldn't be opened (that
should probably be \code{PyExc_IOError}). If something's wrong with should probably be \code{PyExc_IOError}). If something's wrong with
the argument list, the \code{PyArg_ParseTuple()} function usually the argument list, the \code{PyArg_ParseTuple()} function usually
@ -253,25 +258,25 @@ beginning of your file, e.g.
and initialize it in your module's initialization function and initialize it in your module's initialization function
(\code{initspam()}) with a string object, e.g. (leaving out the error (\code{initspam()}) with a string object, e.g. (leaving out the error
checking for simplicity): checking for now):
\begin{verbatim} \begin{verbatim}
void void
initspam() initspam()
{ {
PyObject *m, *d; PyObject *m, *d;
m = Py_InitModule("spam", spam_methods); m = Py_InitModule("spam", SpamMethods);
d = PyModule_GetDict(m); d = PyModule_GetDict(m);
SpamError = PyString_FromString("spam.error"); SpamError = PyString_FromString("spam.error");
PyDict_SetItemString(d, "error", SpamError); PyDict_SetItemString(d, "error", SpamError);
} }
\end{verbatim} \end{verbatim}
Note that the Python name for the exception object is \code{spam.error} Note that the Python name for the exception object is
--- it is conventional for module and exception names to be spelled in \code{spam.error}. It is conventional for module and exception names
lower case. It is also conventional that the \emph{value} of the to be spelled in lower case. It is also conventional that the
exception object is the same as its name, e.g.\ the string \emph{value} of the exception object is the same as its name, e.g.\
\code{"spam.error"}. the string \code{"spam.error"}.
\section{Back to the Example} \section{Back to the Example}
@ -289,8 +294,8 @@ object pointers) if an error is detected in the argument list, relying
on the exception set by \code{PyArg_ParseTuple()}. Otherwise the on the exception set by \code{PyArg_ParseTuple()}. Otherwise the
string value of the argument has been copied to the local variable string value of the argument has been copied to the local variable
\code{command}. This is a pointer assignment and you are not supposed \code{command}. This is a pointer assignment and you are not supposed
to modify the string to which it points (so in ANSI C, the variable to modify the string to which it points (so in Standard C, the variable
\code{command} should properly be declared as \code{const char \code{command} should properly be declared as \samp{const char
*command}). *command}).
The next statement is a call to the \UNIX{} function \code{system()}, The next statement is a call to the \UNIX{} function \code{system()},
@ -300,9 +305,8 @@ passing it the string we just got from \code{PyArg_ParseTuple()}:
sts = system(command); sts = system(command);
\end{verbatim} \end{verbatim}
Our \code{spam.system()} function must return a value: the integer Our \code{spam.system()} function must return the value of \code{sts}
\code{sts} which contains the return value of the \UNIX{} as a Python object. This is done using the function
\code{system()} function. This is done using the function
\code{Py_BuildValue()}, which is something like the inverse of \code{Py_BuildValue()}, which is something like the inverse of
\code{PyArg_ParseTuple()}: it takes a format string and an arbitrary \code{PyArg_ParseTuple()}: it takes a format string and an arbitrary
number of C values, and returns a new Python object. More info on number of C values, and returns a new Python object. More info on
@ -326,7 +330,7 @@ returning \code{void}), the corresponding Python function must return
\code{Py_None} is the C name for the special Python object \code{Py_None} is the C name for the special Python object
\code{None}. It is a genuine Python object (not a \code{NULL} \code{None}. It is a genuine Python object (not a \code{NULL}
pointer, which means `error' in most contexts, as we have seen). pointer, which means ``error'' in most contexts, as we have seen).
\section{The Module's Method Table and Initialization Function} \section{The Module's Method Table and Initialization Function}
@ -336,7 +340,7 @@ programs. First, we need to list its name and address in a ``method
table'': table'':
\begin{verbatim} \begin{verbatim}
static PyMethodDef spam_methods[] = { static PyMethodDef SpamMethods[] = {
... ...
{"system", spam_system, 1}, {"system", spam_system, 1},
... ...
@ -357,7 +361,7 @@ item defined in the module file):
void void
initspam() initspam()
{ {
(void) Py_InitModule("spam", spam_methods); (void) Py_InitModule("spam", SpamMethods);
} }
\end{verbatim} \end{verbatim}
@ -375,11 +379,11 @@ so the caller doesn't need to check for errors.
\section{Compilation and Linkage} \section{Compilation and Linkage}
There are two more things to do before you can use your new extension There are two more things to do before you can use your new extension:
module: compiling and linking it with the Python system. If you use compiling and linking it with the Python system. If you use dynamic
dynamic loading, the details depend on the style of dynamic loading loading, the details depend on the style of dynamic loading your
your system uses; see the chapter on Dynamic Loading for more info system uses; see the chapter on Dynamic Loading for more info about
about this. this.
If you can't use dynamic loading, or if you want to make your module a If you can't use dynamic loading, or if you want to make your module a
permanent part of the Python interpreter, you will have to change the permanent part of the Python interpreter, you will have to change the
@ -411,7 +415,7 @@ be listed on the line in the \file{Setup} file as well, for instance:
So far we have concentrated on making C functions callable from So far we have concentrated on making C functions callable from
Python. The reverse is also useful: calling Python functions from C. Python. The reverse is also useful: calling Python functions from C.
This is especially the case for libraries that support so-called This is especially the case for libraries that support so-called
`callback' functions. If a C interface makes use of callbacks, the ``callback'' functions. If a C interface makes use of callbacks, the
equivalent Python often needs to provide a callback mechanism to the equivalent Python often needs to provide a callback mechanism to the
Python programmer; the implementation will require calling the Python Python programmer; the implementation will require calling the Python
callback functions from a C callback. Other uses are also imaginable. callback functions from a C callback. Other uses are also imaginable.
@ -476,7 +480,7 @@ parentheses. For example:
\code{PyEval_CallObject()} returns a Python object pointer: this is \code{PyEval_CallObject()} returns a Python object pointer: this is
the return value of the Python function. \code{PyEval_CallObject()} is the return value of the Python function. \code{PyEval_CallObject()} is
`reference-count-neutral' with respect to its arguments. In the ``reference-count-neutral'' with respect to its arguments. In the
example a new tuple was created to serve as the argument list, which example a new tuple was created to serve as the argument list, which
is \code{Py_DECREF()}-ed immediately after the call. is \code{Py_DECREF()}-ed immediately after the call.
@ -1134,7 +1138,7 @@ linked by the C compiler, global or static objects with constructors
cannot be used. All functions that will be called directly or cannot be used. All functions that will be called directly or
indirectly (i.e. via function pointers) by the Python interpreter will indirectly (i.e. via function pointers) by the Python interpreter will
have to be declared using \code{extern "C"}; this applies to all have to be declared using \code{extern "C"}; this applies to all
`methods' as well as to the module's initialization function. ``methods'' as well as to the module's initialization function.
It is unnecessary to enclose the Python header files in It is unnecessary to enclose the Python header files in
\code{extern "C" \{...\}} --- they use this form already if the symbol \code{extern "C" \{...\}} --- they use this form already if the symbol
\samp{__cplusplus} is defined (all recent C++ compilers define this \samp{__cplusplus} is defined (all recent C++ compilers define this
@ -1189,7 +1193,7 @@ libraries) in the module search path, and if one is found, it is
loaded into the executing binary and executed. Once loaded, the loaded into the executing binary and executed. Once loaded, the
module acts just like a built-in extension module. module acts just like a built-in extension module.
The advantages of dynamic loading are twofold: the `core' Python The advantages of dynamic loading are twofold: the ``core'' Python
binary gets smaller, and users can extend Python with their own binary gets smaller, and users can extend Python with their own
modules implemented in C without having to build and maintain their modules implemented in C without having to build and maintain their
own copy of the Python interpreter. There are also disadvantages: own copy of the Python interpreter. There are also disadvantages:
@ -1307,12 +1311,12 @@ On SGI IRIX 5, use
ld -shared spammodule.o -o spammodule.so ld -shared spammodule.o -o spammodule.so
\end{verbatim} \end{verbatim}
On other systems, consult the manual page for {\em ld}(1) to find what On other systems, consult the manual page for \code{ld}(1) to find what
flags, if any, must be used. flags, if any, must be used.
If your extension module uses system libraries that haven't already If your extension module uses system libraries that haven't already
been linked with Python (e.g. a windowing system), these must be been linked with Python (e.g. a windowing system), these must be
passed to the {\em ld} command as \samp{-l} options after the passed to the \code{ld} command as \samp{-l} options after the
\samp{.o} file. \samp{.o} file.
The resulting file \file{spammodule.so} must be copied into a directory The resulting file \file{spammodule.so} must be copied into a directory