\chapter{Defining New Types
|
|
\label{defining-new-types}}
|
|
\sectionauthor{Michael Hudson}{mwh@python.net}
|
|
\sectionauthor{Dave Kuhlman}{dkuhlman@rexx.com}
|
|
\sectionauthor{Jim Fulton}{jim@zope.com}
|
|
|
|
As mentioned in the last chapter, Python allows the writer of an
|
|
extension module to define new types that can be manipulated from
|
|
Python code, much like strings and lists in core Python.
|
|
|
|
This is not hard; the code for all extension types follows a pattern,
|
|
but there are some details that you need to understand before you can
|
|
get started.
|
|
|
|
\begin{notice}
|
|
The way new types are defined changed dramatically (and for the
|
|
better) in Python 2.2. This document describes how to define new
|
|
types for Python 2.2 and later. If you need to support older
|
|
versions of Python, you will need to refer to
|
|
\ulink{older versions of this documentation}
|
|
{http://www.python.org/doc/versions/}.
|
|
\end{notice}
|
|
|
|
\section{The Basics
|
|
\label{dnt-basics}}
|
|
|
|
The Python runtime sees all Python objects as variables of type
|
|
\ctype{PyObject*}. A \ctype{PyObject} is not a very magnificent
|
|
object - it just contains the refcount and a pointer to the object's
|
|
``type object''. This is where the action is; the type object
|
|
determines which (C) functions get called when, for instance, an
|
|
attribute gets looked up on an object or it is multiplied by another
|
|
object. These C functions are called ``type methods'' to distinguish
|
|
them from things like \code{[].append} (which we call ``object
|
|
methods'').
|
|
|
|
So, if you want to define a new object type, you need to create a new
|
|
type object.
|
|
|
|
This sort of thing can only be explained by example, so here's a
|
|
minimal, but complete, module that defines a new type:
|
|
|
|
\verbatiminput{noddy.c}
|
|
|
|
Now that's quite a bit to take in at once, but hopefully bits will
|
|
seem familiar from the last chapter.
|
|
|
|
The first bit that will be new is:
|
|
|
|
\begin{verbatim}
|
|
typedef struct {
|
|
PyObject_HEAD
|
|
} noddy_NoddyObject;
|
|
\end{verbatim}
|
|
|
|
This is what a Noddy object will contain---in this case, nothing more
|
|
than every Python object contains, namely a refcount and a pointer to a type
|
|
object. These are the fields the \code{PyObject_HEAD} macro brings
|
|
in. The reason for the macro is to standardize the layout and to
|
|
enable special debugging fields in debug builds. Note that there is
|
|
no semicolon after the \code{PyObject_HEAD} macro; one is included in
|
|
the macro definition. Be wary of adding one by accident; it's easy to
|
|
do from habit, and your compiler might not complain, but someone
|
|
else's probably will! (On Windows, MSVC is known to call this an
|
|
error and refuse to compile the code.)
|
|
|
|
For contrast, let's take a look at the corresponding definition for
|
|
standard Python integers:
|
|
|
|
\begin{verbatim}
|
|
typedef struct {
|
|
PyObject_HEAD
|
|
long ob_ival;
|
|
} PyIntObject;
|
|
\end{verbatim}
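
To make the shared layout concrete, here is a small stand-alone sketch of
the idea behind \code{PyObject_HEAD}. It is a simplified model and not
CPython's actual definition: the names \code{TOY_OBJECT_HEAD},
\code{ToyObject}, \code{ToyInt} and \code{toy_headers_line_up()} are
invented for illustration, and the extra debug-build fields are left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a type object; only its address matters here. */
typedef struct ToyType ToyType;

/* Toy version of the header macro.  As with PyObject_HEAD, the final
   semicolon is part of the macro, so uses of it take no semicolon. */
#define TOY_OBJECT_HEAD \
    long ob_refcnt;     \
    ToyType *ob_type;

/* The bare object: nothing but the header. */
typedef struct {
    TOY_OBJECT_HEAD
} ToyObject;

/* An "integer" object: the same header followed by its own data. */
typedef struct {
    TOY_OBJECT_HEAD
    long ob_ival;
} ToyInt;

/* Returns 1 when the header fields sit at the same offsets in both
   structures, which is what lets generic code find the refcount and
   type pointer of any object. */
int
toy_headers_line_up(void)
{
    return offsetof(ToyObject, ob_refcnt) == offsetof(ToyInt, ob_refcnt)
        && offsetof(ToyObject, ob_type) == offsetof(ToyInt, ob_type);
}
```

Because every structure begins with the same fields, code that only
needs the reference count or the type pointer does not have to know
which concrete structure it was handed.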
|
|
|
|
Moving on, we come to the crunch --- the type object.
|
|
|
|
\begin{verbatim}
|
|
static PyTypeObject noddy_NoddyType = {
|
|
PyObject_HEAD_INIT(NULL)
|
|
0, /*ob_size*/
|
|
"noddy.Noddy", /*tp_name*/
|
|
sizeof(noddy_NoddyObject), /*tp_basicsize*/
|
|
0, /*tp_itemsize*/
|
|
0, /*tp_dealloc*/
|
|
0, /*tp_print*/
|
|
0, /*tp_getattr*/
|
|
0, /*tp_setattr*/
|
|
0, /*tp_compare*/
|
|
0, /*tp_repr*/
|
|
0, /*tp_as_number*/
|
|
0, /*tp_as_sequence*/
|
|
0, /*tp_as_mapping*/
|
|
0, /*tp_hash */
|
|
0, /*tp_call*/
|
|
0, /*tp_str*/
|
|
0, /*tp_getattro*/
|
|
0, /*tp_setattro*/
|
|
0, /*tp_as_buffer*/
|
|
Py_TPFLAGS_DEFAULT, /*tp_flags*/
|
|
"Noddy objects", /* tp_doc */
|
|
};
|
|
\end{verbatim}
|
|
|
|
Now if you go and look up the definition of \ctype{PyTypeObject} in
|
|
\file{object.h} you'll see that it has many more fields than the
|
|
definition above. The remaining fields will be filled with zeros by
|
|
the C compiler, and it's common practice to not specify them
|
|
explicitly unless you need them.
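
The zero-filling is ordinary C behaviour and can be checked in
isolation. The sketch below uses a hypothetical miniature structure,
\code{ToyType}, in place of \ctype{PyTypeObject}; its member names
are assumptions chosen for the example.

```c
#include <stddef.h>

/* Hypothetical miniature of a type structure with a few "slots". */
typedef struct {
    const char *name;
    size_t basicsize;
    void (*dealloc)(void *);
    int (*print)(void *);
    const char *doc;
} ToyType;

/* Only the first two members are given; the rest are left out. */
ToyType toy_type = {
    "toy.Noddy",      /* name */
    sizeof(long),     /* basicsize */
};

/* Returns 1 if every member omitted above is a null pointer. */
int
toy_omitted_slots_are_null(void)
{
    return toy_type.dealloc == NULL
        && toy_type.print == NULL
        && toy_type.doc == NULL;
}
```

The same rule is what makes it safe to stop the \code{noddy_NoddyType}
initializer after \member{tp_doc}.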
|
|
|
|
This is so important that we're going to pick the top of it apart still
|
|
further:
|
|
|
|
\begin{verbatim}
|
|
PyObject_HEAD_INIT(NULL)
|
|
\end{verbatim}
|
|
|
|
This line is a bit of a wart; what we'd like to write is:
|
|
|
|
\begin{verbatim}
|
|
PyObject_HEAD_INIT(&PyType_Type)
|
|
\end{verbatim}
|
|
|
|
as the type of a type object is ``type'', but this isn't strictly
|
|
conforming C and some compilers complain. Fortunately, this member
|
|
will be filled in for us by \cfunction{PyType_Ready()}.
|
|
|
|
\begin{verbatim}
|
|
0, /* ob_size */
|
|
\end{verbatim}
|
|
|
|
The \member{ob_size} field of the header is not used; its presence in
|
|
the type structure is a historical artifact that is maintained for
|
|
binary compatibility with extension modules compiled for older
|
|
versions of Python. Always set this field to zero.
|
|
|
|
\begin{verbatim}
|
|
"noddy.Noddy", /* tp_name */
|
|
\end{verbatim}
|
|
|
|
The name of our type. This will appear in the default textual
|
|
representation of our objects and in some error messages, for example:
|
|
|
|
\begin{verbatim}
|
|
>>> "" + noddy.new_noddy()
|
|
Traceback (most recent call last):
|
|
File "<stdin>", line 1, in ?
|
|
TypeError: cannot add type "noddy.Noddy" to string
|
|
\end{verbatim}
|
|
|
|
Note that the name is a dotted name that includes both the module name
|
|
and the name of the type within the module. The module in this case is
|
|
\module{noddy} and the type is \class{Noddy}, so we set the type name
|
|
to \class{noddy.Noddy}.
|
|
|
|
\begin{verbatim}
|
|
sizeof(noddy_NoddyObject), /* tp_basicsize */
|
|
\end{verbatim}
|
|
|
|
This is so that Python knows how much memory to allocate when you call
|
|
\cfunction{PyObject_New()}.
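
As a rough picture of why the size is stored in the type, the sketch
below shows a toy allocator that needs nothing but the type to create an
instance. It is a simplified model under stated assumptions, not the
code behind \cfunction{PyObject_New()}: \code{ToyType}, \code{ToyHead},
\code{ToyNoddy} and \code{toy_alloc()} are hypothetical names, and the
header is embedded as a member rather than produced by a macro.

```c
#include <stdlib.h>

typedef struct ToyType {
    const char *name;
    size_t basicsize;     /* bytes to allocate per instance */
} ToyType;

typedef struct {
    long refcnt;
    const ToyType *type;
} ToyHead;

typedef struct {
    ToyHead head;
    void *first;
    int number;
} ToyNoddy;

const ToyType toy_noddy_type = { "toy.Noddy", sizeof(ToyNoddy) };

/* Allocate one zero-filled instance of `type` and set up its header.
   Returns NULL if memory is exhausted; the caller releases the result
   with free(). */
void *
toy_alloc(const ToyType *type)
{
    ToyHead *obj = calloc(1, type->basicsize);
    if (obj == NULL)
        return NULL;
    obj->refcnt = 1;
    obj->type = type;
    return obj;
}
```

The allocator never mentions \code{ToyNoddy}; a type with more data
members only has to report a larger \code{basicsize}.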
|
|
|
|
\note{If you want your type to be subclassable from Python, and your
|
|
type has the same \member{tp_basicsize} as its base type, you may
|
|
have problems with multiple inheritance. A Python subclass of your
|
|
type will have to list your type first in its \member{__bases__}, or
|
|
else it will not be able to call your type's \method{__new__} method
|
|
without getting an error. You can avoid this problem by ensuring
|
|
that your type has a larger value for \member{tp_basicsize} than
|
|
its base type does. Most of the time, this will be true anyway,
|
|
because either your base type will be \class{object}, or else you will
|
|
be adding data members to your base type, and therefore increasing its
|
|
size.}
|
|
|
|
\begin{verbatim}
|
|
0, /* tp_itemsize */
|
|
\end{verbatim}
|
|
|
|
This has to do with variable length objects like lists and strings.
|
|
Ignore this for now.
|
|
|
|
Skipping a number of type methods that we don't provide, we set the
|
|
class flags to \constant{Py_TPFLAGS_DEFAULT}.
|
|
|
|
\begin{verbatim}
|
|
Py_TPFLAGS_DEFAULT, /*tp_flags*/
|
|
\end{verbatim}
|
|
|
|
All types should include this constant in their flags. It enables all
|
|
of the members defined by the current version of Python.
|
|
|
|
We provide a doc string for the type in \member{tp_doc}.
|
|
|
|
\begin{verbatim}
|
|
"Noddy objects", /* tp_doc */
|
|
\end{verbatim}
|
|
|
|
Now we get into the type methods, the things that make your objects
|
|
different from the others. We aren't going to implement any of these
|
|
in this version of the module. We'll expand this example later to
|
|
have more interesting behavior.
|
|
|
|
For now, all we want to be able to do is to create new \class{Noddy}
|
|
objects. To enable object creation, we have to provide a
|
|
\member{tp_new} implementation. In this case, we can just use the
|
|
default implementation provided by the API function
|
|
\cfunction{PyType_GenericNew()}. We'd like to just assign this to the
|
|
\member{tp_new} slot, but we can't, for portability's sake. On some
|
|
platforms or compilers, we can't statically initialize a structure
|
|
member with a function defined in another C module, so, instead, we'll
|
|
assign the \member{tp_new} slot in the module initialization function
|
|
just before calling \cfunction{PyType_Ready()}:
|
|
|
|
\begin{verbatim}
|
|
noddy_NoddyType.tp_new = PyType_GenericNew;
|
|
if (PyType_Ready(&noddy_NoddyType) < 0)
|
|
return;
|
|
\end{verbatim}
|
|
|
|
All the other type methods are \NULL; we'll go over them in a later
section.
|
|
|
|
Everything else in the file should be familiar, except for some code
|
|
in \cfunction{initnoddy()}:
|
|
|
|
\begin{verbatim}
|
|
if (PyType_Ready(&noddy_NoddyType) < 0)
|
|
return;
|
|
\end{verbatim}
|
|
|
|
This initializes the \class{Noddy} type, filling in a number of
|
|
members, including \member{ob_type} that we initially set to \NULL.
|
|
|
|
\begin{verbatim}
|
|
PyModule_AddObject(m, "Noddy", (PyObject *)&noddy_NoddyType);
|
|
\end{verbatim}
|
|
|
|
This adds the type to the module dictionary. This allows us to create
|
|
\class{Noddy} instances by calling the \class{Noddy} class:
|
|
|
|
\begin{verbatim}
|
|
>>> import noddy
|
|
>>> mynoddy = noddy.Noddy()
|
|
\end{verbatim}
|
|
|
|
That's it! All that remains is to build it; put the above code in a
|
|
file called \file{noddy.c} and
|
|
|
|
\begin{verbatim}
|
|
from distutils.core import setup, Extension
|
|
setup(name="noddy", version="1.0",
|
|
ext_modules=[Extension("noddy", ["noddy.c"])])
|
|
\end{verbatim}
|
|
|
|
in a file called \file{setup.py}; then typing
|
|
|
|
\begin{verbatim}
|
|
$ python setup.py build
|
|
\end{verbatim} %$ <-- bow to font-lock ;-(
|
|
|
|
at a shell should produce a file \file{noddy.so} in a subdirectory;
|
|
move to that directory and fire up Python --- you should be able to
|
|
\code{import noddy} and play around with Noddy objects.
|
|
|
|
That wasn't so hard, was it?
|
|
|
|
Of course, the current Noddy type is pretty uninteresting. It has no
|
|
data and doesn't do anything. It can't even be subclassed.
|
|
|
|
\subsection{Adding data and methods to the Basic example}
|
|
|
|
Let's extend the basic example to add some data and methods. Let's
|
|
also make the type usable as a base class. We'll create
|
|
a new module, \module{noddy2}, that adds these capabilities:
|
|
|
|
\verbatiminput{noddy2.c}
|
|
|
|
This version of the module has a number of changes.
|
|
|
|
We've added an extra include:
|
|
|
|
\begin{verbatim}
|
|
#include "structmember.h"
|
|
\end{verbatim}
|
|
|
|
This include provides declarations that we use to handle attributes,
|
|
as described a bit later.
|
|
|
|
The name of the \class{Noddy} object structure has been shortened to
|
|
\class{Noddy}. The type object name has been shortened to
|
|
\class{NoddyType}.
|
|
|
|
The \class{Noddy} type now has three data attributes, \var{first},
|
|
\var{last}, and \var{number}. The \var{first} and \var{last}
|
|
variables are Python strings containing first and last names. The
|
|
\var{number} attribute is an integer.
|
|
|
|
The object structure is updated accordingly:
|
|
|
|
\begin{verbatim}
|
|
typedef struct {
|
|
PyObject_HEAD
|
|
PyObject *first;
|
|
PyObject *last;
|
|
int number;
|
|
} Noddy;
|
|
\end{verbatim}
|
|
|
|
Because we now have data to manage, we have to be more careful about
|
|
object allocation and deallocation. At a minimum, we need a
|
|
deallocation method:
|
|
|
|
\begin{verbatim}
|
|
static void
|
|
Noddy_dealloc(Noddy* self)
|
|
{
|
|
Py_XDECREF(self->first);
|
|
Py_XDECREF(self->last);
|
|
self->ob_type->tp_free((PyObject*)self);
|
|
}
|
|
\end{verbatim}
|
|
|
|
which is assigned to the \member{tp_dealloc} member:
|
|
|
|
\begin{verbatim}
|
|
(destructor)Noddy_dealloc, /*tp_dealloc*/
|
|
\end{verbatim}
|
|
|
|
This method decrements the reference counts of the two Python
|
|
attributes. We use \cfunction{Py_XDECREF()} here because the
|
|
\member{first} and \member{last} members could be \NULL. It then
|
|
calls the \member{tp_free} member of the object's type to free the
|
|
object's memory. Note that the object's type might not be
|
|
\class{NoddyType}, because the object may be an instance of a
|
|
subclass.
|
|
|
|
We want to make sure that the first and last names are initialized to
|
|
empty strings, so we provide a new method:
|
|
|
|
\begin{verbatim}
|
|
static PyObject *
|
|
Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
|
|
{
|
|
Noddy *self;
|
|
|
|
self = (Noddy *)type->tp_alloc(type, 0);
|
|
if (self != NULL) {
|
|
self->first = PyString_FromString("");
|
|
if (self->first == NULL)
|
|
{
|
|
Py_DECREF(self);
|
|
return NULL;
|
|
}
|
|
|
|
self->last = PyString_FromString("");
|
|
if (self->last == NULL)
|
|
{
|
|
Py_DECREF(self);
|
|
return NULL;
|
|
}
|
|
|
|
self->number = 0;
|
|
}
|
|
|
|
return (PyObject *)self;
|
|
}
|
|
\end{verbatim}
|
|
|
|
and install it in the \member{tp_new} member:
|
|
|
|
\begin{verbatim}
|
|
Noddy_new, /* tp_new */
|
|
\end{verbatim}
|
|
|
|
The new member is responsible for creating (as opposed to
|
|
initializing) objects of the type. It is exposed in Python as the
|
|
\method{__new__()} method. See the paper titled ``Unifying types and
|
|
classes in Python'' for a detailed discussion of the \method{__new__()}
|
|
method. One reason to implement a new method is to ensure the initial
|
|
values of instance variables. In this case, we use the new method to
|
|
make sure that the initial values of the members \member{first} and
|
|
\member{last} are not \NULL. If we didn't care whether the initial
|
|
values were \NULL, we could have used \cfunction{PyType_GenericNew()} as
|
|
our new method, as we did before. \cfunction{PyType_GenericNew()}
|
|
initializes all of the instance variable members to \NULL.
|
|
|
|
The new method is a static method that is passed the type being
|
|
instantiated and any arguments passed when the type was called,
|
|
and that returns the new object created. New methods always accept
|
|
positional and keyword arguments, but they often ignore the arguments,
|
|
leaving the argument handling to initializer methods. Note that if the
|
|
type supports subclassing, the type passed may not be the type being
|
|
defined. The new method calls the \member{tp_alloc} slot to allocate memory.
|
|
We don't fill the \member{tp_alloc} slot ourselves. Rather
|
|
\cfunction{PyType_Ready()} fills it for us by inheriting it from our
|
|
base class, which is \class{object} by default. Most types use the
|
|
default allocation.
|
|
|
|
\note{If you are creating a co-operative \member{tp_new} (one that
|
|
calls a base type's \member{tp_new} or \method{__new__}), you
|
|
must \emph{not} try to determine what method to call using
|
|
method resolution order at runtime. Always statically determine
|
|
what type you are going to call, and call its \member{tp_new}
|
|
directly, or via \code{type->tp_base->tp_new}. If you do
|
|
not do this, Python subclasses of your type that also inherit
|
|
from other Python-defined classes may not work correctly.
|
|
(Specifically, you may not be able to create instances of
|
|
such subclasses without getting a \exception{TypeError}.)}
|
|
|
|
We provide an initialization function:
|
|
|
|
\begin{verbatim}
|
|
static int
|
|
Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
|
|
{
|
|
PyObject *first=NULL, *last=NULL, *tmp;
|
|
|
|
static char *kwlist[] = {"first", "last", "number", NULL};
|
|
|
|
if (! PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
|
|
&first, &last,
|
|
&self->number))
|
|
return -1;
|
|
|
|
if (first) {
|
|
tmp = self->first;
|
|
Py_INCREF(first);
|
|
self->first = first;
|
|
Py_XDECREF(tmp);
|
|
}
|
|
|
|
if (last) {
|
|
tmp = self->last;
|
|
Py_INCREF(last);
|
|
self->last = last;
|
|
Py_XDECREF(tmp);
|
|
}
|
|
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
|
|
by filling the \member{tp_init} slot.
|
|
|
|
\begin{verbatim}
|
|
(initproc)Noddy_init, /* tp_init */
|
|
\end{verbatim}
|
|
|
|
The \member{tp_init} slot is exposed in Python as the
|
|
\method{__init__()} method. It is used to initialize an object after
|
|
it's created. Unlike the new method, we can't guarantee that the
|
|
initializer is called. The initializer isn't called when unpickling
|
|
objects and it can be overridden. Our initializer accepts arguments
|
|
to provide initial values for our instance. Initializers always accept
|
|
positional and keyword arguments.
|
|
|
|
Initializers can be called multiple times. Anyone can call the
|
|
\method{__init__()} method on our objects. For this reason, we have
|
|
to be extra careful when assigning the new values. We might be
|
|
tempted, for example, to assign the \member{first} member like this:
|
|
|
|
\begin{verbatim}
|
|
if (first) {
|
|
Py_XDECREF(self->first);
|
|
Py_INCREF(first);
|
|
self->first = first;
|
|
}
|
|
\end{verbatim}
|
|
|
|
But this would be risky. Our type doesn't restrict the type of the
|
|
\member{first} member, so it could be any kind of object. It could
|
|
have a destructor that causes code to be executed that tries to
|
|
access the \member{first} member. To be paranoid and protect
|
|
ourselves against this possibility, we almost always reassign members
|
|
before decrementing their reference counts. When don't we have to do
|
|
this?
|
|
\begin{itemize}
|
|
\item when we absolutely know that the reference count is greater than
|
|
1
|
|
\item when we know that deallocation of the object\footnote{This is
|
|
true when we know that the object is a basic type, like a string or
|
|
a float.} will not cause any
|
|
calls back into our type's code
|
|
\item when decrementing a reference count in a \member{tp_dealloc}
|
|
handler when garbage collection is not supported\footnote{We relied
|
|
on this in the \member{tp_dealloc} handler in this example, because
|
|
our type doesn't support garbage collection. Even if a type supports
|
|
garbage collection, there are calls that can be made to ``untrack''
|
|
the object from garbage collection; however, these calls are
|
|
advanced and not covered here.}
|
|
\end{itemize}
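
The difference between the two orderings can be seen in a toy model
that does not use the Python API at all. Everything in the sketch below
is hypothetical (\code{ToyObj}, \code{ToyHolder}, \code{toy_decref()},
\code{toy_reassign()}); it only imitates reference counting closely
enough to show what a destructor can observe.

```c
#include <stddef.h>

typedef struct ToyObj {
    long refcnt;
    void (*on_dealloc)(struct ToyObj *self);  /* hypothetical destructor hook */
} ToyObj;

typedef struct {
    ToyObj *first;
} ToyHolder;

static ToyHolder toy_holder;      /* the object whose member we reassign */
static int toy_saw_dying_object;  /* set by the nosy destructor below */

static void
toy_decref(ToyObj *o)
{
    if (--o->refcnt == 0 && o->on_dealloc != NULL)
        o->on_dealloc(o);
}

/* A destructor that looks back at the holder while it runs. */
static void
toy_nosy_dealloc(ToyObj *self)
{
    if (toy_holder.first == self)
        toy_saw_dying_object = 1;
}

/* Replace toy_holder.first with a fresh value, using either the risky
   order (decrement first) or the careful order (reassign first).
   Returns 1 if the destructor saw the holder still pointing at the
   object being destroyed, else 0. */
int
toy_reassign(int careful)
{
    ToyObj old_value = { 1, toy_nosy_dealloc };
    ToyObj new_value = { 1, NULL };
    ToyObj *tmp;

    toy_holder.first = &old_value;
    toy_saw_dying_object = 0;

    if (careful) {
        tmp = toy_holder.first;
        new_value.refcnt++;
        toy_holder.first = &new_value;
        toy_decref(tmp);
    }
    else {
        toy_decref(toy_holder.first);
        new_value.refcnt++;
        toy_holder.first = &new_value;
    }
    toy_holder.first = NULL;   /* don't keep a pointer to a local */
    return toy_saw_dying_object;
}
```

With the careful order the destructor finds the holder already pointing
at the new value; with the risky order it finds a pointer to the very
object that is being torn down.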
|
|
|
|
|
|
We want to expose our instance variables as attributes. There
|
|
are a number of ways to do that. The simplest way is to define member
|
|
definitions:
|
|
|
|
\begin{verbatim}
|
|
static PyMemberDef Noddy_members[] = {
|
|
{"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
|
|
"first name"},
|
|
{"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
|
|
"last name"},
|
|
{"number", T_INT, offsetof(Noddy, number), 0,
|
|
"noddy number"},
|
|
{NULL} /* Sentinel */
|
|
};
|
|
\end{verbatim}
|
|
|
|
and put the definitions in the \member{tp_members} slot:
|
|
|
|
\begin{verbatim}
|
|
Noddy_members, /* tp_members */
|
|
\end{verbatim}
|
|
|
|
Each member definition has a member name, type, offset, access flags
|
|
and documentation string. See the ``Generic Attribute Management''
|
|
section below for details.
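
The role of the offset is perhaps easiest to see in a stripped-down
model. The sketch below is not how \file{structmember.h} is
implemented; \code{ToyIntMember}, \code{toy_members} and
\code{toy_get_int()} are hypothetical names, and only integer members
are handled. It shows how a table of names and offsets lets one generic
function read a field out of any instance.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *first;
    const char *last;
    int number;
} ToyNoddy;

/* A much reduced member definition: just a name and an offset. */
typedef struct {
    const char *name;
    size_t offset;
} ToyIntMember;

static const ToyIntMember toy_members[] = {
    {"number", offsetof(ToyNoddy, number)},
    {NULL, 0}  /* Sentinel */
};

/* Look up an integer member by name.  On success, store its value in
   *out and return 0; return -1 if no such member is listed. */
int
toy_get_int(const ToyNoddy *obj, const char *name, int *out)
{
    const ToyIntMember *m;

    for (m = toy_members; m->name != NULL; m++) {
        if (strcmp(m->name, name) == 0) {
            *out = *(const int *)((const char *)obj + m->offset);
            return 0;
        }
    }
    return -1;
}
```

The lookup function contains no reference to the \code{number} field
itself; everything it needs comes from the table, which is why the
definitions can be handed to the interpreter as plain data.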
|
|
|
|
A disadvantage of this approach is that it doesn't provide a way to
|
|
restrict the types of objects that can be assigned to the Python
|
|
attributes. We expect the first and last names to be strings, but any
|
|
Python objects can be assigned. Further, the attributes can be
|
|
deleted, setting the C pointers to \NULL. Even though we can make
|
|
sure the members are initialized to non-\NULL{} values, the members can
|
|
be set to \NULL{} if the attributes are deleted.
|
|
|
|
We define a single method, \method{name}, that outputs the object's
|
|
name as the concatenation of the first and last names.
|
|
|
|
\begin{verbatim}
|
|
static PyObject *
|
|
Noddy_name(Noddy* self)
|
|
{
|
|
static PyObject *format = NULL;
|
|
PyObject *args, *result;
|
|
|
|
if (format == NULL) {
|
|
format = PyString_FromString("%s %s");
|
|
if (format == NULL)
|
|
return NULL;
|
|
}
|
|
|
|
if (self->first == NULL) {
|
|
PyErr_SetString(PyExc_AttributeError, "first");
|
|
return NULL;
|
|
}
|
|
|
|
if (self->last == NULL) {
|
|
PyErr_SetString(PyExc_AttributeError, "last");
|
|
return NULL;
|
|
}
|
|
|
|
args = Py_BuildValue("OO", self->first, self->last);
|
|
if (args == NULL)
|
|
return NULL;
|
|
|
|
result = PyString_Format(format, args);
|
|
Py_DECREF(args);
|
|
|
|
return result;
|
|
}
|
|
\end{verbatim}
|
|
|
|
The method is implemented as a C function that takes a \class{Noddy} (or
|
|
\class{Noddy} subclass) instance as the first argument. Methods
|
|
always take an instance as the first argument. Methods often take
|
|
positional and keyword arguments as well, but in this case we don't
|
|
take any and don't need to accept a positional argument tuple or
|
|
keyword argument dictionary. This method is equivalent to the Python
|
|
method:
|
|
|
|
\begin{verbatim}
|
|
def name(self):
|
|
return "%s %s" % (self.first, self.last)
|
|
\end{verbatim}
|
|
|
|
Note that we have to check for the possibility that our \member{first}
|
|
and \member{last} members are \NULL. This is because they can be
|
|
deleted, in which case they are set to \NULL. It would be better to
|
|
prevent deletion of these attributes and to restrict the attribute
|
|
values to be strings. We'll see how to do that in the next section.
|
|
|
|
Now that we've defined the method, we need to create an array of
|
|
method definitions:
|
|
|
|
\begin{verbatim}
|
|
static PyMethodDef Noddy_methods[] = {
|
|
{"name", (PyCFunction)Noddy_name, METH_NOARGS,
|
|
"Return the name, combining the first and last name"
|
|
},
|
|
{NULL} /* Sentinel */
|
|
};
|
|
\end{verbatim}
|
|
|
|
and assign them to the \member{tp_methods} slot:
|
|
|
|
\begin{verbatim}
|
|
Noddy_methods, /* tp_methods */
|
|
\end{verbatim}
|
|
|
|
Note that we used the \constant{METH_NOARGS} flag to indicate that the
|
|
method is passed no arguments.
|
|
|
|
Finally, we'll make our type usable as a base class. We've written
|
|
our methods carefully so far so that they don't make any assumptions
|
|
about the type of the object being created or used, so all we need to
|
|
do is to add the \constant{Py_TPFLAGS_BASETYPE} to our class flag
|
|
definition:
|
|
|
|
\begin{verbatim}
|
|
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/
|
|
\end{verbatim}
|
|
|
|
We rename \cfunction{initnoddy()} to \cfunction{initnoddy2()}
|
|
and update the module name passed to \cfunction{Py_InitModule3()}.
|
|
|
|
Finally, we update our \file{setup.py} file to build the new module:
|
|
|
|
\begin{verbatim}
|
|
from distutils.core import setup, Extension
|
|
setup(name="noddy", version="1.0",
|
|
ext_modules=[
|
|
Extension("noddy", ["noddy.c"]),
|
|
Extension("noddy2", ["noddy2.c"]),
|
|
])
|
|
\end{verbatim}
|
|
|
|
\subsection{Providing finer control over data attributes}
|
|
|
|
In this section, we'll provide finer control over how the
|
|
\member{first} and \member{last} attributes are set in the
|
|
\class{Noddy} example. In the previous version of our module, the
|
|
instance variables \member{first} and \member{last} could be set to
|
|
non-string values or even deleted. We want to make sure that these
|
|
attributes always contain strings.
|
|
|
|
\verbatiminput{noddy3.c}
|
|
|
|
To provide greater control over the \member{first} and \member{last}
|
|
attributes, we'll use custom getter and setter functions. Here are
|
|
the functions for getting and setting the \member{first} attribute:
|
|
|
|
\begin{verbatim}
|
|
static PyObject *
Noddy_getfirst(Noddy *self, void *closure)
|
|
{
|
|
Py_INCREF(self->first);
|
|
return self->first;
|
|
}
|
|
|
|
static int
|
|
Noddy_setfirst(Noddy *self, PyObject *value, void *closure)
|
|
{
|
|
if (value == NULL) {
|
|
PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
|
|
return -1;
|
|
}
|
|
|
|
if (! PyString_Check(value)) {
|
|
PyErr_SetString(PyExc_TypeError,
|
|
"The first attribute value must be a string");
|
|
return -1;
|
|
}
|
|
|
|
Py_DECREF(self->first);
|
|
Py_INCREF(value);
|
|
self->first = value;
|
|
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
|
|
The getter function is passed a \class{Noddy} object and a
|
|
``closure'', which is a void pointer. In this case, the closure is
|
|
ignored. (The closure supports an advanced usage in which definition
|
|
data is passed to the getter and setter. This could, for example, be
|
|
used to allow a single set of getter and setter functions that decide
|
|
the attribute to get or set based on data in the closure.)
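
As an illustration of that advanced usage, here is one way a shared
getter might be arranged. This is a sketch in plain C rather than
against the real \ctype{PyGetSetDef}: \code{ToyGetSet},
\code{toy_get_string()} and \code{toy_getattr()} are hypothetical, and
the choice of a member offset as the closure data is an assumption made
for the example.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *first;
    const char *last;
} ToyNoddy;

typedef const char *(*toy_getter)(ToyNoddy *self, void *closure);

typedef struct {
    const char *name;
    toy_getter get;
    void *closure;
} ToyGetSet;

static size_t toy_first_offset = offsetof(ToyNoddy, first);
static size_t toy_last_offset = offsetof(ToyNoddy, last);

/* One getter for both attributes: the closure says where to look. */
static const char *
toy_get_string(ToyNoddy *self, void *closure)
{
    size_t offset = *(size_t *)closure;
    return *(const char **)((char *)self + offset);
}

static const ToyGetSet toy_getset[] = {
    {"first", toy_get_string, &toy_first_offset},
    {"last", toy_get_string, &toy_last_offset},
    {NULL, NULL, NULL}  /* Sentinel */
};

/* Find the named attribute and call its getter with its closure.
   Returns NULL if the name is not listed. */
const char *
toy_getattr(ToyNoddy *self, const char *name)
{
    const ToyGetSet *def;

    for (def = toy_getset; def->name != NULL; def++) {
        if (strcmp(def->name, name) == 0)
            return def->get(self, def->closure);
    }
    return NULL;
}
```

Adding a third string attribute in this arrangement would mean one more
table row and one more offset, with no new getter function.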
|
|
|
|
The setter function is passed the \class{Noddy} object, the new value,
|
|
and the closure. The new value may be \NULL, in which case the
|
|
attribute is being deleted. In our setter, we raise an error if the
|
|
attribute is deleted or if the attribute value is not a string.
|
|
|
|
We create an array of \ctype{PyGetSetDef} structures:
|
|
|
|
\begin{verbatim}
|
|
static PyGetSetDef Noddy_getseters[] = {
|
|
{"first",
|
|
(getter)Noddy_getfirst, (setter)Noddy_setfirst,
|
|
"first name",
|
|
NULL},
|
|
{"last",
|
|
(getter)Noddy_getlast, (setter)Noddy_setlast,
|
|
"last name",
|
|
NULL},
|
|
{NULL} /* Sentinel */
|
|
};
|
|
\end{verbatim}
|
|
|
|
and register it in the \member{tp_getset} slot:
|
|
|
|
\begin{verbatim}
|
|
Noddy_getseters, /* tp_getset */
|
|
\end{verbatim}
|
|
|
|
to register our attribute getters and setters.
|
|
|
|
The last item in a \ctype{PyGetSetDef} structure is the closure
|
|
mentioned above. In this case, we aren't using the closure, so we just
|
|
pass \NULL.
|
|
|
|
We also remove the member definitions for these attributes:
|
|
|
|
\begin{verbatim}
|
|
static PyMemberDef Noddy_members[] = {
|
|
{"number", T_INT, offsetof(Noddy, number), 0,
|
|
"noddy number"},
|
|
{NULL} /* Sentinel */
|
|
};
|
|
\end{verbatim}
|
|
|
|
We also need to update the \member{tp_init} handler to only allow
|
|
strings\footnote{We now know that the first and last members are strings,
|
|
so perhaps we could be less careful about decrementing their
|
|
reference counts; however, we accept instances of string subclasses.
|
|
Even though deallocating normal strings won't call back into our
|
|
objects, we can't guarantee that deallocating an instance of a string
|
|
subclass won't call back into our objects.} to be passed:
|
|
|
|
\begin{verbatim}
|
|
static int
|
|
Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
|
|
{
|
|
PyObject *first=NULL, *last=NULL, *tmp;
|
|
|
|
static char *kwlist[] = {"first", "last", "number", NULL};
|
|
|
|
if (! PyArg_ParseTupleAndKeywords(args, kwds, "|SSi", kwlist,
|
|
&first, &last,
|
|
&self->number))
|
|
return -1;
|
|
|
|
if (first) {
|
|
tmp = self->first;
|
|
Py_INCREF(first);
|
|
self->first = first;
|
|
Py_DECREF(tmp);
|
|
}
|
|
|
|
if (last) {
|
|
tmp = self->last;
|
|
Py_INCREF(last);
|
|
self->last = last;
|
|
Py_DECREF(tmp);
|
|
}
|
|
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
|
|
With these changes, we can ensure that the \member{first} and
|
|
\member{last} members are never \NULL{} so we can remove checks for \NULL{}
|
|
values in almost all cases. This means that most of the
|
|
\cfunction{Py_XDECREF()} calls can be converted to \cfunction{Py_DECREF()}
|
|
calls. The only place we can't change these calls is in the
|
|
deallocator, where there is the possibility that the initialization of
|
|
these members failed in the constructor.
|
|
|
|
We also rename the module initialization function and module name in
|
|
the initialization function, as we did before, and we add an extra
|
|
definition to the \file{setup.py} file.
|
|
|
|
\subsection{Supporting cyclic garbage collection}
|
|
|
|
Python has a cyclic-garbage collector that can identify unneeded
|
|
objects even when their reference counts are not zero. This can happen
|
|
when objects are involved in cycles. For example, consider:
|
|
|
|
\begin{verbatim}
|
|
>>> l = []
|
|
>>> l.append(l)
|
|
>>> del l
|
|
\end{verbatim}
|
|
|
|
In this example, we create a list that contains itself. When we delete
|
|
it, it still has a reference from itself. Its reference count doesn't
|
|
drop to zero. Fortunately, Python's cyclic-garbage collector will
|
|
eventually figure out that the list is garbage and free it.
|
|
|
|
In the second version of the \class{Noddy} example, we allowed any
|
|
kind of object to be stored in the \member{first} or \member{last}
|
|
attributes.\footnote{Even in the third version, we aren't guaranteed to
|
|
avoid cycles. Instances of string subclasses are allowed and string
|
|
subclasses could allow cycles even if normal strings don't.} This
|
|
means that \class{Noddy} objects can participate in cycles:
|
|
|
|
\begin{verbatim}
|
|
>>> import noddy2
|
|
>>> n = noddy2.Noddy()
|
|
>>> l = [n]
|
|
>>> n.first = l
|
|
\end{verbatim}
|
|
|
|
This is pretty silly, but it gives us an excuse to add support for the
|
|
cyclic-garbage collector to the \class{Noddy} example. To support
|
|
cyclic garbage collection, types need to fill two slots and set a
|
|
class flag that enables these slots:
|
|
|
|
\verbatiminput{noddy4.c}
|
|
|
|
The traversal method provides access to subobjects that
|
|
could participate in cycles:
|
|
|
|
\begin{verbatim}
|
|
static int
|
|
Noddy_traverse(Noddy *self, visitproc visit, void *arg)
|
|
{
|
|
int vret;
|
|
|
|
if (self->first) {
|
|
vret = visit(self->first, arg);
|
|
if (vret != 0)
|
|
return vret;
|
|
}
|
|
if (self->last) {
|
|
vret = visit(self->last, arg);
|
|
if (vret != 0)
|
|
return vret;
|
|
}
|
|
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
|
|
For each subobject that can participate in cycles, we need to call the
|
|
\cfunction{visit()} function, which is passed to the traversal method.
|
|
The \cfunction{visit()} function takes as arguments the subobject and
|
|
the extra argument \var{arg} passed to the traversal method. It
|
|
returns an integer value that must be returned if it is non-zero.
|
|
|
|
|
|
Python 2.4 and higher provide a \cfunction{Py_VISIT()} macro that automates
|
|
calling visit functions. With \cfunction{Py_VISIT()},
|
|
\cfunction{Noddy_traverse()} can be simplified:
|
|
|
|
|
|
\begin{verbatim}
|
|
static int
|
|
Noddy_traverse(Noddy *self, visitproc visit, void *arg)
|
|
{
|
|
Py_VISIT(self->first);
|
|
Py_VISIT(self->last);
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
|
|
\note{Note that the \member{tp_traverse} implementation must name its
|
|
arguments exactly \var{visit} and \var{arg} in order to use
|
|
\cfunction{Py_VISIT()}. This is to encourage uniformity
|
|
across these boring implementations.}
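
The naming requirement follows from how such a macro has to be written.
The sketch below defines a toy macro, \code{TOY_VISIT()}, in the same
spirit; it is an illustration under the assumption that the real macro
works along similar lines, and all the \code{toy_} names are invented.

```c
#include <stddef.h>

typedef int (*toy_visitproc)(void *object, void *arg);

/* Toy counterpart of a visit macro.  It refers to `visit` and `arg`
   by name, so it only compiles inside a function whose parameters
   are called exactly that. */
#define TOY_VISIT(op)                          \
    do {                                       \
        if (op) {                              \
            int vret = visit((op), arg);       \
            if (vret)                          \
                return vret;                   \
        }                                      \
    } while (0)

typedef struct {
    void *first;
    void *last;
} ToyNoddy;

int
toy_traverse(ToyNoddy *self, toy_visitproc visit, void *arg)
{
    TOY_VISIT(self->first);
    TOY_VISIT(self->last);
    return 0;
}

/* Example visitor: counts the subobjects it is shown. */
int
toy_count_visit(void *object, void *arg)
{
    (void)object;
    ++*(int *)arg;
    return 0;
}

/* Example visitor: asks the traversal to stop at once. */
int
toy_stop_visit(void *object, void *arg)
{
    (void)object;
    (void)arg;
    return 7;
}
```

Members that are \NULL{} are skipped, and a non-zero result from the
visitor is passed straight back to the caller, just as in the
hand-written version of the traversal.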
|
|
|
|
We also need to provide a method for clearing any subobjects that can
|
|
participate in cycles. We implement the method and reimplement the
|
|
deallocator to use it:
|
|
|
|
\begin{verbatim}
|
|
static int
|
|
Noddy_clear(Noddy *self)
|
|
{
|
|
PyObject *tmp;
|
|
|
|
tmp = self->first;
|
|
self->first = NULL;
|
|
Py_XDECREF(tmp);
|
|
|
|
tmp = self->last;
|
|
self->last = NULL;
|
|
Py_XDECREF(tmp);
|
|
|
|
return 0;
|
|
}
|
|
|
|
static void
|
|
Noddy_dealloc(Noddy* self)
|
|
{
|
|
Noddy_clear(self);
|
|
self->ob_type->tp_free((PyObject*)self);
|
|
}
|
|
\end{verbatim}
|
|
|
|
Notice the use of a temporary variable in \cfunction{Noddy_clear()}.
|
|
We use the temporary variable so that we can set each member to \NULL{}
|
|
before decrementing its reference count. We do this because, as was
|
|
discussed earlier, if the reference count drops to zero, we might
|
|
cause code to run that calls back into the object. In addition,
|
|
because we now support garbage collection, we also have to worry about
|
|
code being run that triggers garbage collection. If garbage
|
|
collection is run, our \member{tp_traverse} handler could get called.
|
|
We can't take a chance of having \cfunction{Noddy_traverse()} called
|
|
when a member's reference count has dropped to zero and its value
|
|
hasn't been set to \NULL.
|
|
|
|
Python 2.4 and higher provide a \cfunction{Py_CLEAR()} macro that automates
|
|
the careful decrementing of reference counts. With
|
|
\cfunction{Py_CLEAR()}, the \cfunction{Noddy_clear()} function can be
|
|
simplified:
|
|
|
|
\begin{verbatim}
|
|
static int
|
|
Noddy_clear(Noddy *self)
|
|
{
|
|
Py_CLEAR(self->first);
|
|
Py_CLEAR(self->last);
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
|
|
Finally, we add the \constant{Py_TPFLAGS_HAVE_GC} flag to the class
|
|
flags:
|
|
|
|
\begin{verbatim}
|
|
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, /*tp_flags*/
|
|
\end{verbatim}
|
|
|
|
That's pretty much it. If we had written custom \member{tp_alloc} or
|
|
\member{tp_free} slots, we'd need to modify them for cyclic-garbage
|
|
collection. Most extensions will use the versions automatically
|
|
provided.
|
|
|
|
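The effect of cyclic-collection support can be observed from Python
code with the standard \module{gc} and \module{weakref} modules. The
following is only an illustrative sketch, written to run under both
Python 2 and Python 3. \class{Node} is a hypothetical pure-Python
class standing in for an extension type such as \class{Noddy}, and the
two flag values are assumptions copied from CPython's \file{object.h}
rather than anything this chapter defines.

```python
import gc
import weakref

# Assumed bit values, copied from CPython's object.h.
PY_TPFLAGS_BASETYPE = 1 << 10
PY_TPFLAGS_HAVE_GC = 1 << 14


def has_flag(tp, flag):
    """Return True if the type object `tp` has `flag` set in tp_flags."""
    return bool(tp.__flags__ & flag)


class Node(object):
    """Hypothetical stand-in for a container-like extension type."""

    def __init__(self):
        self.other = None


def cycle_is_collected():
    """Build a two-object cycle and report whether the cyclic collector
    reclaims it once nothing else refers to it."""
    was_enabled = gc.isenabled()
    gc.disable()                 # keep automatic collections out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # a -> b -> a
        probe = weakref.ref(a)
        del a, b                 # reference counts stay above zero
        alive_before = probe() is not None
        gc.collect()             # traverses the cycle, then clears it
        return alive_before and probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

Calling \code{cycle_is_collected()} should return true on CPython, and
\code{has_flag(list, PY_TPFLAGS_HAVE_GC)} shows the same flag being
set on a built-in container type.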
\subsection{Subclassing other types}

It is possible to create new extension types that are derived from existing
types. It is easiest to inherit from the built-in types, since an extension
can easily use the \class{PyTypeObject} it needs. It can be difficult to
share these \class{PyTypeObject} structures between extension modules.

In this example we will create a \class{Shoddy} type that inherits from
the built-in \class{list} type. The new type will be completely compatible
with regular lists, but will have an additional \method{increment()} method
that increases an internal counter.

\begin{verbatim}
>>> import shoddy
>>> s = shoddy.Shoddy(range(3))
>>> s.extend(s)
>>> print len(s)
6
>>> print s.increment()
1
>>> print s.increment()
2
\end{verbatim}

\verbatiminput{shoddy.c}

As you can see, the source code closely resembles the \class{Noddy} examples in previous
sections. We will break down the main differences between them.

\begin{verbatim}
typedef struct {
    PyListObject list;
    int state;
} Shoddy;
\end{verbatim}

The primary difference for derived type objects is that the base type's
object structure must be the first member of the new structure. The base
type will already include the \cfunction{PyObject_HEAD} at the beginning
of its structure.

When a Python object is a \class{Shoddy} instance, its \var{PyObject*} pointer
can be safely cast to both \var{PyListObject*} and \var{Shoddy*}.

\begin{verbatim}
static int
Shoddy_init(Shoddy *self, PyObject *args, PyObject *kwds)
{
    if (PyList_Type.tp_init((PyObject *)self, args, kwds) < 0)
        return -1;
    self->state = 0;
    return 0;
}
\end{verbatim}

In the \member{__init__} method for our type, we can see how to call through
to the \member{__init__} method of the base type.

This pattern is important when writing a type with custom \member{new} and
\member{dealloc} methods. The \member{new} method should not actually create the
memory for the object with \member{tp_alloc}; that will be handled by
the base class when calling its \member{tp_new}.

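For comparison, the behaviour of \class{Shoddy} can be sketched in
pure Python. This is not how the extension module is implemented; it
only mirrors the call-through to the base type's \member{__init__}
and the counter described above, and assumes that
\method{increment()} returns the new counter value, as the
interactive session suggests.

```python
class Shoddy(list):
    """Pure-Python sketch of the Shoddy type: a list plus a counter."""

    def __init__(self, *args):
        # Call through to the base type first, as Shoddy_init() does
        # with PyList_Type.tp_init, then set up the extra state.
        list.__init__(self, *args)
        self.state = 0

    def increment(self):
        """Increase the internal counter and return its new value."""
        self.state += 1
        return self.state
```

An instance created with \code{Shoddy(range(3))} still passes
\code{isinstance(s, list)} and supports every list method.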
When filling out the \ctype{PyTypeObject} for the \class{Shoddy} type,
you see a slot for \member{tp_base}. Due to cross-platform compiler
issues, you can't fill that field directly with the address of
\cfunction{PyList_Type}; it can be done later in the module's
\cfunction{init} function.

\begin{verbatim}
PyMODINIT_FUNC
initshoddy(void)
{
    PyObject *m;

    ShoddyType.tp_base = &PyList_Type;
    if (PyType_Ready(&ShoddyType) < 0)
        return;

    m = Py_InitModule3("shoddy", NULL, "Shoddy module");
    if (m == NULL)
        return;

    Py_INCREF(&ShoddyType);
    PyModule_AddObject(m, "Shoddy", (PyObject *) &ShoddyType);
}
\end{verbatim}

Before calling \cfunction{PyType_Ready}, the type structure must have the
\member{tp_base} slot filled in. When we are deriving a new type, it is
not necessary to fill out the \member{tp_new} slot with
\cfunction{PyType_GenericNew}, nor to set \member{tp_alloc} -- the
creation and allocation functions from the base type will be inherited.

After that, calling \cfunction{PyType_Ready} and adding the type object
to the module is the same as with the basic \class{Noddy} examples.

\section{Type Methods
         \label{dnt-type-methods}}

This section aims to give a quick fly-by on the various type methods
you can implement and what they do.

Here is the definition of \ctype{PyTypeObject}, with some fields only
used in debug builds omitted:

\verbatiminput{typestruct.h}

Now that's a \emph{lot} of methods. Don't worry too much though - if
you have a type you want to define, the chances are very good that you
will only implement a handful of these.

As you probably expect by now, we're going to go over this and give
more information about the various handlers. We won't go in the order
they are defined in the structure, because there is a lot of
historical baggage that impacts the ordering of the fields; be sure
your type initialization keeps the fields in the right order! It's
often easiest to find an example that includes all the fields you need
(even if they're initialized to \code{0}) and then change the values
to suit your new type.

\begin{verbatim}
    char *tp_name; /* For printing */
\end{verbatim}

The name of the type - as mentioned in the last section, this will
appear in various places, almost entirely for diagnostic purposes.
Try to choose something that will be helpful in such a situation!

\begin{verbatim}
    int tp_basicsize, tp_itemsize; /* For allocation */
\end{verbatim}

These fields tell the runtime how much memory to allocate when new
objects of this type are created. Python has some built-in support
for variable length structures (think: strings, lists) which is where
the \member{tp_itemsize} field comes in. This will be dealt with
later.

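CPython exposes both fields on type objects as the read-only
attributes \member{__basicsize__} and \member{__itemsize__}, so their
values can be inspected without writing any C. The helper below is a
small sketch; \function{describe_layout()} is a name made up for this
illustration, and the exact numbers depend on the platform and the
interpreter build.

```python
import sys


def describe_layout(tp):
    """Return (tp_basicsize, tp_itemsize) as recorded in a type object."""
    return tp.__basicsize__, tp.__itemsize__


def grows_with_length(empty, filled):
    """Return True if the larger object occupies more memory."""
    return sys.getsizeof(filled) > sys.getsizeof(empty)
```

A fixed-size type such as \class{object} reports an item size of zero,
while a variable-length type such as \class{tuple} reports the size of
one element slot.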
\begin{verbatim}
    char *tp_doc;
\end{verbatim}

Here you can put a string (or its address) that you want returned when
the Python script references \code{obj.__doc__} to retrieve the
doc string.

Now we come to the basic type methods---the ones most extension types
will implement.

\subsection{Finalization and De-allocation}

\index{object!deallocation}
\index{deallocation, object}
\index{object!finalization}
\index{finalization, of objects}

\begin{verbatim}
    destructor tp_dealloc;
\end{verbatim}

This function is called when the reference count of the instance of
your type is reduced to zero and the Python interpreter wants to
reclaim it. If your type has memory to free or other clean-up to
perform, put it here. The object itself needs to be freed here as
well. Here is an example of this function:

\begin{verbatim}
static void
newdatatype_dealloc(newdatatypeobject * obj)
{
    free(obj->obj_UnderlyingDatatypePtr);
    obj->ob_type->tp_free(obj);
}
\end{verbatim}

One important requirement of the deallocator function is that it
leaves any pending exceptions alone. This is important since
deallocators are frequently called as the interpreter unwinds the
Python stack; when the stack is unwound due to an exception (rather
than normal returns), nothing is done to protect the deallocators from
seeing that an exception has already been set. Any actions which a
deallocator performs which may cause additional Python code to be
executed may detect that an exception has been set. This can lead to
misleading errors from the interpreter. The proper way to protect
against this is to save a pending exception before performing the
unsafe action, and to restore it when done. This can be done using the
\cfunction{PyErr_Fetch()}\ttindex{PyErr_Fetch()} and
\cfunction{PyErr_Restore()}\ttindex{PyErr_Restore()} functions:

\begin{verbatim}
static void
my_dealloc(PyObject *obj)
{
    MyObject *self = (MyObject *) obj;
    PyObject *cbresult;

    if (self->my_callback != NULL) {
        PyObject *err_type, *err_value, *err_traceback;
        int have_error = PyErr_Occurred() ? 1 : 0;

        if (have_error)
            PyErr_Fetch(&err_type, &err_value, &err_traceback);

        cbresult = PyObject_CallObject(self->my_callback, NULL);
        if (cbresult == NULL)
            PyErr_WriteUnraisable(self->my_callback);
        else
            Py_DECREF(cbresult);

        if (have_error)
            PyErr_Restore(err_type, err_value, err_traceback);

        Py_DECREF(self->my_callback);
    }
    obj->ob_type->tp_free((PyObject*)self);
}
\end{verbatim}

\subsection{Object Presentation}

In Python, there are three ways to generate a textual representation
of an object: the \function{repr()}\bifuncindex{repr} function (or
equivalent back-tick syntax), the \function{str()}\bifuncindex{str}
function, and the \keyword{print} statement. For most objects, the
\keyword{print} statement is equivalent to the \function{str()}
function, but it is possible to special-case printing to a
\ctype{FILE*} if necessary; this should only be done if efficiency is
identified as a problem and profiling suggests that creating a
temporary string object to be written to a file is too expensive.

These handlers are all optional, and most types need to implement at
most the \member{tp_str} and \member{tp_repr} handlers.

\begin{verbatim}
    reprfunc tp_repr;
    reprfunc tp_str;
    printfunc tp_print;
\end{verbatim}

The \member{tp_repr} handler should return a string object containing
a representation of the instance for which it is called. Here is a
simple example:

\begin{verbatim}
static PyObject *
newdatatype_repr(newdatatypeobject * obj)
{
    return PyString_FromFormat("Repr-ified_newdatatype{{size:%d}}",
                               obj->obj_UnderlyingDatatypePtr->size);
}
\end{verbatim}

If no \member{tp_repr} handler is specified, the interpreter will
supply a representation that uses the type's \member{tp_name} and a
uniquely-identifying value for the object.

The \member{tp_str} handler is to \function{str()} what the
\member{tp_repr} handler described above is to \function{repr()}; that
is, it is called when Python code calls \function{str()} on an
instance of your object. Its implementation is very similar to the
\member{tp_repr} function, but the resulting string is intended for
human consumption. If \member{tp_str} is not specified, the
\member{tp_repr} handler is used instead.

Here is a simple example:

\begin{verbatim}
static PyObject *
newdatatype_str(newdatatypeobject * obj)
{
    return PyString_FromFormat("Stringified_newdatatype{{size:%d}}",
                               obj->obj_UnderlyingDatatypePtr->size);
}
\end{verbatim}

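The fallback from \member{tp_str} to \member{tp_repr} is visible from
Python code as well, where the two slots correspond to the
\method{__str__()} and \method{__repr__()} methods. The classes below
are hypothetical Python stand-ins for the C type, included only to
illustrate that rule.

```python
class OnlyRepr(object):
    """Defines only __repr__(), so str() falls back to it."""

    def __init__(self, size):
        self.size = size

    def __repr__(self):
        return "Repr-ified_newdatatype{{size:%d}}" % self.size


class ReprAndStr(OnlyRepr):
    """Adds a separate, human-oriented __str__()."""

    def __str__(self):
        return "Stringified_newdatatype{{size:%d}}" % self.size
```

For an \class{OnlyRepr} instance, \function{str()} and
\function{repr()} return the same string; for \class{ReprAndStr} they
differ.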
The print function will be called whenever Python needs to ``print'' an
instance of the type. For example, if \code{node} is an instance of type
\class{TreeNode}, then the print function is called when Python code calls:

\begin{verbatim}
print node
\end{verbatim}

The handler takes a flags argument. The one defined flag,
\constant{Py_PRINT_RAW}, suggests that you print without string quotes
and possibly without interpreting escape sequences.

The print function receives a \ctype{FILE*} as an argument. You will
likely want to write to that file.

Here is a sample print function:

\begin{verbatim}
static int
newdatatype_print(newdatatypeobject *obj, FILE *fp, int flags)
{
    if (flags & Py_PRINT_RAW) {
        fprintf(fp, "<{newdatatype object--size: %d}>",
                obj->obj_UnderlyingDatatypePtr->size);
    }
    else {
        fprintf(fp, "\"<{newdatatype object--size: %d}>\"",
                obj->obj_UnderlyingDatatypePtr->size);
    }
    return 0;
}
\end{verbatim}

\subsection{Attribute Management}

For every object which can support attributes, the corresponding type
must provide the functions that control how the attributes are
resolved. There needs to be a function which can retrieve attributes
(if any are defined), and another to set attributes (if setting
attributes is allowed). Removing an attribute is a special case, for
which the new value passed to the handler is \NULL.

Python supports two pairs of attribute handlers; a type that supports
attributes only needs to implement the functions for one pair. The
difference is that one pair takes the name of the attribute as a
\ctype{char*}, while the other accepts a \ctype{PyObject*}. Each type
can use whichever pair makes more sense for the implementation's
convenience.

\begin{verbatim}
    getattrfunc  tp_getattr;        /* char * version */
    setattrfunc  tp_setattr;
    /* ... */
    getattrofunc tp_getattro;       /* PyObject * version */
    setattrofunc tp_setattro;
\end{verbatim}

If accessing attributes of an object is always a simple operation
(this will be explained shortly), there are generic implementations
which can be used to provide the \ctype{PyObject*} version of the
attribute management functions. The actual need for type-specific
attribute handlers almost completely disappeared starting with Python
2.2, though there are many examples which have not been updated to use
the new generic mechanisms that are available.

\subsubsection{Generic Attribute Management}

\versionadded{2.2}

Most extension types only use \emph{simple} attributes. So, what
makes the attributes simple? There are only a couple of conditions
that must be met:

\begin{enumerate}
  \item The names of the attributes must be known when
        \cfunction{PyType_Ready()} is called.

  \item No special processing is needed to record that an attribute
        was looked up or set, nor do actions need to be taken based
        on the value.
\end{enumerate}

Note that this list does not place any restrictions on the values of
the attributes, when the values are computed, or how relevant data is
stored.

When \cfunction{PyType_Ready()} is called, it uses three tables
referenced by the type object to create \emph{descriptors} which are
placed in the dictionary of the type object. Each descriptor controls
access to one attribute of the instance object. Each of the tables is
optional; if all three are \NULL, instances of the type will only have
attributes that are inherited from their base type, and the type should
leave the \member{tp_getattro} and \member{tp_setattro} fields \NULL{} as
well, allowing the base type to handle attributes.

The tables are declared as three fields of the type object:

\begin{verbatim}
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
\end{verbatim}

If \member{tp_methods} is not \NULL, it must refer to an array of
\ctype{PyMethodDef} structures. Each entry in the table is an
instance of this structure:

\begin{verbatim}
typedef struct PyMethodDef {
    char        *ml_name;       /* method name */
    PyCFunction  ml_meth;       /* implementation function */
    int          ml_flags;      /* flags */
    char        *ml_doc;        /* docstring */
} PyMethodDef;
\end{verbatim}

One entry should be defined for each method provided by the type; no
entries are needed for methods inherited from a base type. One
additional entry is needed at the end; it is a sentinel that marks the
end of the array. The \member{ml_name} field of the sentinel must be
\NULL.

% XXX Need to refer to some unified discussion of the structure fields,
% shared with the next section.

The second table is used to define attributes which map directly to
data stored in the instance. A variety of primitive C types are
supported, and access may be read-only or read-write. The structures
in the table are defined as:

\begin{verbatim}
typedef struct PyMemberDef {
    char *name;
    int   type;
    int   offset;
    int   flags;
    char *doc;
} PyMemberDef;
\end{verbatim}

For each entry in the table, a descriptor will be constructed and
added to the type which will be able to extract a value from the
instance structure. The \member{type} field should contain one of the
type codes defined in the \file{structmember.h} header; the value will
be used to determine how to convert Python values to and from C
values. The \member{flags} field is used to store flags which control
how the attribute can be accessed.

% XXX Need to move some of this to a shared section!

The following flag constants are defined in \file{structmember.h};
they may be combined using bitwise-OR.

\begin{tableii}{l|l}{constant}{Constant}{Meaning}
  \lineii{READONLY \ttindex{READONLY}}
         {Never writable.}
  \lineii{RO \ttindex{RO}}
         {Shorthand for \constant{READONLY}.}
  \lineii{READ_RESTRICTED \ttindex{READ_RESTRICTED}}
         {Not readable in restricted mode.}
  \lineii{WRITE_RESTRICTED \ttindex{WRITE_RESTRICTED}}
         {Not writable in restricted mode.}
  \lineii{RESTRICTED \ttindex{RESTRICTED}}
         {Not readable or writable in restricted mode.}
\end{tableii}

An interesting advantage of using the \member{tp_members} table to
build descriptors that are used at runtime is that any attribute
defined this way can have an associated doc string simply by providing
the text in the table. An application can use the introspection API
to retrieve the descriptor from the class object, and get the
doc string using its \member{__doc__} attribute.

As with the \member{tp_methods} table, a sentinel entry with a
\member{name} value of \NULL{} is required.

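The descriptors built from a \member{tp_members} table can be
examined from Python code. The sketch below uses the built-in
\class{complex} type only because, in CPython, its \member{real}
attribute is defined through such a table with the
\constant{READONLY} flag; that is an implementation detail assumed
here, not something this chapter specifies. The helper names are
invented for the illustration.

```python
def member_doc(tp, name):
    """Fetch the descriptor for `name` from the type's dictionary and
    return (descriptor type name, doc string)."""
    descriptor = tp.__dict__[name]
    return type(descriptor).__name__, descriptor.__doc__


def is_read_only(obj, name, value):
    """Return True if assigning `value` to obj.<name> is rejected."""
    try:
        setattr(obj, name, value)
    except (AttributeError, TypeError):
        # The exception type has differed between Python versions.
        return True
    return False
```

Calling \code{member_doc(complex, "real")} returns the descriptor's
type name together with the doc string supplied in the C table.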
% XXX Descriptors need to be explained in more detail somewhere, but
% not here.
%
% Descriptor objects have two handler functions which correspond to
% the \member{tp_getattro} and \member{tp_setattro} handlers. The
% \method{__get__()} handler is a function which is passed the
% descriptor, instance, and type objects, and returns the value of the
% attribute, or it returns \NULL{} and sets an exception. The
% \method{__set__()} handler is passed the descriptor, instance, type,
% and new value;

\subsubsection{Type-specific Attribute Management}

For simplicity, only the \ctype{char*} version will be demonstrated
here; the type of the name parameter is the only difference between
the \ctype{char*} and \ctype{PyObject*} flavors of the interface.
This example effectively does the same thing as the generic example
above, but does not use the generic support added in Python 2.2. The
value in showing this is two-fold: it demonstrates how basic attribute
management can be done in a way that is portable to older versions of
Python, and explains how the handler functions are called, so that if
you do need to extend their functionality, you'll understand what
needs to be done.

The \member{tp_getattr} handler is called when the object requires an
attribute look-up. It is called in the same situations where the
\method{__getattr__()} method of a class would be called.

A likely way to handle this is (1) to implement a set of functions
(such as \cfunction{newdatatype_getSize()} and
\cfunction{newdatatype_setSize()} in the example below), (2) provide a
method table listing these functions, and (3) provide a getattr
function that returns the result of a lookup in that table. The
method table uses the same structure as the \member{tp_methods} field
of the type object.

Here is an example:

\begin{verbatim}
static PyMethodDef newdatatype_methods[] = {
    {"getSize", (PyCFunction)newdatatype_getSize, METH_VARARGS,
     "Return the current size."},
    {"setSize", (PyCFunction)newdatatype_setSize, METH_VARARGS,
     "Set the size."},
    {NULL, NULL, 0, NULL}           /* sentinel */
};

static PyObject *
newdatatype_getattr(newdatatypeobject *obj, char *name)
{
    return Py_FindMethod(newdatatype_methods, (PyObject *)obj, name);
}
\end{verbatim}

The \member{tp_setattr} handler is called when the
\method{__setattr__()} or \method{__delattr__()} method of a class
instance would be called. When an attribute should be deleted, the
third parameter will be \NULL. Here is an example that simply raises
an exception; if this were really all you wanted, the
\member{tp_setattr} handler should be set to \NULL.

\begin{verbatim}
static int
newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
{
    (void)PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name);
    return -1;
}
\end{verbatim}

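A rough Python-level analogue of the two handlers may make the flow
easier to follow. It is only a sketch: the class, the helper
functions and the \code{_size} storage are hypothetical, and a
dictionary plays the role of the \ctype{PyMethodDef} table that
\cfunction{Py_FindMethod()} searches.

```python
import functools


def _get_size(obj):
    return obj.__dict__["_size"]


def _set_size(obj, value):
    obj.__dict__["_size"] = value


# Name -> function table, standing in for newdatatype_methods.
_METHOD_TABLE = {"getSize": _get_size, "setSize": _set_size}


class NewDataType(object):
    def __init__(self, size):
        self.__dict__["_size"] = size    # bypasses __setattr__() below

    def __getattr__(self, name):
        # Reached only when the normal lookup finds nothing.
        try:
            func = _METHOD_TABLE[name]
        except KeyError:
            raise AttributeError(name)
        return functools.partial(func, self)   # bind the instance

    def __setattr__(self, name, value):
        raise RuntimeError("Read-only attribute: %s" % name)
```

Looking up \code{getSize} or \code{setSize} goes through the table,
any other unknown name raises \exception{AttributeError}, and every
assignment is rejected.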
\subsection{Object Comparison}

\begin{verbatim}
    cmpfunc tp_compare;
\end{verbatim}

The \member{tp_compare} handler is called when comparisons are needed
and the object does not implement the specific rich comparison method
which matches the requested comparison. (It is always used if defined
and the \cfunction{PyObject_Compare()} or \cfunction{PyObject_Cmp()}
functions are used, or if \function{cmp()} is used from Python.)
It is analogous to the \method{__cmp__()} method. This function
should return \code{-1} if \var{obj1} is less than
\var{obj2}, \code{0} if they are equal, and \code{1} if
\var{obj1} is greater than
\var{obj2}.
(It was previously allowed to return arbitrary negative or positive
integers for less than and greater than, respectively; as of Python
2.2, this is no longer allowed. In the future, other return values
may be assigned a different meaning.)

A \member{tp_compare} handler may raise an exception. In this case it
should return a negative value. The caller has to test for the
exception using \cfunction{PyErr_Occurred()}.

Here is a sample implementation:

\begin{verbatim}
static int
newdatatype_compare(newdatatypeobject * obj1, newdatatypeobject * obj2)
{
    int result;

    if (obj1->obj_UnderlyingDatatypePtr->size <
        obj2->obj_UnderlyingDatatypePtr->size) {
        result = -1;
    }
    else if (obj1->obj_UnderlyingDatatypePtr->size >
             obj2->obj_UnderlyingDatatypePtr->size) {
        result = 1;
    }
    else {
        result = 0;
    }
    return result;
}
\end{verbatim}

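The same three-way contract can be written as a plain Python
function. The sketch below is illustrative only; it compares bare
sizes rather than extension objects, and it uses
\function{functools.cmp_to_key()} (available from Python 2.7) so that
the comparison function can drive a sort on interpreters that no
longer accept a \code{cmp} argument.

```python
import functools


def compare_size(size1, size2):
    """Three-way comparison returning exactly -1, 0 or 1."""
    if size1 < size2:
        return -1
    elif size1 > size2:
        return 1
    return 0


def sort_by_size(sizes):
    """Sort using the three-way comparison function above."""
    return sorted(sizes, key=functools.cmp_to_key(compare_size))
```

Note that the function never returns anything other than the three
values the handler is allowed to produce.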
\subsection{Abstract Protocol Support}

Python supports a variety of \emph{abstract} `protocols;' the specific
interfaces provided to use these protocols are documented in the
\citetitle[../api/api.html]{Python/C API Reference Manual} in the
chapter ``\ulink{Abstract Objects Layer}{../api/abstract.html}.''

A number of these abstract interfaces were defined early in the
development of the Python implementation. In particular, the number,
mapping, and sequence protocols have been part of Python since the
beginning. Other protocols have been added over time. For protocols
which depend on several handler routines from the type implementation,
the older protocols have been defined as optional blocks of handlers
referenced by the type object. For newer protocols there are
additional slots in the main type object, with a flag bit being set to
indicate that the slots are present and should be checked by the
interpreter. (The flag bit does not indicate that the slot values are
non-\NULL. The flag may be set to indicate the presence of a slot,
but a slot may still be unfilled.)

\begin{verbatim}
    PyNumberMethods   *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods  *tp_as_mapping;
\end{verbatim}

If you wish your object to be able to act like a number, a sequence,
or a mapping object, then you place the address of a structure of type
\ctype{PyNumberMethods}, \ctype{PySequenceMethods}, or
\ctype{PyMappingMethods}, respectively, in the corresponding slot.
It is up to you to fill in this structure with appropriate values. You
can find examples of the use of each of these in the \file{Objects}
directory of the Python source distribution.

\begin{verbatim}
    hashfunc tp_hash;
\end{verbatim}

This function, if you choose to provide it, should return a hash
number for an instance of your data type. Here is a moderately
pointless example:

\begin{verbatim}
static long
newdatatype_hash(newdatatypeobject *obj)
{
    long result;
    result = obj->obj_UnderlyingDatatypePtr->size;
    result = result * 3;
    return result;
}
\end{verbatim}

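In Python code the slot corresponds to the \method{__hash__()}
method. The hypothetical class below mirrors the C example and adds
an equality test, since a hash is only useful when objects that
compare equal also hash equal; that pairing is a general rule for
hashable objects rather than something the C snippet shows.

```python
class Sized(object):
    """Hypothetical Python stand-in for the newdatatype object."""

    def __init__(self, size):
        self.size = size

    def __eq__(self, other):
        return isinstance(other, Sized) and self.size == other.size

    def __ne__(self, other):        # must be spelled out on Python 2
        return not self == other

    def __hash__(self):
        # Same "moderately pointless" recipe as the C version.
        return hash(self.size * 3)
```

Two \class{Sized} objects with the same size can then be used
interchangeably as dictionary keys.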
\begin{verbatim}
    ternaryfunc tp_call;
\end{verbatim}

This function is called when an instance of your data type is
``called''; for example, if \code{obj1} is an instance of your data
type and the Python script contains \code{obj1('hello')}, the
\member{tp_call} handler is invoked.

This function takes three arguments:

\begin{enumerate}
  \item
    \var{arg1} is the instance of the data type which is the subject of
    the call. If the call is \code{obj1('hello')}, then \var{arg1} is
    \code{obj1}.

  \item
    \var{arg2} is a tuple containing the arguments to the call. You
    can use \cfunction{PyArg_ParseTuple()} to extract the arguments.

  \item
    \var{arg3} is a dictionary of keyword arguments that were passed.
    If this is non-\NULL{} and you support keyword arguments, use
    \cfunction{PyArg_ParseTupleAndKeywords()} to extract the
    arguments. If you do not want to support keyword arguments and
    this is non-\NULL, raise a \exception{TypeError} with a message
    saying that keyword arguments are not supported.
\end{enumerate}

Here is a desultory example of the implementation of the call function.

\begin{verbatim}
/* Implement the call function.
 *    obj is the instance receiving the call.
 *    args is a tuple containing the arguments to the call, in this
 *         case 3 strings.
 */
static PyObject *
newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *other)
{
    PyObject *result;
    char *arg1;
    char *arg2;
    char *arg3;

    if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
        return NULL;
    }
    result = PyString_FromFormat(
        "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n",
        obj->obj_UnderlyingDatatypePtr->size,
        arg1, arg2, arg3);
    printf("%s", PyString_AS_STRING(result));
    return result;
}
\end{verbatim}

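Seen from Python, \member{tp_call} corresponds to the
\method{__call__()} method. The sketch below is a hypothetical Python
counterpart of the C example. It also rejects keyword arguments with
a \exception{TypeError}, as recommended above for handlers that do not
support them; the wording of the error messages is an assumption.

```python
class CallableSized(object):
    """Hypothetical Python counterpart of newdatatype_call()."""

    def __init__(self, size):
        self.size = size

    def __call__(self, *args, **kwds):
        if kwds:
            raise TypeError("call() does not take keyword arguments")
        if len(args) != 3:
            raise TypeError("call() takes exactly 3 arguments (%d given)"
                            % len(args))
        arg1, arg2, arg3 = args
        return ("Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n"
                % (self.size, arg1, arg2, arg3))
```

Unlike the C version, this sketch returns the string without also
printing it.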
% XXX some fields need to be added here...

\begin{verbatim}
    /* Added in release 2.2 */
    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;
\end{verbatim}

These functions provide support for the iterator protocol. Any object
which wishes to support iteration over its contents (which may be
generated during iteration) must implement the \code{tp_iter}
handler. Objects which are returned by a \code{tp_iter} handler must
implement both the \code{tp_iter} and \code{tp_iternext} handlers.
Both handlers take exactly one parameter, the instance for which they
are being called, and return a new reference. In the case of an
error, they should set an exception and return \NULL.

For an object which represents an iterable collection, the
\code{tp_iter} handler must return an iterator object. The iterator
object is responsible for maintaining the state of the iteration. For
collections which can support multiple iterators which do not
interfere with each other (as lists and tuples do), a new iterator
should be created and returned. Objects which can only be iterated
over once (usually due to side effects of iteration) should implement
this handler by returning a new reference to themselves, and should
also implement the \code{tp_iternext} handler. File objects are an
example of such an iterator.

Iterator objects should implement both handlers. The \code{tp_iter}
handler should return a new reference to the iterator (this is the
same as the \code{tp_iter} handler for objects which can only be
iterated over destructively). The \code{tp_iternext} handler should
return a new reference to the next object in the iteration if there is
one. If the iteration has reached the end, it may return \NULL{}
without setting an exception or it may set \exception{StopIteration};
avoiding the exception can yield slightly better performance. If an
actual error occurs, it should set an exception and return \NULL.

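The division of labour between a collection and its iterators is the
same at the Python level, where \code{tp_iter} and
\code{tp_iternext} surface as \method{__iter__()} and
\method{next()} (spelled \method{__next__()} in Python 3). The two
classes below are a hypothetical sketch of that arrangement, not code
taken from an extension module.

```python
class CountdownIterator(object):
    """Iterator: owns the state of one pass over the values."""

    def __init__(self, start):
        self._next_value = start

    def __iter__(self):
        return self                 # an iterator returns itself

    def __next__(self):
        if self._next_value <= 0:
            raise StopIteration
        value = self._next_value
        self._next_value -= 1
        return value

    next = __next__                 # Python 2 spelling


class Countdown(object):
    """Collection: hands out a fresh, independent iterator each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)
```

Because \class{Countdown} creates a new iterator on every request, it
can be traversed repeatedly, whereas a \class{CountdownIterator} is
exhausted after a single pass.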
\subsection{Weak Reference Support\label{weakref-support}}

One of the goals of Python's weak-reference implementation is to allow
any type to participate in the weak reference mechanism without
incurring the overhead on those objects which do not benefit by weak
referencing (such as numbers).

For an object to be weakly referencable, the extension must include a
\ctype{PyObject*} field in the instance structure for the use of the
weak reference mechanism; it must be initialized to \NULL{} by the
object's constructor. It must also set the \member{tp_weaklistoffset}
field of the corresponding type object to the offset of the field.
For example, the instance type is defined with the following
structure:

\begin{verbatim}
typedef struct {
    PyObject_HEAD
    PyClassObject *in_class;       /* The class object */
    PyObject      *in_dict;        /* A dictionary */
    PyObject      *in_weakreflist; /* List of weak references */
} PyInstanceObject;
\end{verbatim}

The statically-declared type object for instances is defined this way:

\begin{verbatim}
PyTypeObject PyInstance_Type = {
    PyObject_HEAD_INIT(&PyType_Type)
    0,
    "module.instance",

    /* Lots of stuff omitted for brevity... */

    Py_TPFLAGS_DEFAULT,                         /* tp_flags */
    0,                                          /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    offsetof(PyInstanceObject, in_weakreflist), /* tp_weaklistoffset */
};
\end{verbatim}

The type constructor is responsible for initializing the weak reference
list to \NULL:

\begin{verbatim}
static PyObject *
instance_new() {
    /* Other initialization stuff omitted for brevity */

    self->in_weakreflist = NULL;

    return (PyObject *) self;
}
\end{verbatim}

The only further addition is that the destructor needs to call the
weak reference manager to clear any weak references. This should be
done before any other parts of the destruction have occurred, but is
only required if the weak reference list is non-\NULL:

\begin{verbatim}
static void
instance_dealloc(PyInstanceObject *inst)
{
    /* Allocate temporaries if needed, but do not begin
       destruction just yet.
     */

    if (inst->in_weakreflist != NULL)
        PyObject_ClearWeakRefs((PyObject *) inst);

    /* Proceed with object destruction normally. */
}
\end{verbatim}

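What this support buys can be seen from Python with the standard
\module{weakref} module. The sketch below uses a hypothetical
pure-Python class in place of an extension type, and it relies on
CPython's reference counting to destroy the object as soon as the
last strong reference is deleted.

```python
import weakref


class Instance(object):
    """Hypothetical stand-in for a weakly referencable type."""


def can_be_weakly_referenced(obj):
    """Return True if a weak reference to `obj` can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


def demo():
    """Return (alive before, cleared after, callback log)."""
    log = []
    inst = Instance()
    ref = weakref.ref(inst, lambda r: log.append("cleared"))
    alive = ref() is inst
    del inst                # last strong reference goes away (CPython)
    return alive, ref() is None, log
```

Types that opt out, such as the built-in integers, reject the attempt
to create a weak reference with a \exception{TypeError}.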
\subsection{More Suggestions}

Remember that you can omit most of these functions, in which case you
provide \code{0} as a value. There are type definitions for each of
the functions you must provide. They are in \file{object.h} in the
Python include directory that comes with the source distribution of
Python.

In order to learn how to implement any specific method for your new
data type, do the following: Download and unpack the Python source
distribution. Go to the \file{Objects} directory, then search the
C source files for \code{tp_} plus the function you want (for
example, \code{tp_print} or \code{tp_compare}). You will find
examples of the function you want to implement.

When you need to verify that an object is an instance of the type
you are implementing, use the \cfunction{PyObject_TypeCheck} function.
A sample of its use might be something like the following:

\begin{verbatim}
    if (! PyObject_TypeCheck(some_object, &MyType)) {
        PyErr_SetString(PyExc_TypeError, "arg #1 not a mything");
        return NULL;
    }
\end{verbatim}
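
\cfunction{PyObject_TypeCheck} accepts instances of subtypes as well
as exact instances, which makes \function{isinstance()} its closest
Python-level counterpart. The classes and the helper below are
hypothetical names used only to illustrate that behaviour.

```python
class MyThing(object):
    """Hypothetical type being checked for."""


class MySubThing(MyThing):
    """A subtype; it passes the same check."""


def require_mything(some_object):
    """Return the argument unchanged, or raise TypeError."""
    if not isinstance(some_object, MyThing):
        raise TypeError("arg #1 not a mything")
    return some_object
```

Both \code{MyThing()} and \code{MySubThing()} pass the check, while
an unrelated object such as a string raises \exception{TypeError}.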