diff --git a/Doc/howto/descriptor.rst b/Doc/howto/descriptor.rst new file mode 100644 index 00000000000..3ec784023f6 --- /dev/null +++ b/Doc/howto/descriptor.rst @@ -0,0 +1,431 @@ +====================== +Descriptor HowTo Guide +====================== + +:Author: Raymond Hettinger +:Contact: + +.. Contents:: + +Abstract +-------- + +Defines descriptors, summarizes the protocol, and shows how descriptors are +called. Examines a custom descriptor and several built-in Python descriptors +including functions, properties, static methods, and class methods. Shows how +each works by giving a pure Python equivalent and a sample application. + +Learning about descriptors not only provides access to a larger toolset, it +creates a deeper understanding of how Python works and an appreciation for the +elegance of its design. + + +Definition and Introduction +--------------------------- + +In general, a descriptor is an object attribute with "binding behavior", one +whose attribute access has been overridden by methods in the descriptor +protocol. Those methods are :meth:`__get__`, :meth:`__set__`, and +:meth:`__delete__`. If any of those methods are defined for an object, it is +said to be a descriptor. + +The default behavior for attribute access is to get, set, or delete the +attribute from an object's dictionary. For instance, ``a.x`` has a lookup chain +starting with ``a.__dict__['x']``, then ``type(a).__dict__['x']``, and +continuing through the base classes of ``type(a)`` excluding metaclasses. If the +looked-up value is an object defining one of the descriptor methods, then Python +may override the default behavior and invoke the descriptor method instead. +Where this occurs in the precedence chain depends on which descriptor methods +were defined. Note that descriptors are only invoked for new style objects or +classes (a class is new style if it inherits from :class:`object` or +:class:`type`). + +Descriptors are a powerful, general purpose protocol. 
They are the mechanism +behind properties, methods, static methods, class methods, and :func:`super()`. +They are used throughout Python itself to implement the new style classes +introduced in version 2.2. Descriptors simplify the underlying C-code and offer +a flexible set of new tools for everyday Python programs. + + +Descriptor Protocol +------------------- + +``descr.__get__(self, obj, type=None) --> value`` + +``descr.__set__(self, obj, value) --> None`` + +``descr.__delete__(self, obj) --> None`` + +That is all there is to it. Define any of these methods and an object is +considered a descriptor and can override default behavior upon being looked up +as an attribute. + +If an object defines both :meth:`__get__` and :meth:`__set__`, it is considered +a data descriptor. Descriptors that only define :meth:`__get__` are called +non-data descriptors (they are typically used for methods but other uses are +possible). + +Data and non-data descriptors differ in how overrides are calculated with +respect to entries in an instance's dictionary. If an instance's dictionary +has an entry with the same name as a data descriptor, the data descriptor +takes precedence. If an instance's dictionary has an entry with the same +name as a non-data descriptor, the dictionary entry takes precedence. + +To make a read-only data descriptor, define both :meth:`__get__` and +:meth:`__set__` with the :meth:`__set__` raising an :exc:`AttributeError` when +called. Defining the :meth:`__set__` method with an exception raising +placeholder is enough to make it a data descriptor. + + +Invoking Descriptors +-------------------- + +A descriptor can be called directly by its method name. For example, +``d.__get__(obj)``. + +Alternatively, it is more common for a descriptor to be invoked automatically +upon attribute access. For example, ``obj.d`` looks up ``d`` in the dictionary +of ``obj``. 
If ``d`` defines the method :meth:`__get__`, then ``d.__get__(obj)`` +is invoked according to the precedence rules listed below. + +The details of invocation depend on whether ``obj`` is an object or a class. +Either way, descriptors only work for new style objects and classes. A class is +new style if it is a subclass of :class:`object`. + +For objects, the machinery is in :meth:`object.__getattribute__` which +transforms ``b.x`` into ``type(b).__dict__['x'].__get__(b, type(b))``. The +implementation works through a precedence chain that gives data descriptors +priority over instance variables, instance variables priority over non-data +descriptors, and assigns lowest priority to :meth:`__getattr__` if provided. The +full C implementation can be found in :cfunc:`PyObject_GenericGetAttr()` in +`Objects/object.c `_\. + +For classes, the machinery is in :meth:`type.__getattribute__` which transforms +``B.x`` into ``B.__dict__['x'].__get__(None, B)``. In pure Python, it looks +like:: + + def __getattribute__(self, key): + "Emulate type_getattro() in Objects/typeobject.c" + v = object.__getattribute__(self, key) + if hasattr(v, '__get__'): + return v.__get__(None, self) + return v + +The important points to remember are: + +* descriptors are invoked by the :meth:`__getattribute__` method +* overriding :meth:`__getattribute__` prevents automatic descriptor calls +* :meth:`__getattribute__` is only available with new style classes and objects +* :meth:`object.__getattribute__` and :meth:`type.__getattribute__` make + different calls to :meth:`__get__`. +* data descriptors always override instance dictionaries. +* non-data descriptors may be overridden by instance dictionaries. + +The object returned by ``super()`` also has a custom :meth:`__getattribute__` +method for invoking descriptors. The call ``super(B, obj).m()`` searches +``obj.__class__.__mro__`` for the base class ``A`` immediately following ``B`` +and then returns ``A.__dict__['m'].__get__(obj, A)``. 
If not a descriptor, +``m`` is returned unchanged. If not in the dictionary, ``m`` reverts to a +search using :meth:`object.__getattribute__`. + +Note, in Python 2.2, ``super(B, obj).m()`` would only invoke :meth:`__get__` if +``m`` was a data descriptor. In Python 2.3, non-data descriptors also get +invoked unless an old-style class is involved. The implementation details are +in :cfunc:`super_getattro()` in +`Objects/typeobject.c `_ +and a pure Python equivalent can be found in `Guido's Tutorial`_. + +.. _`Guido's Tutorial`: http://www.python.org/2.2.3/descrintro.html#cooperation + +The details above show that the mechanism for descriptors is embedded in the +:meth:`__getattribute__()` methods for :class:`object`, :class:`type`, and +:func:`super`. Classes inherit this machinery when they derive from +:class:`object` or if they have a meta-class providing similar functionality. +Likewise, classes can turn off descriptor invocation by overriding +:meth:`__getattribute__()`. + + +Descriptor Example +------------------ + +The following code creates a class whose objects are data descriptors which +print a message for each get or set. Overriding :meth:`__getattribute__` is an +alternate approach that could do this for every attribute. However, this +descriptor is useful for monitoring just a few chosen attributes:: + + class RevealAccess(object): + """A data descriptor that sets and returns values + normally and prints a message logging their access. 
+ """ + + def __init__(self, initval=None, name='var'): + self.val = initval + self.name = name + + def __get__(self, obj, objtype): + print 'Retrieving', self.name + return self.val + + def __set__(self, obj, val): + print 'Updating' , self.name + self.val = val + + >>> class MyClass(object): + x = RevealAccess(10, 'var "x"') + y = 5 + + >>> m = MyClass() + >>> m.x + Retrieving var "x" + 10 + >>> m.x = 20 + Updating var "x" + >>> m.x + Retrieving var "x" + 20 + >>> m.y + 5 + +The protocol is simple and offers exciting possibilities. Several use cases are +so common that they have been packaged into individual function calls. +Properties, bound and unbound methods, static methods, and class methods are all +based on the descriptor protocol. + + +Properties +---------- + +Calling :func:`property` is a succinct way of building a data descriptor that +triggers function calls upon access to an attribute. Its signature is:: + + property(fget=None, fset=None, fdel=None, doc=None) -> property attribute + +The documentation shows a typical use to define a managed attribute ``x``:: + + class C(object): + def getx(self): return self.__x + def setx(self, value): self.__x = value + def delx(self): del self.__x + x = property(getx, setx, delx, "I'm the 'x' property.") + +To see how :func:`property` is implemented in terms of the descriptor protocol, +here is a pure Python equivalent:: + + class Property(object): + "Emulate PyProperty_Type() in Objects/descrobject.c" + + def __init__(self, fget=None, fset=None, fdel=None, doc=None): + self.fget = fget + self.fset = fset + self.fdel = fdel + self.__doc__ = doc + + def __get__(self, obj, objtype=None): + if obj is None: + return self + if self.fget is None: + raise AttributeError, "unreadable attribute" + return self.fget(obj) + + def __set__(self, obj, value): + if self.fset is None: + raise AttributeError, "can't set attribute" + self.fset(obj, value) + + def __delete__(self, obj): + if self.fdel is None: + raise AttributeError, 
"can't delete attribute" + self.fdel(obj) + +The :func:`property` builtin helps whenever a user interface has granted +attribute access and then subsequent changes require the intervention of a +method. + +For instance, a spreadsheet class may grant access to a cell value through +``Cell('b10').value``. Subsequent improvements to the program require the cell +to be recalculated on every access; however, the programmer does not want to +affect existing client code accessing the attribute directly. The solution is +to wrap access to the value attribute in a property data descriptor:: + + class Cell(object): + . . . + def getvalue(self, obj): + "Recalculate cell before returning value" + self.recalc() + return obj._value + value = property(getvalue) + + +Functions and Methods +--------------------- + +Python's object oriented features are built upon a function based environment. +Using non-data descriptors, the two are merged seamlessly. + +Class dictionaries store methods as functions. In a class definition, methods +are written using :keyword:`def` and :keyword:`lambda`, the usual tools for +creating functions. The only difference from regular functions is that the +first argument is reserved for the object instance. By Python convention, the +instance reference is called *self* but may be called *this* or any other +variable name. + +To support method calls, functions include the :meth:`__get__` method for +binding methods during attribute access. This means that all functions are +non-data descriptors which return bound or unbound methods depending whether +they are invoked from an object or a class. In pure python, it works like +this:: + + class Function(object): + . . . 
+ def __get__(self, obj, objtype=None): + "Simulate func_descr_get() in Objects/funcobject.c" + return types.MethodType(self, obj, objtype) + +Running the interpreter shows how the function descriptor works in practice:: + + >>> class D(object): + def f(self, x): + return x + + >>> d = D() + >>> D.__dict__['f'] # Stored internally as a function + <function f at 0x...> + >>> D.f # Get from a class becomes an unbound method + <unbound method D.f> + >>> d.f # Get from an instance becomes a bound method + <bound method D.f of <__main__.D object at 0x...>> + +The output suggests that bound and unbound methods are two different types. +While they could have been implemented that way, the actual C implementation of +:ctype:`PyMethod_Type` in +`Objects/classobject.c `_ +is a single object with two different representations depending on whether the +:attr:`im_self` field is set or is *NULL* (the C equivalent of *None*). + +Likewise, the effects of calling a method object depend on the :attr:`im_self` +field. If set (meaning bound), the original function (stored in the +:attr:`im_func` field) is called as expected with the first argument set to the +instance. If unbound, all of the arguments are passed unchanged to the original +function. The actual C implementation of :func:`instancemethod_call()` is only +slightly more complex in that it includes some type checking. + + +Static Methods and Class Methods +-------------------------------- + +Non-data descriptors provide a simple mechanism for variations on the usual +patterns of binding functions into methods. + +To recap, functions have a :meth:`__get__` method so that they can be converted +to a method when accessed as attributes. The non-data descriptor transforms an +``obj.f(*args)`` call into ``f(obj, *args)``. Calling ``klass.f(*args)`` +becomes ``f(*args)``. 
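The recap above can be checked by hand. The following is a minimal sketch, assuming a hypothetical class ``D`` with a single method ``f`` (neither name comes from a real library). It compares ordinary attribute access with an explicit call to the function's ``__get__`` method, and it avoids anything specific to unbound methods so that it should behave the same on older and newer interpreters.

```python
class D(object):
    def f(self, x):
        return (self, x)

d = D()

# The class dictionary holds the plain function (hypothetical name "raw").
raw = D.__dict__['f']

# Binding by hand: this is what attribute access on an instance does.
bound = raw.__get__(d, D)

via_attribute = d.f(3)      # obj.f(*args)
via_descriptor = bound(3)   # explicit __get__, then call
via_function = raw(d, 3)    # f(obj, *args)
via_class = D.f(d, 3)       # klass.f(*args) passes the arguments through
```

All four calls should return the same ``(d, 3)`` tuple, which illustrates that binding only decides whether the instance is supplied automatically as the first argument.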
+ +This chart summarizes the binding and its two most useful variants: + + +-----------------+--------------+----------------------+------------------+ + | Transformation | | Called from an | Called from a | + | | | Object | Class | + +=================+==============+======================+==================+ + | | function | f(obj, \*args) | f(\*args) | + + +--------------+----------------------+------------------+ + | Descriptor | staticmethod | f(\*args) | f(\*args) | + + +--------------+----------------------+------------------+ + | | classmethod | f(type(obj), \*args) | f(klass, \*args) | + +-----------------+--------------+----------------------+------------------+ + +Static methods return the underlying function without changes. Calling either +``c.f`` or ``C.f`` is the equivalent of a direct lookup into +``object.__getattribute__(c, "f")`` or ``object.__getattribute__(C, "f")``. As a +result, the function becomes identically accessible from either an object or a +class. + +Good candidates for static methods are methods that do not reference the +``self`` variable. + +For instance, a statistics package may include a container class for +experimental data. The class provides normal methods for computing the average, +mean, median, and other descriptive statistics that depend on the data. However, +there may be useful functions which are conceptually related but do not depend +on the data. For instance, ``erf(x)`` is a handy conversion routine that comes up +in statistical work but does not directly depend on a particular dataset. +It can be called either from an object or the class: ``s.erf(1.5) --> .9661`` or +``Sample.erf(1.5) --> .9661``. 
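The statistics scenario can be sketched as follows. This is only an illustration under stated assumptions: the ``Sample`` class, its ``data`` attribute, and its ``mean`` method are hypothetical, and ``erf`` simply delegates to ``math.erf``, which is assumed to be available (it is present in the standard library from Python 2.7 and 3.2 onward).

```python
import math


class Sample(object):
    """Hypothetical container for experimental data."""

    def __init__(self, data):
        self.data = list(data)

    def mean(self):
        # Depends on the stored data, so it stays a normal method.
        return sum(self.data) / float(len(self.data))

    def erf(x):
        # Never touches an instance, so it is a static method candidate.
        return math.erf(x)
    erf = staticmethod(erf)


s = Sample([1.0, 2.0, 3.0])
from_object = s.erf(1.5)       # called from an object
from_class = Sample.erf(1.5)   # called from the class
```

Both calls reach the same underlying function with the same argument list, so ``from_object`` and ``from_class`` are equal, while ``mean`` still needs an instance to work on.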
+ +Since staticmethods return the underlying function with no changes, the example +calls are unexciting:: + + >>> class E(object): + def f(x): + print x + f = staticmethod(f) + + >>> E.f(3) + 3 + >>> E().f(3) + 3 + +Using the non-data descriptor protocol, a pure Python version of +:func:`staticmethod` would look like this:: + + class StaticMethod(object): + "Emulate PyStaticMethod_Type() in Objects/funcobject.c" + + def __init__(self, f): + self.f = f + + def __get__(self, obj, objtype=None): + return self.f + +Unlike static methods, class methods prepend the class reference to the +argument list before calling the function. This format is the same +whether the caller is an object or a class:: + + >>> class E(object): + def f(klass, x): + return klass.__name__, x + f = classmethod(f) + + >>> print E.f(3) + ('E', 3) + >>> print E().f(3) + ('E', 3) + + +This behavior is useful whenever the function only needs to have a class +reference and does not care about any underlying data. One use for classmethods +is to create alternate class constructors. In Python 2.3, the classmethod +:func:`dict.fromkeys` creates a new dictionary from a list of keys. The pure +Python equivalent is:: + + class Dict: + . . . 
+ def fromkeys(klass, iterable, value=None): + "Emulate dict_fromkeys() in Objects/dictobject.c" + d = klass() + for key in iterable: + d[key] = value + return d + fromkeys = classmethod(fromkeys) + +Now a new dictionary of unique keys can be constructed like this:: + + >>> Dict.fromkeys('abracadabra') + {'a': None, 'r': None, 'b': None, 'c': None, 'd': None} + +Using the non-data descriptor protocol, a pure Python version of +:func:`classmethod` would look like this:: + + class ClassMethod(object): + "Emulate PyClassMethod_Type() in Objects/funcobject.c" + + def __init__(self, f): + self.f = f + + def __get__(self, obj, klass=None): + if klass is None: + klass = type(obj) + def newfunc(*args): + return self.f(klass, *args) + return newfunc + diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst index 022beeede9f..417ae0047ef 100644 --- a/Doc/howto/index.rst +++ b/Doc/howto/index.rst @@ -16,6 +16,7 @@ Currently, the HOWTOs are: advocacy.rst cporting.rst curses.rst + descriptor.rst doanddont.rst functional.rst regex.rst