Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.
Segfaults in Zope3 resulted.

weakrefs in Python are designed to, at worst, let *other* objects learn
that a given object has died, via a callback function. The weakly
referenced object itself is not passed to the callback, and the presumption
is that the weakly referenced object is unreachable trash at the time the
callback is invoked.
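
As a minimal sketch of that contract (the class and function names here are
hypothetical, and the immediate callback on `del` assumes CPython's
reference counting), the callback receives the weakref object rather than
the referent, and the referent can no longer be reached through it:

```python
import gc
import weakref


class Node:
    """Hypothetical class; plain instances support weak references."""


seen = []


def on_death(wr):
    # The argument is the weakref object, not the referent.  By the time
    # the callback runs, the referent can no longer be reached through it.
    seen.append((wr is ref, wr() is None))


obj = Node()
ref = weakref.ref(obj, on_death)
del obj        # CPython: the refcount hits zero and the callback runs here
gc.collect()   # only matters on implementations without refcounting
```

The callback records one pair of flags: whether it was handed `ref` itself,
and whether dereferencing it already yields None.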

That's usually true, but not always. Suppose a weakly referenced object
becomes part of a clump of cyclic trash. When enough cycles are broken by
cyclic gc that the object is reclaimed, the callback is invoked. If it's
possible for the callback to get at objects in the cycle(s), then it may be
possible for those objects to access (via strong references in the cycle)
the weakly referenced object being torn down, or other objects in the cycle
that have already suffered a tp_clear() call. There's no guarantee that an
object is in a sane state after tp_clear(). Bad things (including
segfaults) can happen right then, during the callback's execution, or can
happen at any later time if the callback manages to resurrect an insane
object.

Note that if it's possible for the callback to get at objects in the trash
cycles, it must also be the case that the callback itself is part of the
trash cycles. Else the callback would have acted as an external root to
the current collection, and nothing reachable from it would be in cyclic
trash either.

More, if the callback itself is in cyclic trash, then the weakref to which
the callback is attached must also be trash, and for the same kind of
reason: if the weakref acted as an external root, then the callback could
not have been cyclic trash.

So a problem here requires that a weakref, that weakref's callback, and the
weakly referenced object, all be in cyclic trash at the same time. This
isn't easy to stumble into by accident while Python is running, and, indeed,
it took quite a while to dream up failing test cases. Zope3 saw segfaults
during shutdown, during the second call of gc in Py_Finalize, after most
modules had been torn down. That creates many trash cycles (especially
those involving new-style classes), making the problem much more likely.
Once you know what's required to provoke the problem, though, it's easy to
create tests that segfault before shutdown.

In 2.3.3, before breaking cycles, we first clear all the weakrefs with
callbacks in cyclic trash. Since the weakrefs *are* trash, and there's no
defined (or even predictable) order in which tp_clear() gets called on
cyclic trash, it's defensible to first clear weakrefs with callbacks. It's
a feature of Python's weakrefs too that when a weakref goes away, the
callback (if any) associated with it is thrown away too, unexecuted.
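
The visible effect of this rule can be sketched from Python code. The
example below is only an illustration under current CPython behavior, with
hypothetical names: a callback whose weakref is itself part of the trash is
silently dropped, while a callback on a weakref held from outside the trash
still runs.

```python
import gc
import weakref


class C:
    """Hypothetical class used only to build cycles."""


fired = []


def make_cycle_with_internal_weakref():
    c = C()
    c.me = c  # self-reference: only the cyclic gc can reclaim c
    # The weakref is stored on c, so it is reachable only from the cycle
    # and becomes trash together with its referent.
    c.wr = weakref.ref(c, lambda wr: fired.append("internal"))


def make_cycle_with_external_weakref():
    c = C()
    c.me = c
    # The caller keeps this weakref alive, so it is not trash.
    return weakref.ref(c, lambda wr: fired.append("external"))


make_cycle_with_internal_weakref()
keep = make_cycle_with_external_weakref()
gc.collect()
```

After the collection both cycles are gone, but only the externally held
weakref has had its callback invoked.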

Just that much is almost enough to prevent problems, by throwing away
*almost* all the weakref callbacks that could get triggered by gc. The
problem remaining is that clearing a weakref with a callback decrefs the
callback object, and the callback object may *itself* be weakly referenced,
via another weakref with another callback. So the process of clearing
weakrefs can trigger callbacks attached to other weakrefs, and those
latter weakrefs may or may not be part of cyclic trash.
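
The chain reaction is easy to reproduce outside gc. In this sketch (all
names are hypothetical, and the timing assumes CPython's reference
counting), discarding a weakref never runs its own callback, yet the decref
of that callback object triggers a callback attached to a different weakref:

```python
import weakref


class Target:
    """Hypothetical referent."""


log = []


def first_callback(wr):
    log.append("first_callback ran")


# Functions can be weakly referenced, so a callback can be a referent too.
watcher = weakref.ref(first_callback,
                      lambda wr: log.append("callback object died"))

target = Target()
wr = weakref.ref(target, first_callback)  # wr holds a strong ref to the callback

del first_callback  # wr now holds the only strong reference to the function
del wr              # discarding wr decrefs the callback, which then dies
```

Note that `target` is still alive and `first_callback` never ran; the only
Python code executed was the callback on the callback object.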

So, to prevent any Python code from running while gc is invoking tp_clear()
on all the objects in cyclic trash, it's not quite enough just to invoke
tp_clear() on weakrefs with callbacks first. Instead the weakref module
grew a new private function (_PyWeakref_ClearRef) that does only part of
tp_clear(): it removes the weakref from the weakly-referenced object's list
of weakrefs, but does not decref the callback object. So calling
_PyWeakref_ClearRef(wr) ensures that wr's callback object will never
trigger, and (unlike weakref's tp_clear()) also prevents any callback
associated *with* wr's callback object from triggering.

Then we can call tp_clear() on all the cyclic objects and never trigger
Python code.

After we do that, the callback objects still need to be decref'ed. Callbacks
(if any) *on* the callback objects that were also part of cyclic trash won't
get invoked, because we cleared all trash weakrefs with callbacks at the
start. Callbacks on the callback objects that were not part of cyclic trash
acted as external roots to everything reachable from them, so nothing
reachable from them was part of cyclic trash, so gc didn't do any damage to
objects reachable from them, and it's safe to call them at the end of gc.
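
Both cases can be observed from Python. The sketch below uses hypothetical
names and reflects what current CPython does, not necessarily the exact
2.3.3 sequence: the weakref, its callback `cb`, and the referent are all
cyclic trash, so `cb` never runs, while a weakref to `cb` that lives outside
the trash does get its callback invoked.

```python
import gc
import weakref


class C:
    """Hypothetical class used only to build a cycle."""


log = []


def build_trash():
    c = C()
    c.me = c

    def cb(wr):
        log.append("cb ran")

    c.cb = cb                  # cb is reachable only from the cycle
    c.wr = weakref.ref(c, cb)  # trash weakref, trash callback, trash referent
    # A weakref *to the callback object*, handed to the caller: not trash.
    return weakref.ref(cb, lambda wr: log.append("cb was reclaimed"))


watcher = build_trash()
gc.collect()
```

The in-trash callback is discarded unexecuted; the external watcher is told
that the callback object has died.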

An alternative would have been to treat objects with callbacks like objects
with __del__ methods, refusing to collect them, appending them to gc.garbage
instead. That would have been much easier. Jim Fulton gave a strong
argument against that (on Python-Dev):

    There's a big difference between __del__ and weakref callbacks.
    The __del__ method is "internal" to a design. When you design a
    class with a del method, you know you have to avoid including the
    class in cycles.

    Now, suppose you have a design that has no __del__ methods but
    that does use cyclic data structures. You reason about the design,
    run tests, and convince yourself you don't have a leak.

    Now, suppose some external code creates a weakref to one of your
    objects. All of a sudden, you start leaking. You can look at your
    code all you want and you won't find a reason for the leak.

IOW, a class designer can out-think __del__ problems, but has no control
over who creates weakrefs to his classes or class instances. The class
user has little chance either of predicting when the weakrefs he creates
may end up in cycles.
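
The outcome this argument leads to can be checked directly. In the sketch
below (hypothetical names throughout), "external code" takes a weakref with
a callback to an object inside a cycle, and the cycle is still reclaimed
rather than leaked:

```python
import gc
import weakref


class Widget:
    """Hypothetical class with no __del__ method."""


notices = []


def make_cycle():
    a, b = Widget(), Widget()
    a.other, b.other = b, a   # a two-object cycle
    # Stands in for external code creating a weakref to one of the objects.
    return weakref.ref(a, lambda wr: notices.append("a collected"))


probe = make_cycle()
gc.collect()
```

After the collection `probe()` returns None, so the weakref did not keep
the cycle alive, and the external observer was notified once.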

Callbacks on weakref callbacks are executed in an arbitrary order, and
that's not good (a primary reason not to collect cycles with objects with
__del__ methods is to avoid running finalizers in an arbitrary order).
However, a weakref callback on a weakref callback has got to be rare.
It's possible to do such a thing, so gc has to be robust against it, but
I doubt anyone has done it outside the test case I wrote for it.