[ 784825 ] fix obscure crash in descriptor handling
Should be applied to release23-maint and in all likelyhood
release22-maint, too.
Certainly doesn't apply to release21-maint.
I'm finding some pretty baffling output, like reprs consisting entirely
of three left parens. At least this will let us know what type the object
is (it's not str -- there's no quote character in the repr).
New tool combinerefs.py, to combine the two output blocks produced via
PYTHONDUMPREFS.
new line.
New pvt API function _Py_PrintReferenceAddresses(): Prints only the
addresses and refcnts of the live objects. This is always safe to call,
because it has no dependence on Python's C API.
Py_Finalize(): If envar PYTHONDUMPREFS is set, call (the new)
_Py_PrintReferenceAddresses() right before dumping final pymalloc stats.
We can't print the reprs of the objects here because too much of the
interpreter has been shut down. You need to correlate the addresses
displayed here with the object reprs printed by the earlier
PYTHONDUMPREFS call to _Py_PrintReferences().
even farther down, to just before the call to
_PyObject_DebugMallocStats(). This required the following changes:
- pystate.c, PyThreadState_GetDict(): changed not to raise an
exception or issue a fatal error when no current thread state is
available, but simply return NULL without raising an exception
(ever).
- object.c, Py_ReprEnter(): when PyThreadState_GetDict() returns NULL,
don't raise an exception but return 0. This means that when
printing a container that's recursive, printing will go on and on
and on. But that shouldn't happen in the case we care about (see
first bullet).
- Updated Misc/NEWS and Doc/api/init.tex to reflect changes to
PyThreadState_GetDict() definition.
Arranged that all the objects exposed by __builtin__ appear in the list
of all objects. I basically peed away two days tracking down a mystery
leak in sys.gettotalrefcount() in a ZODB app (== tons of code), because
the object leaking the references didn't appear in the sys.getobjects(0)
list. The object happened to be False. Now False is in the list, along
with other popular & previously missing leak candidates (like None).
Alas, we still don't have a choke point covering *all* Python objects,
so the list of all objects may still be incomplete.
_Py_AddToAllObjects() that simply inserts an object at the front of
the doubly-linked list of all objects. Changed PyType_Ready() (the
closest thing we've got to a choke point for type objects) to call
that.
a doubly-linked list, exposed by sys.getobjects(). Unfortunately, it's not
really all live objects, and it seems my fate to bump into programs where
sys.gettotalrefcount() keeps going up but where the reference leaks aren't
accounted for by anything in the list of all objects.
This patch helps a little: if COUNT_ALLOCS is also defined, from now on
type objects will also appear in this list, provided at least one object
of a type has been allocated.
Don't access tp_descr_{get,set} of a descriptor without checking the
flag bits of the descriptor's type. While we know that the main type
(the type of the object whose attribute is being accessed) has all the
right flag bits (or else PyObject_Generic{Get,Set}Attr wouldn't be
called), we don't know that for its class attributes!
Will backport to 2.2.
was broken because new-in-2.3 code added a tp_as_mapping slot to tuples.
Repaired that.
Added basic docs to check_recursion().
The code that intended to exempt tuples and strings was also broken here,
and in 2.2: these should use PyXYZ_CheckExact(), not PyXYZ_Check() -- we
can't know whether subclass instances are immutable. This part (and this
part alone) is a bugfix candidate.
Py_Init crash". refchain cannot be cleared because objects can live across
Py_Finalize() and Py_Initialize() if they are kept alive by circular
references.
macros. The 'op' argument is then the result from PyObject_MALLOC,
and that can of course be NULL. In that case, PyObject_Init[Var]
would raise a SystemError with "NULL object passed to
PyObject_Init[Var]". But there's nothing the caller of the macro can
do about this. So PyObject_Init[Var] should call just PyErr_NoMemory.
Will backport.
helper macros to something saner, and used them appropriately in other
files too, to reduce #ifdef blocks.
classobject.c, instance_dealloc(): One of my worst Python Memories is
trying to fix this routine a few years ago when COUNT_ALLOCS was defined
but Py_TRACE_REFS wasn't. The special-build code here is way too
complicated. Now it's much simpler. Difference: in a Py_TRACE_REFS
build, the instance is no longer in the doubly-linked list of live
objects while its __del__ method is executing, and that may be visible
via sys.getobjects() called from a __del__ method. Tough -- the object
is presumed dead while its __del__ is executing anyway, and not calling
_Py_NewReference() at the start allows enormous code simplification.
typeobject.c, call_finalizer(): The special-build instance_dealloc()
pain apparently spread to here too via cut-'n-paste, and this is much
simpler now too. In addition, I didn't understand why this routine
was calling _PyObject_GC_TRACK() after a resurrection, since there's no
plausible way _PyObject_GC_UNTRACK() could have been called on the
object by this point. I suspect it was left over from pasting the
instance_delloc() code. Instead asserted that the object is still
tracked. Caution: I suspect we don't have a test that actually
exercises the subtype_dealloc() __del__-resurrected-me code.
more trivial lexical helper macros so that uses of these guys expand
to nothing at all when they're not enabled. This should help sub-
standard compilers that can't do a good job of optimizing away the
previous "(void)0" expressions.
Py_DECREF: There's only one definition of this now. Yay! That
was that last one in the family defined multiple times in an #ifdef
maze.
Py_FatalError(): Changed the char* signature to const char*.
_Py_NegativeRefcount(): New helper function for the Py_REF_DEBUG
expansion of Py_DECREF. Calling an external function cuts down on
the volume of generated code. The previous inline expansion of abort()
didn't work as intended on Windows (the program often kept going, and
the error msg scrolled off the screen unseen). _Py_NegativeRefcount
calls Py_FatalError instead, which captures our best knowledge of
how to abort effectively across platforms.
Repair segfaults and infinite loops in COUNT_ALLOCS builds in the
presence of new-style (heap-allocated) classes/types.
Bugfix candidate. I'll backport this to 2.2. It's irrelevant in 2.1.
that have taken me "too long" to reverse-engineer over the years.
Vastly reduced the nesting level and redundancy of #ifdef-ery.
Took a light stab at repairing comments that are no longer true.
sys_gettotalrefcount(): Changed to enable under Py_REF_DEBUG.
It was enabled under Py_TRACE_REFS, which was much heavier than
necessary. sys.gettotalrefcount() is now available in a
Py_REF_DEBUG-only build.
mechanism is no longer evil: it no longer plays dangerous games with
the type pointer or refcounts, and objects in extension modules can play
along too without needing to edit the core first.
Rewrote all the comments to explain this, and (I hope) give clear
guidance to extension authors who do want to play along. Documented
all the functions. Added more asserts (it may no longer be evil, but
it's still dangerous <0.9 wink>). Rearranged the generated code to
make it clearer, and to tolerate either the presence or absence of a
semicolon after the macros. Rewrote _PyTrash_destroy_chain() to call
tp_dealloc directly; it was doing a Py_DECREF again, and that has all
sorts of obscure distorting effects in non-release builds (Py_DECREF
was already called on the object!). Removed Christian's little "embedded
change log" comments -- that's what checkin messages are for, and since
it was impossible to correlate the comments with the code that changed,
I found them merely distracting.
In the past, an object's tp_compare could return any value. In 2.2
the docs were tightened to require it to return -1, 0 or 1; and -1 for
an error.
We now issue a warning if the value is not in this range. When an
exception is raised, we allow -1 or -2 as return value, since -2 will
the recommended return value for errors in the future. (Eventually
tp_compare will also be allowed to return +2, to indicate
NotImplemented; but that can only be implemented once we know all
extensions return a value in [-2...1]. Or perhaps it will require the
type to set a flag bit.)
I haven't decided yet whether to backport this to 2.2.x. The patch
applies fine. But is it fair to start warning in 2.2.2 about code
that worked flawlessly in 2.2.1?
for 'str' and 'unicode', and can be used instead of
types.StringTypes, e.g. to test whether something is "a string":
isinstance(x, string) is True for Unicode and 8-bit strings. This
is an abstract base class and cannot be instantiated directly.
returned a proxy for __class__ whose __bases__ was also a proxy. The
merge_class_dict() helper for dir() assumed incorrectly that __bases__
would always be a tuple and used the in-line tuple API on the proxy.
I will backport this to 2.2 as well.
left and right type were of the same type and not classic instances.
This shortcut is dangerous for proxy types, because it means that
coerce(Proxy(1), Proxy(2.1)) leaves Proxy(1) unchanged rather than
turning it into Proxy(1.0).
In an ever-so-slight change of semantics, I now only take the shortcut
when the left and right types are of the same type and don't have the
CHECKTYPES feature. It so happens that classic instances have this
flag, so the shortcut is still skipped in this case (i.e. nothing
changes for classic instances). Proxies also have this flag set
(otherwise implementing numeric operations on proxies would become
nightmarish) and this means that the shortcut is also skipped there,
as desired. It so happens that int, long and float also have this
flag set; that means that e.g. coerce(1, 1) will now invoke
int_coerce(). This is fine: int_coerce() can deal with this, and I'm
not worried about the performance; int_coerce() is only invoked when
the user explicitly calls coerce(), which should be rarer than rare.
PyMem_{Del, DEL} doesn't work yet (compilation problems).
pyport.h: _PyMem_EXTRA is gone.
pmem.h: Repaired comments. PyMem_{Malloc, MALLOC} and
PyMem_{Realloc, REALLOC} now make the same x-platform guarantees when
asking for 0 bytes, and when passing a NULL pointer to the latter.
object.c: PyMem_{Malloc, Realloc} just call their macro versions
now, since the latter take care of the x-platform 0 and NULL stuff
by themselves now.
pypcre.c, grow_stack(): So sue me. On two lines, this called
PyMem_RESIZE to grow a "const" area. It's not legit to realloc a
const area, so the compiler warned given the new expansion of
PyMem_RESIZE. It would have gotten the same warning before if it
had used PyMem_Resize() instead; the older macro version, but not the
function version, silently cast away the constness. IMO that was a wrong
thing to do, and the docs say the macro versions of PyMem_xyz are
deprecated anyway. If somebody else is resizing const areas with the
macro spelling, they'll get a warning when they recompile now too.
PEP 285. Everything described in the PEP is here, and there is even
some documentation. I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison. I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.
Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
Also move all _PyMalloc_XXX entry points into obmalloc.c.
The Windows build works fine.
The Unix build is changed here (Makefile.pre.in), but not tested.
No other platform's build process has been fiddled.
platform realloc(p, 0) returns NULL, so MALLOC_ZERO_RETURNS_NULL can
be correctly undefined yet realloc(p, 0) can return NULL anyway.
Prevent realloc(p, 0) doing free(p) and returning NULL via a different
hack. Would probably be better to get rid of MALLOC_ZERO_RETURNS_NULL
entirely.
Bugfix candidate.
Due to the bizarre definition of _PyLong_Copy(), creating an instance
of a subclass of long with a negative value could cause core dumps
later on. Unfortunately it looks like the behavior of _PyLong_Copy()
is quite intentional, so the fix is more work than feels comfortable.
This fix is almost, but not quite, the code that Naofumi Honda added;
in addition, I added a test case.
string object (or a Unicode that's trivially converted to ASCII).
PyObject_GetAttr(): add an 'else' to the Unicode test like
PyObject_SetAttr() already has.
helping for types that defined tp_richcmp but not tp_compare, although
that's when it's most valuable, and strings moved into that category
since the fast path was first introduced. Now it helps for same-type
non-Instance objects that define rich or 3-way compares.
For all the edits here, the rest just amounts to moving the fast path from
do_richcmp into PyObject_RichCompare, saving a layer of function call
(measurable on my box!). This loses when NESTING_LIMIT is exceeded, but I
don't care about that (fast-paths are for normal cases, not pathologies).
Also added a tasteful <wink> label to get out of PyObject_RichCompare, as
the if/else nesting in this routine was getting incomprehensible.
This patch implements what we have discussed on python-dev late in
September: str(obj) and unicode(obj) should behave similar, while
the old behaviour is retained for unicode(obj, encoding, errors).
The patch also adds a new feature with which objects can provide
unicode(obj) with input data: the __unicode__ method. Currently no
new tp_unicode slot is implemented; this is left as option for the
future.
Note that PyUnicode_FromEncodedObject() no longer accepts Unicode
objects as input. The API name already suggests that Unicode
objects do not belong in the list of acceptable objects and the
functionality was only needed because
PyUnicode_FromEncodedObject() was being used directly by
unicode(). The latter was changed in the discussed way:
* unicode(obj) calls PyObject_Unicode()
* unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject()
One thing left open to discussion is whether to leave the
PyUnicode_FromObject() API as a thin API extension on top of
PyUnicode_FromEncodedObject() or to turn it into a (macro) alias
for PyObject_Unicode() and deprecate it. Doing so would have some
surprising consequences though, e.g. u"abc" + 123 would turn out
as u"abc123"...
[Marc-Andre didn't have time to check this in before the deadline. I
hope this is OK, Marc-Andre! You can still make changes and commit
them on the trunk after the branch has been made, but then please mail
Barry a context diff if you want the change to be merged into the
2.2b1 release branch. GvR]
object.c, PyObject_Str: Don't try to optimize anything except exact
string objects here; in particular, let str subclasses go thru tp_str,
same as non-str objects. This allows overrides of tp_str to take
effect.
stringobject.c:
+ string_print (str's tp_print): If the argument isn't an exact string
object, get one from PyObject_Str.
+ string_str (str's tp_str): Make a genuine-string copy of the object if
it's of a proper str subclass type. str() applied to a str subclass
that doesn't override __str__ ends up here.
test_descr.py: New str_of_str_subclass() test.
This simplifies the rounding in _PyObject_VAR_SIZE, allows to restore the
pre-rounding calling sequence, and allows some nice little simplifications
in its callers. I'm still making it return a size_t, though.
As Guido suggested, this makes the new subclassing code substantially
simpler. But the mechanics of doing it w/ C macro semantics are a mess,
and _PyObject_VAR_SIZE has a new calling sequence now.
Question: The PyObject_NEW_VAR macro appears to be part of the public API.
Regardless of what it expands to, the notion that it has to round up the
memory it allocates is new, and extensions containing the old
PyObject_NEW_VAR macro expansion (which was embedded in the
PyObject_NEW_VAR expansion) won't do this rounding. But the rounding
isn't actually *needed* except for new-style instances with dict pointers
after a variable-length blob of embedded data. So my guess is that we do
not need to bump the API version for this (as the rounding isn't needed
for anything an extension can do unless it's recompiled anyway). What's
your guess?
+ Use the _PyObject_VAR_SIZE macro to compute object size.
+ Break the computation into lines convenient for debugger inspection.
+ Speed the round-up-to-pointer-size computation.
hack, and it's even more disgusting than a PyInstance_Check() call.
If the tp_compare slot is the slot used for overrides in Python,
it's always called.
Add some tests that show what should work too.
and are lists, and then just the string elements (if any)).
There are good and bad reasons for this. The good reason is to support
dir() "like before" on objects of extension types that haven't migrated
to the class introspection API yet. The bad reason is that Python's own
method objects are such a type, and this is the quickest way to get their
im_self etc attrs to "show up" via dir(). It looks much messier to move
them to the new scheme, as their current getattr implementation presents
a view of their attrs that's a untion of their own attrs plus their
im_func's attrs. In particular, methodobject.__dict__ actually returns
methodobject.im_func.__dict__, and if that's important to preserve it
doesn't seem to fit the class introspection model at all.
- use PyModule_Check() instead of PyObject_TypeCheck(), now we can.
- don't assert that the __dict__ gotten out of a module is always
a dictionary; check its type, and raise an exception if it's not.
PEP 238. Changes:
- add a new flag variable Py_DivisionWarningFlag, declared in
pydebug.h, defined in object.c, set in main.c, and used in
{int,long,float,complex}object.c. When this flag is set, the
classic division operator issues a DeprecationWarning message.
- add a new API PyRun_SimpleStringFlags() to match
PyRun_SimpleString(). The main() function calls this so that
commands run with -c can also benefit from -Dnew.
- While I was at it, I changed the usage message in main() somewhat:
alphabetized the options, split it in *four* parts to fit in under
512 bytes (not that I still believe this is necessary -- doc strings
elsewhere are much longer), and perhaps most visibly, don't display
the full list of options on each command line error. Instead, the
full list is only displayed when -h is used, and otherwise a brief
reminder of -h is displayed. When -h is used, write to stdout so
that you can do `python -h | more'.
Notes:
- I don't want to use the -W option to control whether the classic
division warning is issued or not, because the machinery to decide
whether to display the warning or not is very expensive (it involves
calling into the warnings.py module). You can use -Werror to turn
the warnings into exceptions though.
- The -Dnew option doesn't select future division for all of the
program -- only for the __main__ module. I don't know if I'll ever
change this -- it would require changes to the .pyc file magic
number to do it right, and a more global notion of compiler flags.
- You can usefully combine -Dwarn and -Dnew: this gives the __main__
module new division, and warns about classic division everywhere
else.
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
types -- currently Type, List, None and NotImplemented. To be called
from Py_Initialize() instead of accumulating calls there.
Also rename type(None) to NoneType and type(NotImplemented) to
NotImplementedType -- naming the type identical to the object was
confusing.
returns that. (This fix is also by MvL; checkin it in because I want
to make more changes here. I'm still not 100% satisfied -- see
comments attached to the patch.)
- Add an explicit call to PyType_Ready(&PyList_Type) to pythonrun.c
(just for the heck of it, really -- we should either explicitly
ready all types, or none).
And remove all the extern decls in the middle of .c files.
Apparently, it was excluded from the header file because it is
intended for internal use by the interpreter. It's still intended for
internal use and documented as such in the header file.
case of objects with equal types which support tp_compare. Give
type objects a tp_compare function.
Also add c<0 tests before a few PyErr_Occurred tests.
NEEDS DOC CHANGES
A few more AttributeErrors turned into TypeErrors, but in test_contains
this time.
The full story for instance objects is pretty much unexplainable, because
instance_contains() tries its own flavor of iteration-based containment
testing first, and PySequence_Contains doesn't get a chance at it unless
instance_contains() blows up. A consequence is that
some_complex_number in some_instance
dies with a TypeError unless some_instance.__class__ defines __iter__ but
does not define __getitem__.
the code necessary to accomplish this is simpler and faster if confined to
the object implementations, so we only do this there.
This causes no behaviorial changes beyond a (very slight) speedup.
object's type didn't define tp_print, there were still cases where the
full "print uses str() which falls back to repr()" semantics weren't
honored. This resulted in
>>> print None
<None object at 0x80bd674>
>>> print type(u'')
<type object at 0x80c0a80>
Fixed this by always using the appropriate PyObject_Repr() or
PyObject_Str() call, rather than trying to emulate what they would do.
Also simplified PyObject_Str() to always fall back on PyObject_Repr()
when tp_str is not defined (rather than making an extra check for
instances with a __str__ method). And got rid of the special case for
strings.
Fix a very old flaw in PyObject_Print(). Amazing! When an object
type defines tp_str but not tp_repr, 'print x' to a real file
object would not call the tp_str slot but rather print a default style
representation: <foo object at 0x....>. This even though 'print x' to
a file-like-object would correctly call the tp_str slot.
and the test for errors, so that an error in the default compare
doesn't go undetected. This fixes SF Bug #132933 (submitted by
effbot) -- list.sort doesn't detect comparision errors.
PyObject_Dump(): New function that is useful when debugging Python's C
runtime. In something like gdb it can be a pain to get some useful
information out of PyObject*'s. This function prints the str() of the
object to stderr, along with the object's refcount and hex address.
PyGC_Dump(): Similar to PyObject_Dump() but knows how to cast from the
garbage collector prefix back to the PyObject* structure.
[See Misc/gdbinit for some useful gdb hooks]
none_dealloc(): Rather than SEGV if we accidentally decref None out of
existance, we assign None's and NotImplemented's destructor slot to
this function, which just calls abort().