Replace (high_bits << PyLong_SHIFT) + low_bits with
(high_bits << PyLong_SHIFT) | low_bits
in Objects/longobject.c. Motivation:
- shouldn't unnecessarily mix bit ops with arithmetic ops (style)
- this pattern should be spelt the same way throughout (consistency)
- it's very slightly faster: no need to worry about
carries to the high digit (nano-optimization).
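
A quick illustration of why the two spellings agree (a Python sketch, not the
C code itself): low_bits is always smaller than 2**PyLong_SHIFT, so it never
overlaps the shifted high digit and OR gives the same result as ADD.

    PyLong_SHIFT = 15                       # digit width on 32-bit builds
    high_bits, low_bits = 0x1234, 0x0abc    # any low_bits < 2**PyLong_SHIFT
    assert ((high_bits << PyLong_SHIFT) | low_bits ==
            (high_bits << PyLong_SHIFT) + low_bits)
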
The debug memory API now keeps track of which external API (PyMem_* or PyObject_*) was used to allocate each block and treats any API violation as an error. Added separate _PyMem_DebugMalloc functions for the PyMem API instead of having it use the _PyObject_DebugMalloc functions.
Without this change, test_unicode.UnicodeTest.test_codecs_utf7 crashes in
debug mode. What happens is that the unicode string u'\U000abcde', which has a
length of 1, encodes to the string '+2m/c3g-' of length 8. Since only 5 bytes
are reserved in the buffer, a buffer overrun occurs.
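
For reference, the encoding itself is easy to reproduce from Python (2.x
syntax, on a wide unicode build where the string really has length 1):

    u = u'\U000abcde'
    u.encode('utf-7')           # '+2m/c3g-': 8 bytes for a 1-character string
    len(u.encode('utf-7'))      # 8, more than the 5 bytes the buffer reserved
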
lnotab-based tracing is very complicated and isn't documented very well. There
were at least 3 comment blocks purporting to document co_lnotab, and none did a
very good job. This patch unifies them into Objects/lnotab_notes.txt which
tries to completely capture the current state of affairs.
I also discovered that we've attached 2 layers of patches to the basic tracing
scheme. The first layer avoids jumping to instructions that don't start a line,
to avoid problems in if statements and while loops. The second layer was added
after discovering that backward jumps do need to trace at instructions that
don't start a line, so it added extra lnotab entries for 'while' and 'for' loops, and
added a special case for backward jumps within the same line. I replaced these
patches by just treating forward and backward jumps differently.
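
For reference, a minimal sketch (in 2.x terms) of how co_lnotab maps bytecode
offsets to line numbers, per the format described in lnotab_notes.txt: it is a
string of (bytecode offset increment, line increment) byte pairs.

    def lnotab_pairs(code):
        # Yield cumulative (bytecode offset, line number) pairs.
        addr, lineno = 0, code.co_firstlineno
        lnotab = code.co_lnotab
        for i in range(0, len(lnotab), 2):
            addr += ord(lnotab[i])
            lineno += ord(lnotab[i + 1])
            yield addr, lineno
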
made it try to set the line number from the trace callback for a 'call' event.
This patch makes the error message a little more helpful in that case, and
makes it a little less likely that a future editor will make the same mistake
in test_trace.
Most uses of PyCode_Addr2Line
(http://www.google.com/codesearch?q=PyCode_Addr2Line) are just trying to get
the line number of a specified frame, but there's no way to do that directly.
Forcing people to go through the code object requires them to know more about
the guts of the interpreter than they should need to.
The remaining uses of PyCode_Addr2Line seem to be getting the line from a
traceback (for example,
http://www.google.com/codesearch/p?hl=en#u_9_nDrchrw/pygame-1.7.1release/src/base.c&q=PyCode_Addr2Line),
which is replaced by the tb_lineno field. So we may be able to deprecate
PyCode_Addr2Line entirely for external use.
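
At the Python level the traceback already exposes this directly, e.g.:

    import sys
    try:
        1 / 0
    except ZeroDivisionError:
        tb = sys.exc_info()[2]
        print(tb.tb_lineno)     # line number of the failing statement
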
Most uses of PyCode_New found by http://www.google.com/codesearch?q=PyCode_New
are trying to build an empty code object, usually to put it in a dummy frame
object. This patch adds a PyCode_NewEmpty wrapper which lets the user specify
just the filename, function name, and first line number, instead of also
requiring lots of code internals.
the __doc__ into the subclass instance __dict__. The fix refactors
property_copy to call property_init in such a way that the __doc__
logic is re-executed correctly when getter_doc is 1, thus simplifying
property_copy.
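
A sketch of the kind of usage this concerns (the subclass name is just for
illustration): a property subclass whose docstring comes from the getter, with
__doc__ ending up in the subclass instance's __dict__ as described above.

    class myproperty(property):
        pass

    class C(object):
        @myproperty
        def x(self):
            "docstring taken from the getter"
            return 42

    print(C.__dict__['x'].__doc__)   # 'docstring taken from the getter'
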
Call PyUnicode_DecodeUTF8() once, remember the result, and output it in a
second step. This avoids problems with a UTF-8 byte count that ignores the
effect of using the replace error handler in PyUnicode_DecodeUTF8().
If anyone wants to clean up the documentation, feel free. It's my first documentation foray, and it's not that great.
Will port to py3k with a different strategy.
- simplify parsing and printing of complex numbers
- make complex(repr(z)) round-tripping work for complex
numbers involving nans, infs, or negative zeros
- don't accept some of the stranger complex strings
that were previously allowed---e.g., complex('1..1j')
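
A few examples of the new behaviour, as a sketch:

    import math
    complex('inf+nanj')                     # special values now parse
    z = complex(repr(complex(-0.0, 1.0)))   # repr is '(-0+1j)' and round-trips
    assert math.copysign(1.0, z.real) == -1.0
    try:
        complex('1..1j')                    # stranger forms are rejected
    except ValueError:
        pass
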
int, long, and float __format__(), and it keeps their implementation
in sync with py3k.
Also added PyOS_double_to_string. This is the "fallback" version
that's also available in trunk, and should be kept in sync with that
code. I'll add an issue to document PyOS_double_to_string in the C
API.
There are many internal cleanups. Externally visible changes include:
- Implement PEP 378, Format Specifier for Thousands Separator, for
floats, ints, and longs.
- Issue #5515: 'n' formatting for ints, longs, and floats handles
leading zero formatting poorly.
- Issue #5772: For float.__format__, don't add a trailing ".0" if
we're using no type code and we have an exponent.
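
A couple of examples of the formatting behaviour described above (assuming a
build that includes these changes):

    format(1234567, ',d')           # '1,234,567'  (PEP 378)
    format(1234567.891, ',.2f')     # '1,234,567.89'
    format(1e100, '')               # '1e+100', no trailing '.0' (issue #5772)
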
correctly rounded, using round-half-to-even. This ensures that the
value of float(n) doesn't depend on whether we're using 15-bit digits
or 30-bit digits for Python longs.
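
A concrete case where round-half-to-even is visible:

    n = 2 ** 53 + 1                # exactly halfway between two adjacent doubles
    assert float(n) == 2.0 ** 53   # the tie rounds to the even significand
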
untrackable objects are not tracked by the garbage collector. This can
reduce the size of collections and therefore the garbage collection overhead
on long-running programs, depending on their particular use of datatypes.
(trivia: this makes the "binary_trees" benchmark from the Computer Language
Shootout 40% faster)
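
The effect can be observed with gc.is_tracked(), for example:

    import gc
    gc.is_tracked({})               # False: nothing worth tracking
    gc.is_tracked({'a': 1})         # False: only untrackable (atomic) contents
    gc.is_tracked({'a': []})        # True: contains a container
    t = (1, 2, 'three')
    gc.collect()                    # tuples are untracked lazily, during a collection
    gc.is_tracked(t)                # False afterwards
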
The basic algorithm remains the same; the most significant speedups
come from the following three changes:
(1) normalize by shifting instead of multiplying and dividing (see the
    sketch after this list)
(2) the old algorithm usually did an unnecessary extra iteration of
the outer loop; remove this. As a special case, this means that
long divisions with a single-digit result run twice as fast as
before.
(3) make inner loop much tighter.
Various benchmarks show speedups of between 50% and 150% for long
integer divisions and modulo operations.
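
A rough Python sketch of what change (1) means (the real code works on arrays
of 15-bit digits in C):

    def normalization_shift(divisor_top_digit, shift=15):
        # Bits to shift both operands left so the divisor's most significant
        # digit gets its high bit set, instead of multiplying and dividing by
        # a correction factor.
        d = 0
        while divisor_top_digit < (1 << (shift - 1)):
            divisor_top_digit <<= 1
            d += 1
        return d
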
When sizeof(Py_UNICODE) == 2, PyUnicode_FromWideChar now converts
each character outside the BMP to the appropriate surrogate pair.
Thanks Victor Stinner for the patch.
(backport of r70452 from py3k to trunk)
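
The mapping for characters outside the BMP is the standard UTF-16 surrogate
pair calculation, sketched here:

    def to_surrogate_pair(cp):
        # Split a code point above U+FFFF into (high, low) surrogates.
        assert cp > 0xFFFF
        cp -= 0x10000
        return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)

    to_surrogate_pair(0x10400)      # (0xD801, 0xDC00)
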
For simple uses of str.format(), this makes the typing easier. Hopefully this
will help in the adoption of str.format().
For example:
'The {} is {}'.format('sky', 'blue')
You can mix and match auto-numbering and named replacement fields:
'The {} is {color}'.format('sky', color='blue')
But you can't mix and match auto-numbering and specified numbering:
'The {0} is {}'.format('sky', 'blue')
ValueError: cannot switch from manual field specification to automatic field numbering
Will port to 3.1.
and cleanups in Objects/longobject.c. The most significant change is that
longs now use less memory: average savings are 2 bytes per long on 32-bit
systems and 6 bytes per long on 64-bit systems. (This memory saving already
exists in py3k.)
platforms. The previous code was fragile, depending on the twin
accidents that:
(1) in C, casting the double value 2.**63 to long returns the integer
value -2**63, and
(2) in Python, hash(-2**63) == hash(2**63).
There's already a test for this in test_hash.
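
The property being protected (and already exercised by test_hash) is that
equal values hash equally across int/long and float:

    assert 2.0 ** 63 == 2 ** 63
    assert hash(2.0 ** 63) == hash(2 ** 63)
    assert hash(-(2.0 ** 63)) == hash(-(2 ** 63))
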